Microsoft KB Archive/74341

= Creating Pen Windows Dictionaries =

Article ID: 74341

Article Last Modified on 1/8/2003

-

APPLIES TO


 * Microsoft Windows for Pen Computing 1.0

-



This article was previously published under Q74341



SUMMARY
This article is intended to assist in the development of a custom dictionary used with Microsoft Pen Windows. Included in this article is information regarding the following:


 * The purpose of a dictionary
 * The flow of control for dictionary processing
 * An in-depth explanation of the dictionary messages
 * Some intended future changes

Please review the following six sections in the &quot;Microsoft Pen Windows Software Development Kit Guide to Programming&quot; documentation before proceeding:


 * 1) The Dictionary Functions section of the Overview chapter.
 * 2) The DictionaryProc function in the Pen API chapter.
 * 3) The DictionarySearch function in the Pen API chapter.
 * 4) The rc.rglpdf, rc.clTryDicationary and rc.lRcOptions values in the Pen Structures chapter.
 * 5) The SYV_Values (SYV) and the Symbol Graph (SYG) in the Pen Structures chapter.
 * 6) The RCO_ constants in the Pen Messages chapter.

Please note that the information in this article is intended to enhance the understanding of dictionaries.



PURPOSE OF A DICTIONARY
A dictionary, in the Pen Windows context, is a dynamic-link library (DLL) with one exported function. This function, DictionaryProc, is called with different subfunction identifiers to perform some service on the user's input. Typically, the DLL is intended to be used by an application during the recognition process to increase the chances of correctly interpreting what the user entered. To do this, an application can inform the system to call the dictionary before returning the WM_RCRESULT message, or it can call the dictionary itself to perform a post recognition search. A dictionary is completely optional and need not be used to produce tangible results from recognition.

Dictionaries can perform a number of different services. The most common interpretation of how to use a dictionary is to look for an exact match in a list of words. A dictionary could also perform spell checking on a string, format checking of the string, and macro expansion on the string. Dictionaries are designed to be sufficiently flexible to satisfy a number of different services.

Of all the services a dictionary can perform, the most common interpretation of the purpose of a dictionary is for word matching. With the Windows SDK, Microsoft has provided two dictionary DLLs: MAINDICT.DLL and USERDICT.DLL. MAINDICT.DLL is a general-purpose language dictionary providing, among other things, a facility to look up words of many European languages. USERDICT.DLL is a dictionary DLL and can be used to look up words using user-supplied word lists.

THE FLOW OF CONTROL
The control over when a dictionary is called can either be left with the system or exercised by the application. Both situations are important and need to be addressed.

The simpler situation is to let the system control when the dictionary will be called. When an application sets the RC structure to be used during recognition, it does not need to change any of the default values to have the system call the default system dictionary. The variables in the RC structure that are of interest to the dictionary are: rc.lRcOptions, rc.rglpdf, rc.clTryDictionary, and possibly rc.dwDictParam. The following paragraphs explain these variables in more detail so an application can fine tune how the system will respond when calling the dictionary.

rc.lRcOptions
This variable is used by both the recognizer and the dictionary. The RCO constants that pertain to the dictionary are RCO_NOSPACEBREAK and RCO_SUGGEST. The following two paragraphs explain how the application should use these constants.

The RCO_NOSPACEBREAK constant specifies that when the system calls the dictionary it should not do any preprocessing of the information in regards to space breaks. If the user entered &quot;hi there&quot; and the RCO_NOSPACEBREAK flag is set, the dictionary should receive the string &quot;hi there&quot;. If the flag is not set, the dictionary should receive two separate strings. The system will first ask the dictionary to identify the string &quot;hi&quot; and then it will ask it to process &quot;there&quot;. RCO_NOSPACEBREAK should be used only in cases where white space is a significant part of the information. Usage of this flag in language strings processing could significantly slow down the look-up.

If the constant RCO_SUGGEST is set, the system will call the dictionary with the DIRQ_SUGGEST message, but only under special circumstances. The DIRQ_SUGGEST message will be sent to the dictionary only if the RCO_SUGGEST option is set in the RC structure and the dictionary failed to return success when processing the DIRQ_STRING messages. The DIRQ_SUGGEST message is designed to be a message that is sent to the dictionary after the DIRQ_STRING messages have been processed. When the dictionary processes the DIRQ_STRING messages, it basically looks for the match in the word list that fits the enumeration that it received. The DIRQ_SUGGEST message is primarily designed for spell checking or macro expansion of the enumeration. For example, if the user enters &quot;SDK&quot; and the word &quot;SDK&quot; is not in the dictionary, then, if the RCO_SUGGEST flag is set, the dictionary will be called with the best enumeration of the string and it will be able to return its &quot;best guess.&quot; In this case, the dictionary might return &quot;Software Development Kit&quot; instead of &quot;SDK&quot;. If the RCO_SUGGEST flag is not set in the RC structure, this message will never be sent to the dictionary.

rc.rglpdf
Within the Pen Windows header file, PENWIN.H, this variable is an array of far addresses to dictionary functions. Currently, the system will support any number of dictionaries from zero to MAXDICTIONARIES (16). The number and order of these callback functions is important.

If there are no dictionaries in the list, the system will use the &quot;NULL&quot; dictionary to extract the symbols from the symbol graph with the highest confidence level and it will return this with the WM_RCRESULT message. If there is one dictionary in the list, the system will call the dictionary with the DIRQ_STRING message for each enumeration of the symbol graph. For instance, if the symbol graph returned from the recognizer is &quot;{1|l}ow&quot;, the dictionary will be called with each enumeration of the symbol graph -- &quot;1ow&quot; followed by &quot;low&quot;. With each call, the dictionary has the option of either accepting or rejecting the input. If the input is not accepted, the system will continue to send enumerations to the dictionary until there are no more enumerations left. After all the enumerations have been sent, the system will call the dictionary with a DIRQ_STRING message with the lpIn parameter set to NULL to inform the dictionary that there are no more enumerations left. If the dictionary accepts the input, the system immediately sends the ending message DIRQ_STRING with lpIn set to NULL. If there is only one dictionary in the search list, this is when the system would send the DIRQ_SUGGEST message.

If there are multiple dictionaries in the list, the system prioritizes which dictionary is called when by the dictionary's position in the list and the confidence of the enumeration. The system starts by getting the first enumeration of the symbol graph. For every enumeration, it calls the dictionaries starting with the one located at position zero with the DIRQ_STRING message. If the first dictionary in the list rejects the input, the second dictionary will be called with the same enumeration. This process continues until there are no more enumerations or a dictionary returns success. If at any time, one of the dictionaries reports a &quot;match&quot;, the system will send a DIRQ_STRING message to every dictionary with lpIn set to NULL to indicate that the search is over. If no match if found, the system may then call the dictionaries with the DIRQ_SUGGEST message in the same order. As stated earlier, the DIRQ_SUGGEST message will be called only if the RCO_SUGGEST flag is set.

rc.clTryDictionary
This variable should be set with the &quot;cut off&quot; threshold value to be used by the system when determining what symbol graph to send to the dictionary. Every character returned by the recognizer will have a confidence level associated with it. If that confidence level is less than the value of clTryDictionary, that character will not be included in any enumeration sent to the dictionary. For example, if the user enters &quot;{1|l}ow&quot; and the recognizer reports that the confidence level for &quot;1&quot; is 60 and the confidence for &quot;l&quot; is 40 when the &quot;cut off&quot; threshold, clTryDictionary, is 50, the dictionary will only be called with the symbol values &quot;1ow&quot;. (NOTE: This process of enumeration using clTryDictionary is under refinement; it may change in some future prerelease.)

rc.dwDictParam
This variable is reserved for dictionaries that require more specific information when called by the system. The information placed in this variable is passed on to the dictionary function as the ID parameter. Applications can use this variable to store information that might be very specific to the RC structure that the application is currently using. When using this variable, it is important that the application controls what dictionary will be called because the dwDictParam is sent to every dictionary in the list. For this reason, if this variable is used, the application should be aware of all the dictionaries in the list so that only the dictionaries that can support this ID value will receive the message.

There are ways to invoke the dictionary other than placing the dictionary function callback address in the rc.rglpdf list. Every WM_RCRESULT message contains the symbol graph that represents that user's input. To simulate the operations of the system, an application can call the DictionarySearch function and it will mimic the calls that the system makes before the application receives the WM_RCRESULT message. This can be useful for calling a dictionary with information that needs to be &quot;recognized&quot; in context. For example, if the user writes &quot;at&quot; on top of the &quot;is&quot; in the word &quot;this&quot;, it might be useful to have the application defer the dictionary search until after it receives the WM_RCRESULT message. The application would then call the DictionarySearch function with the symbol graph &quot;that&quot; instead of having the system call the dictionary with &quot;at&quot;.

MESSAGES
When designing a dictionary, it is necessary to be aware of the messages that are required to support system services and the messages that support desired enhanced functionality. The most important message is the DIRQ_STRING message. If the dictionary is ever called by the system, it will be called with the DIRQ_STRING message. All other messages are optional.

The messages, defined by Microsoft, can be combined into logical groups. There are basically three different types of messages: system messages, dynamic manipulation messages, and other messages. The system messages include: DIRQ_STRING, DIRQ_RCCHANGE, and DIRQ_SUGGEST. The dynamic manipulation messages include: DIRQ_ADD, DIRQ_CLOSE, DIRQ_DELETE, DIRQ_FLUSH, DIRQ_OPEN, DIRQ_SAVE, and DIRQ_SETWORDLISTS. Other messages include: DIRQ_CONFIGURE, DIRQ_DESCRIPTION, DIRQ_QUERY, and DIRQ_USER. The dictionary may support any number of these messages. The following few paragraphs describe the intent of the messages and when they might be used. Messages are listed alphabetically.

DIRQ_ADD
Used to build dictionaries dynamically. If a custom dictionary has been built that allows for dynamic expansion, it should process this message. As the name implies, the dictionary should add the word to the word list used by the dictionary. Keep in mind that the dictionary is responsible for managing the word list so it either has the option of saving the word list during the processing of this message or during the DIRQ_CLOSE message that may come at some, much later, time.

DIRQ_CLOSE
Used to close a dynamic word list. This message is intended to indicate that a particular word list will not be used any more; the dictionary should save the word list if any changes have been made to it by the DIRQ_ADD or DIRQ_DELETE message and remove the word list from the current list of options that may be used by the dictionary.

DIRQ_CONFIGURE
If a dictionary chooses to process this message, it should display a dialog box that shows the configuration of the dictionary for the user to view and/or manipulate. This dialog box may contain information pertaining to the language that is currently selected, or perhaps, information about the word lists that are currently loaded. How this function is processed is defined by each dictionary.

DIRQ_DELETE
Used to delete words from a dynamic dictionary. When processing this message, the dictionary should delete individual words in a particular word list used by the dictionary.

DIRQ_DESCRIPTION
When this message is processed, the dictionary returns the name of the DLL that contains the DictionaryProc function.

DIRQ_FLUSH
The dictionary will reinitialize its word list. This allows for dynamic restructuring of the dictionary's resources. Flushing essentially empties an entire word list from the dictionary but it does not remove the word list from the list of word lists that the dictionary supports.

DIRQ_OPEN
Used to open multiple word lists, which can be used singly or in combinations for look-up. When the dictionary processes this message, it should load a word list into memory so it can be accessed at some future time. Ultimately, dictionaries should be able to open more than one word list at a time. A good example of how this might work can be seen with a typical form application. In this example, an application loads a dialog box that pertains to an auto insurance form. It might call the dictionary to install five of six word lists pertaining to particular aspects of the form. For example, one list may contain all possible automobile manufacturers, another list might contain cars made by FORD, another might contain cars made by GM, and so on. After the dictionary loads the word lists, it should keep a unique identifier, a handle, for each list so the lists can be changed by the application by a call to the dictionary with DIRQ_SETWORDLISTS.

DIRQ_QUERY
Applications may call a dictionary with this message to determine which subfunctions the dictionary will support. The caller will call the dictionary to inquire if a particular message is supported.

DIRQ_RCCHANGE
If a dictionary depends on information in the global RC structure, it might want to monitor this message. If any changes occur in the global RC structure, the system will send this message to every dictionary that is currently selected in the global RC structure. For any dictionary that is not listed in the global RC structure, the application can monitor the WM_GLOBALRCCHANGE message and update any custom RC structures that it has saved for particular situations. To do this, the application would send the DIRQ_RCCHANGE message to the dictionary itself to inform it of the change. Note that the Global RC structure could change at any time during the operation of Windows because it is affected by user input to the Control Panel.

DIRQ_SAVE
This message has been removed from the messaging scheme used by dictionaries as defined by the system. The functionality of the DIRQ_SAVE message is now included in the DIRQ_CLOSE message.

DIRQ_SETWORDLISTS
Used to dynamically manipulate the word lists for a dictionary. This message is used when toggling between word lists in a particular dictionary. If a dictionary loads more then one word list, as described in the DIRQ_OPEN message, the application will want to call the dictionary with this message immediately before calling the Recognize function. This will give the dictionary the opportunity to set the default search list when the system calls the dictionary. For example, when the application receives the WM_LBUTTONDOWN message, it will call the dictionary with the DIRQ_SETWORDLISTS message to set the word list to match the edit field that the user is writing in. As an example, if the dictionary has two word lists, a list of American cars and a list of foreign cars, and the edit field asks the user to enter an the name of an American car, the application would call the dictionary with this message to inform it that it should use the American car word list for the next series of calls that it will be sent.

DIRQ_STRING
This message was briefly discussed in the &quot;Flow of Control&quot; section above. When the system calls an active dictionary, it will send the dictionary this message. The dictionary should look for the best or exact &quot;match&quot; in its word list. A DIRQ_STRING is thought of as one of many in a sequence of DIRQ_STRING calls. If one DIRQ_STRING message is sent to the dictionary with lpIn set to something other than NULL, there will always be at least one more DIRQ_STRING message; either another DIRQ_STRING message with an enumeration of the symbol graph in the lpIn variable, or the DIRQ_STRING message with lpIn set to NULL. The DIRQ_STRING message with the lpIn variable set to NULL has special significance. This call marks that either a dictionary has returned a success code to a previous DIRQ_STRING message or that there are no more enumerations of the symbol graph left.

DIRQ_SUGGEST
This message was discussed above. If a dictionary is called with this message, it should provide its best guess as to what the symbol string is. The dictionary will be sent a string that represents that best symbol string for it to process. For the system to call the dictionary with this message, the rc.lRcOptions variable must be RCO_SUGGEST.

DIRQ_USER
The message is a place holder for dictionary specific subfunctions. The only time that the dictionary will see this message is when it is called explicitly by an application that is aware of the subfunction. For instance, a dictionary might have the ability to determine whether the first character in the symbol graph is uppercase. This functionality may be supported when processing the DIRQ_USER+1 message within the dictionary. Therefore, the application might call the dictionary after receiving the WM_RCRESULT message with DIRQ_USER+1 to verify that the first letter is uppercase.

FUTURE IMPLEMENTATION
DIRQ_SYMBOLGRAPH will be added the DIRQ list. In a future prerelease, this will be the first message sent to a dictionary by the system. It will give the dictionary the ability to parse the symbol graph and return an answer before the system starts enumerating the symbol graph.

Also note that the meaning of the clTryDictionary value is being redefined. It may change in a future release.

Additional query words: 1.00

Keywords: KB74341

-

[mailto:TECHNET@MICROSOFT.COM Send feedback to Microsoft]

© Microsoft Corporation. All rights reserved.