[package com.xmlmind.spellcheck.engine]
This package has no notion of a graphical user interface. This section is intended for developers who want to modify the GUI components or build entirely different applications, for example to do batch processing or to modify the servlets of the Client/Server edition.
The central class is SpellChecker. It is a "facade" that provides most of the services.
The Suggestions interface gives access to the suggestions produced by SpellChecker.
Access to text is abstracted by a very simple interface, CharSequence. It is analogous to the java.lang.CharSequence interface of JRE 1.4 and will eventually be deprecated in favor of the standard interface.
A few simple implementations of this interface are available in package com.xmlmind.spellcheck.util.
There is another class, DictionaryManager, which simple applications need not manipulate. It has to be used if you want to share dictionaries among several instances of SpellChecker (for example in a server-side application), or if you want to access additional dictionaries in non-standard locations.
The methods of SpellChecker fall into several categories:
Checking per se: scanning a sequence of characters, checking individual words, and getting suggestions for an erroneous word.
Manipulating languages and dictionaries, in particular the personal dictionary (learning words and suggestions).
Setting and getting miscellaneous control options.
A SpellChecker instance cannot be shared between threads, since it holds the state of a spelling session.
Therefore, in a multi-threaded server for example, one SpellChecker instance must be created for each spelling session. Since SpellChecker is not a heavyweight object (it does not contain the compiled dictionaries), this causes no performance problem.
By contrast, DictionaryManager (used by SpellChecker) is thread-safe, and its purpose is precisely to share dictionaries. See the section about dictionaries for more details.
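The threading rule above can be sketched in plain Java. The classes below are hypothetical stand-ins, not the library's API: `SharedDictionaryCache` plays the role of DictionaryManager (dictionaries loaded once, safe to share), and `SpellSession` plays the role of a per-session SpellChecker that holds mutable state and is never shared between threads. The word list is a placeholder assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for DictionaryManager: thread-safe and shared by all sessions.
class SharedDictionaryCache {
    private final ConcurrentMap<String, Set<String>> byLanguage = new ConcurrentHashMap<>();

    // Loads each language at most once; computeIfAbsent is atomic per key.
    Set<String> dictionaryFor(String language) {
        return byLanguage.computeIfAbsent(language, SharedDictionaryCache::load);
    }

    // Placeholder for an expensive load from disk (assumption: a tiny fixed word list).
    private static Set<String> load(String language) {
        return Set.of("hello", "world");
    }
}

// Hypothetical stand-in for SpellChecker: cheap to create, holds per-session state.
class SpellSession {
    private final Set<String> words; // shared and read-only
    private int wordsChecked;        // mutable state private to this session

    SpellSession(SharedDictionaryCache cache, String language) {
        this.words = cache.dictionaryFor(language);
    }

    boolean isKnown(String word) {
        wordsChecked++;
        return words.contains(word.toLowerCase());
    }

    Set<String> dictionary() { return words; }
}

class PerSessionDemo {
    // Each task creates its own session; only the cache crosses thread boundaries.
    static boolean sessionsShareDictionary(SharedDictionaryCache cache, int tasks) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<Set<String>>> results = new ArrayList<>();
            for (int i = 0; i < tasks; i++) {
                results.add(pool.submit(() -> new SpellSession(cache, "en").dictionary()));
            }
            Set<String> first = results.get(0).get();
            for (Future<Set<String>> f : results) {
                if (f.get() != first) return false; // identity check: same loaded object
            }
            return true;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Creating a session is cheap because it only keeps a reference to the shared dictionary, which is why one checker per session is affordable.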
The first step is to create an instance of SpellChecker.
This is done directly with a constructor. The simplest constructor takes one argument, a "dictpath": the name of a directory where dictionaries are installed.
For a discussion of the dictionary storage conventions, see the "Dictionary Management" section below.
String dictPath = ... // application-dependent
SpellChecker checker = new SpellChecker( dictPath );
Then it is possible to set a number of options: see the 'Options' section for more details.
It is important to set the current language, which is done with setSelectedLanguage(languageCode). The available languages can be obtained with listLanguages().
Another useful setting is the path from which personal dictionaries are loaded and to which they are saved (there is a distinct personal dictionary for each language). This path is typically application- and system-dependent.
String path = homeDir + File.separatorChar
+ "myapp_spell" + File.separatorChar + "%L%" ;
checker.setPersonalDictionaryPath( path );
Actually, the path set is a pattern that must contain a marker for the language name. This marker is "%L%", as can be seen in the example above.
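As a minimal sketch of how one pattern yields one file per language, the hypothetical helper `expandPattern` below substitutes the marker. It assumes the marker is replaced by the language code (e.g. "en"); the exact value SpellChecker substitutes is not specified here.

```java
import java.io.File;

class PersonalDictPath {
    static final String LANGUAGE_MARKER = "%L%";

    // Hypothetical helper (not part of SpellChecker): substitutes the language marker.
    static String expandPattern(String pattern, String language) {
        if (!pattern.contains(LANGUAGE_MARKER)) {
            // Without the marker, every language would map to the same file.
            throw new IllegalArgumentException("pattern lacks " + LANGUAGE_MARKER + ": " + pattern);
        }
        return pattern.replace(LANGUAGE_MARKER, language);
    }

    public static void main(String[] args) {
        String homeDir = System.getProperty("user.home");
        String pattern = homeDir + File.separatorChar
                + "myapp_spell" + File.separatorChar + "%L%";
        // Assumption: the marker is replaced by a language code.
        System.out.println(expandPattern(pattern, "en"));
        System.out.println(expandPattern(pattern, "fr"));
    }
}
```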
Next comes the spell-checking main loop. The model used is very simple and makes as few assumptions as possible about the client application.
The actual implementation of your text is abstracted by an interface called CharSequence (analogous to java.lang.CharSequence in JRE 1.4).
The spell checker accepts a piece of text (a CharSequence) through SpellChecker.setInput(...); then checkNext() is invoked.
CharSequence myInput = ... // get from your application
checker.setInput( myInput );
int err = checker.checkNext();
If checkNext() returns ERR_NONE, the piece of text set as input is correct, and the application should proceed to the next piece of text or finish.
Processing the errors returned by checkNext():
ERR_NONE: No error has been detected. The application should proceed to the next piece of text, or finish.
ERR_UNKNOWN_WORD: A word that is not contained in the dictionaries and is not a compound of existing words. Typically, you will invoke getSuggestions() to obtain relevant (we hope) suggestions for correcting the word.
ERR_WRONG_CAP: The word is known but improperly capitalized. For example, it is a proper name starting with a lowercase letter, or an acronym expected to be in all caps ("Xml" instead of "XML"). It can also be an ordinary word that is not capitalized after an end-of-sentence punctuation mark. Here too, getSuggestions() can be invoked; it should return the properly capitalized word first among the suggestions.
This error can be inhibited by setting the IgnoreCase option to true.
ERR_PUNCTUATION: A dubious sequence of punctuation marks was found: either whitespace before a mark such as a dot, comma, colon, semicolon, question mark or exclamation mark, or two consecutive marks (other than dots) such as ".,". Here too, getSuggestions() will propose replacements.
This error can be inhibited by setting the CheckPunctuation option to false.
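The punctuation rule can be illustrated with a small scanner over java.lang.CharSequence. This is a hypothetical re-implementation for illustration, not the engine's code: it assumes the marks are . , : ; ? ! and exempts only runs of dots (such as an ellipsis) from the consecutive-marks rule.

```java
class PunctuationCheck {
    // Assumption: the set of marks the rule applies to.
    private static final String MARKS = ".,:;?!";

    private static boolean isMark(char c) {
        return MARKS.indexOf(c) >= 0;
    }

    /** Returns the index where a dubious sequence starts, or -1 if none is found. */
    static int findDubious(CharSequence text) {
        for (int i = 1; i < text.length(); i++) {
            char prev = text.charAt(i - 1);
            char cur = text.charAt(i);
            if (!isMark(cur)) continue;
            if (Character.isWhitespace(prev)) {
                return i - 1; // whitespace before a mark, as in "word ,"
            }
            if (isMark(prev) && !(prev == '.' && cur == '.')) {
                return i - 1; // two marks such as ".," (a run of dots is allowed)
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(findDubious("Hello , world")); // 5
        System.out.println(findDubious("Well... fine.")); // -1
    }
}
```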
ERR_DUPLICATE: Two identical consecutive words. (Note: in some languages this can be correct, as in English "had had" or French "nous nous", but this case is not yet supported.) getWord() and getPosition() return the second word, but getSuggestions() does not return meaningful results. The usual action is to ignore the error or to delete the second word.
This error can be inhibited by setting the IgnoreDuplicates option to true.
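A hypothetical detector for the duplicate-word case, together with the "delete the second word" action, could look like this. It assumes that words are runs of letters, that only whitespace may separate the duplicates, and that comparison ignores case; the engine's own tokenization rules are not documented here.

```java
class DuplicateWords {
    /** Returns {start, length} of the second of two identical consecutive words, or null. */
    static int[] findDuplicate(String text) {
        String previous = null;
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (!Character.isLetter(c)) {
                // Assumption: punctuation breaks the sequence, so "the, the" is not flagged.
                if (!Character.isWhitespace(c)) previous = null;
                i++;
                continue;
            }
            int start = i;
            while (i < text.length() && Character.isLetter(text.charAt(i))) i++;
            String word = text.substring(start, i);
            if (word.equalsIgnoreCase(previous)) return new int[] {start, i - start};
            previous = word;
        }
        return null;
    }

    /** Deletes the second word together with the whitespace before it. */
    static String deleteSecond(String text, int[] duplicate) {
        int from = duplicate[0];
        while (from > 0 && Character.isWhitespace(text.charAt(from - 1))) from--;
        return text.substring(0, from) + text.substring(duplicate[0] + duplicate[1]);
    }

    public static void main(String[] args) {
        String text = "This is is fine";
        int[] dup = findDuplicate(text);
        System.out.println(deleteSecond(text, dup)); // This is fine
    }
}
```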
ERR_REPLACE: If the personal dictionary has been enriched with automatic replacements (using learnAutoReplacement()), which corresponds to a command such as "Replace Always", checkNext() signals that it has encountered such a replacement. The action to take here is to invoke getReplacement() with the word obtained by getWord(), replace the word in the text source, and then continue the check loop. For example:
String word = checker.getWord();
myTextSource.replace( checker.getPosition(), word.length(),
checker.getReplacement(word) );
This mechanism can be inhibited by setting the AutoReplace option to false.
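When the text source is backed by a java.lang.StringBuilder, one pitfall is that StringBuilder.replace takes an end index rather than a length. The sketch below adapts the (position, length) convention used above; the replacement map is a hypothetical stand-in for entries learned with learnAutoReplacement(), and the position is hard-coded where getPosition() would normally supply it.

```java
import java.util.Map;

class AutoReplace {
    // Replaces `length` chars at `position`, following the (position, length)
    // convention of myTextSource.replace(...) in the example above.
    static void replace(StringBuilder source, int position, int length, String replacement) {
        // StringBuilder.replace expects (start, end), not (start, length).
        source.replace(position, position + length, replacement);
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for entries learned with learnAutoReplacement().
        Map<String, String> replaceAlways = Map.of("teh", "the");
        StringBuilder text = new StringBuilder("fix teh bug");
        String word = "teh"; // what getWord() would return
        int position = 4;    // what getPosition() would return
        replace(text, position, word.length(), replaceAlways.get(word));
        System.out.println(text); // fix the bug
    }
}
```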
Notes about setInput() and the checking loop:
The character sequence passed to setInput() is assumed to remain unmodified by the application until checkNext() reaches its end. Depending on how the text source is implemented, this is often unrealistic once a replacement has been performed. Therefore:
The simplest approach is to always call setInput() before checkNext(), with a fragment reflecting the updated state of the text source.
Alternatively, the input text can be left untouched by the modifications, but then the application has to translate the positions returned by getPosition(), since they are relative to the original text fragment.
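For the second strategy, the application needs to map positions in the original fragment to positions in the modified source. A minimal sketch, using a hypothetical `OffsetTracker`: each replacement is recorded in original coordinates, and a position is shifted by the length change of every replacement that ends at or before it. It assumes replaced spans do not overlap.

```java
import java.util.ArrayList;
import java.util.List;

class OffsetTracker {
    // Each entry is {originalEnd, delta}, both in original-fragment coordinates.
    private final List<int[]> edits = new ArrayList<>();

    /** Records that `oldLength` chars at original position `pos` became `newLength` chars. */
    void recordReplacement(int pos, int oldLength, int newLength) {
        edits.add(new int[] {pos + oldLength, newLength - oldLength});
    }

    /** Translates a position in the original fragment into the modified source. */
    int translate(int originalPos) {
        int shift = 0;
        for (int[] e : edits) {
            if (e[0] <= originalPos) shift += e[1];
        }
        return originalPos + shift;
    }

    public static void main(String[] args) {
        OffsetTracker t = new OffsetTracker();
        StringBuilder source = new StringBuilder("u r here");
        // Positions are relative to the ORIGINAL fragment, as getPosition() reports them.
        int p = t.translate(0);
        source.replace(p, p + 1, "you");
        t.recordReplacement(0, 1, 3);
        p = t.translate(2);
        source.replace(p, p + 1, "are");
        t.recordReplacement(2, 1, 3);
        System.out.println(source);         // you are here
        System.out.println(t.translate(4)); // 8
    }
}
```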
A skeleton of the search loop:
Note: The text source (here mySource) typically implements the TextSource interface defined in package com.xmlmind.spellcheck.ui.
void doSearch() {
for(;;)
{
... // prepare
// acquire next fragment from application:
CharSequence input = mySource.getText(checker.getCharChecker());
if (input == null) {
... // no more input
return;
}
checker.setInput(input);
int err = checker.checkNext();
if (err == SpellChecker.ERR_NONE) {
// end reached: update position in source
...
continue;
}
String failingWord = checker.getWord();
int replacePos = checker.getPosition();
int replaceSize = failingWord.length();
// application dependent:
mySource.highlight(replacePos, replaceSize);
switch(err) {
case SpellChecker.ERR_DUPLICATE:
showStatus("duplicate word: " + failingWord);
break;
case SpellChecker.ERR_REPLACE:
mySource.replace(replacePos, replaceSize,
checker.getReplacement(failingWord));
continue;
case SpellChecker.ERR_WRONG_CAP:
showSuggestions("word should be capitalized");
break;
case SpellChecker.ERR_PUNCTUATION:
showSuggestions("punctuation problem");
break;
case SpellChecker.ERR_UNKNOWN_WORD:
showSuggestions("unrecognized word");
break;
}
// get and process user commands:
break;
}
}
Displaying suggestions:
The Suggestions interface returned by SpellChecker.getSuggestions() provides methods to retrieve suggestions individually, with getSuggestion(int index), or as an array, with String[] toArray().
Suggestions are ordered by decreasing relevance, and their number is given by Suggestions.getCount().
The maximum number of returned suggestions can be set with setSuggestionLimit().
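The ordering and limit contract can be illustrated with a hypothetical value class. `RankedSuggestions` mimics getCount(), getSuggestion(int) and toArray(), but it is not the library's Suggestions implementation, and the numeric relevance scores are an assumption made only for this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class RankedSuggestions {
    private final List<String> ordered = new ArrayList<>();

    // Sorts candidates by decreasing (hypothetical) score, keeping at most `limit`.
    RankedSuggestions(Map<String, Integer> scored, int limit) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(scored.entrySet());
        entries.sort((a, b) -> Integer.compare(b.getValue(), a.getValue()));
        for (Map.Entry<String, Integer> e : entries) {
            if (ordered.size() >= limit) break;
            ordered.add(e.getKey());
        }
    }

    int getCount() { return ordered.size(); }
    String getSuggestion(int index) { return ordered.get(index); }
    String[] toArray() { return ordered.toArray(new String[0]); }

    public static void main(String[] args) {
        RankedSuggestions s = new RankedSuggestions(
                Map.of("their", 90, "there", 80, "then", 60, "three", 40), 3);
        System.out.println(String.join(", ", s.toArray())); // their, there, then
    }
}
```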
Example:
String[] suggestions = checker.getSuggestions().toArray();
JList displayList = new JList(suggestions);
if (suggestions.length > 0)
displayList.setSelectedIndex(0);
Smarter suggestions:
There is a mechanism to teach the SpellChecker to make better suggestions. When the word a user enters to correct an erroneous word is not among the first suggestions found, it is possible to invoke learnSuggestion() with the wrong word and its correction as arguments: the next time this word is encountered, the learned suggestion will be placed at the top of the suggestion list.
void doReplace() {
String correction = ...; // get correction from user
// if not among the first 3 suggestions, learn it:
if (!suggestions.contain( correction, 3 ))
checker.learnSuggestion( failingWord, correction,
SpellChecker.TEMPORARY_DICT );
...
}
In this example, the learned suggestion is put into the temporary dictionary and is therefore lost at the end of the session. It can also be put into the persistent personal dictionary (SpellChecker.PERSONAL_DICT).
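The effect of learnSuggestion() on the list can be modeled as follows. `LearnedSuggestions` is hypothetical: it keeps learned corrections in a single map (ignoring the temporary/personal distinction) and moves the learned correction to the top, while `containsWithin` plays the role of the first-N check done in doReplace() above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class LearnedSuggestions {
    // Hypothetical store of wrong word -> learned correction.
    private final Map<String, String> learned = new HashMap<>();

    void learnSuggestion(String wrong, String correction) {
        learned.put(wrong, correction);
    }

    /** Puts the learned correction (if any) first, without duplicating it. */
    List<String> rank(String wrong, List<String> computed) {
        String top = learned.get(wrong);
        if (top == null) return computed;
        List<String> result = new ArrayList<>();
        result.add(top);
        for (String s : computed) {
            if (!s.equals(top)) result.add(s);
        }
        return result;
    }

    /** True if `word` is among the first `n` suggestions. */
    static boolean containsWithin(List<String> suggestions, String word, int n) {
        return suggestions.subList(0, Math.min(n, suggestions.size())).contains(word);
    }

    public static void main(String[] args) {
        LearnedSuggestions ls = new LearnedSuggestions();
        List<String> computed = List.of("ten", "tea", "the");
        ls.learnSuggestion("teh", "the");
        System.out.println(ls.rank("teh", computed)); // [the, ten, tea]
    }
}
```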
Options are manipulated through getter/setter pairs (following the JavaBeans conventions).
For example, the IgnoreCase option is handled with boolean getIgnoreCase() and void setIgnoreCase(boolean).
SpellChecker also has two methods, loadOptions and saveOptions, to set or retrieve all options at once from or into a java.util.Properties object.
Table 1. Options
| Option | Description | Type | Default value |
|---|---|---|---|
| IgnoreCase | If set, ignore capitalization errors. | boolean | false |
| IgnoreMixedCase | If set, do not check words containing case mixing (e.g. "SpellChecker") | boolean | false |
| IgnoreDigits | If set, do not check words containing digits (e.g. "b2b") | boolean | true |
| IgnoreURL | If set, ignore words that look like URLs or file names (e.g. "www.xxx.com" or "c:\boot.ini") | boolean | true |
| IgnoreDuplicates | If set, do not signal two successive identical words as an error. | boolean | false |
| CheckPunctuation | If set, punctuation checking is enabled: misplaced white space and wrong sequences, like a dot following a comma, are detected. | boolean | false |
| AllowCompound | If set, all words formed by concatenating two legal words with a hyphen are accepted. If the language allows it, two words concatenated without a hyphen are also accepted. | boolean | true |
| AllowPrefixes | If set, a word formed by concatenating a registered prefix and a legal word is accepted. For example, if "mini-" is a registered prefix, "mini-computer" is accepted. | boolean | true |
| AllowFileExt | If set, accepts any word ending with registered file extensions (e.g. "myfile.txt", "index.html" etc.) | boolean | true |
| AutoReplace | Enables the "Replace Always" feature. If set, the checkNext method of SpellChecker can return ERR_REPLACE, then getReplacement() can be used to retrieve the replacement value. | boolean | true |
| SuggestionForce | Intensity of suggestion search: ranges from 0 to FORCE_MAX. | int | FORCE_DEFAULT |
| SuggestionLimit | Maximum number of suggestions returned (does not influence the duration of a suggestion search). | int | 15 |
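The bean-style accessors and the Properties round trip described above can be sketched with a hypothetical `CheckerOptions` class covering three options from Table 1. The property keys (the option names) and the string encoding of values are assumptions; the keys actually used by loadOptions and saveOptions are not documented in this section.

```java
import java.util.Properties;

class CheckerOptions {
    // Defaults taken from Table 1.
    private boolean ignoreCase = false;
    private boolean checkPunctuation = false;
    private int suggestionLimit = 15;

    public boolean getIgnoreCase() { return ignoreCase; }
    public void setIgnoreCase(boolean v) { ignoreCase = v; }
    public boolean getCheckPunctuation() { return checkPunctuation; }
    public void setCheckPunctuation(boolean v) { checkPunctuation = v; }
    public int getSuggestionLimit() { return suggestionLimit; }
    public void setSuggestionLimit(int v) { suggestionLimit = v; }

    // Hypothetical counterpart of saveOptions (assumed keys: the option names).
    void saveOptions(Properties p) {
        p.setProperty("IgnoreCase", Boolean.toString(ignoreCase));
        p.setProperty("CheckPunctuation", Boolean.toString(checkPunctuation));
        p.setProperty("SuggestionLimit", Integer.toString(suggestionLimit));
    }

    // Hypothetical counterpart of loadOptions: missing keys keep their current values.
    void loadOptions(Properties p) {
        ignoreCase = Boolean.parseBoolean(p.getProperty("IgnoreCase", Boolean.toString(ignoreCase)));
        checkPunctuation = Boolean.parseBoolean(
                p.getProperty("CheckPunctuation", Boolean.toString(checkPunctuation)));
        suggestionLimit = Integer.parseInt(
                p.getProperty("SuggestionLimit", Integer.toString(suggestionLimit)));
    }

    public static void main(String[] args) {
        CheckerOptions options = new CheckerOptions();
        options.setIgnoreCase(true);
        Properties p = new Properties();
        options.saveOptions(p);
        System.out.println(p.getProperty("IgnoreCase")); // true
    }
}
```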
There are numerous methods for managing dictionaries.
To know more about dictionary structure, read the Dictionary Builder documentation.
The most commonly used methods are the following:
setPersonalDictionaryPath: defines the file location pattern for personal dictionaries.
listLanguages: returns a list of items describing the detected languages and dictionaries.
setSelectedLanguage: selects a language and loads its default dictionary if necessary.
selectDictionary: loads a dictionary (if necessary) and implicitly selects the dictionary's language. This method works like setSelectedLanguage, except that other dictionaries already loaded for the same language are removed.
getSelectedLanguage, getSelectedLanguageInfo: information about the currently selected language.
savePersonalDictionaries: forces a save of all personal dictionaries (for example on exit).
getDictionaryManager, setDictionaryManager: for more advanced control.
setDictionaryPath: defines a non-standard directory where dictionary archives (.dar) can be found.
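The difference between setSelectedLanguage and selectDictionary can be modeled with a hypothetical `LanguageSelection` class. Dictionaries are represented by plain names, and selectDictionary takes the language explicitly, whereas the real method derives it from the dictionary; treat this as a sketch of the documented behavior, not of the implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class LanguageSelection {
    private final Map<String, List<String>> loaded = new HashMap<>(); // language -> dictionaries
    private final Map<String, String> defaults;                      // language -> default dictionary
    private String selected;

    LanguageSelection(Map<String, String> defaults) { this.defaults = defaults; }

    // Selects a language, loading its default dictionary only if none is loaded yet.
    void setSelectedLanguage(String language) {
        List<String> dicts = loaded.computeIfAbsent(language, k -> new ArrayList<>());
        if (dicts.isEmpty()) {
            String def = defaults.get(language);
            if (def == null) throw new IllegalArgumentException("no dictionary for " + language);
            dicts.add(def);
        }
        selected = language;
    }

    // Loads `dictionary`, drops other dictionaries of the same language, selects the language.
    void selectDictionary(String language, String dictionary) {
        List<String> dicts = new ArrayList<>();
        dicts.add(dictionary);
        loaded.put(language, dicts);
        selected = language;
    }

    String getSelectedLanguage() { return selected; }

    List<String> dictionariesOf(String language) {
        return loaded.getOrDefault(language, List.of());
    }

    public static void main(String[] args) {
        LanguageSelection s = new LanguageSelection(Map.of("en", "en-default"));
        s.setSelectedLanguage("en");
        s.selectDictionary("en", "en-medical");
        System.out.println(s.dictionariesOf("en")); // [en-medical]
    }
}
```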
Other methods:
clearLanguageDictionaries: resets a language.
listEditableDictionaries: returns a list of editable dictionaries for the current language.
manageEditableDictionary: to select, add, load, or remove an editable dictionary.
getEditableWords: returns an array of word descriptors from the current editable dictionary.
changeWord: to edit the contents of editable dictionaries.