5. Building from ispell dictionaries

In the realm of public-domain spell-checking, ispell and its affix files are a popular way of defining word lists. See http://fmg-www.cs.ucla.edu/geoff/ispell.html for an introduction to ispell.

If you want to build a dictionary for a language that is not in the distribution, a word list in ispell format is probably the easiest starting point, if one exists. Check the following URL: http://fmg-www.cs.ucla.edu/geoff/ispell-dictionaries.html

Note that most of these dictionaries are covered by the GNU General Public License (GPL). You therefore have to check whether that license suits your needs: it is always suitable for personal use, but redistributing compiled dictionaries could raise a legal issue with some authors.

You will of course need a working ispell installed on your machine (most probably a Unix box). An 8-bit clean version, built with the compile-time option MASKBITS set to 64, is strongly recommended. This sounds esoteric, but it should be the case with most recent Linux distributions.
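One way to check how an installed ispell was built is to look at its compile-time options. The sketch below assumes that "ispell -vv" lists them one per line in the form NAME = value, which may not hold for every version. The function name maskbits_from_options is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: read a list of compile-time options on stdin and
# print the numeric value of MASKBITS if a line "MASKBITS = <n>" is present.
# The "NAME = value" layout is an assumption about the ispell -vv output.
maskbits_from_options() {
    sed -n 's/^[[:space:]]*MASKBITS[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*$/\1/p'
}

# Intended use, assuming ispell is on the PATH:
#   ispell -vv | maskbits_from_options
```

If nothing is printed, the option list probably has a different layout and should be read by hand.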

The basic job is to expand the affixed word list into a plain word list. To do this, first compile the word list into an ispell "hash file":

buildhash mydict.txt mydict.aff mydict.hash

Here mydict.txt is the affixed word list, mydict.aff is the affix file, and mydict.hash is the hash file that buildhash writes.

The word list is then expanded this way:

ispell -e -d ./mydict.hash < mydict.txt > mydict.wl

The expanded word list (here mydict.wl) can now be used with the builder.
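Depending on the ispell version, the -e option may print a root word and all of its expansions on the same line, separated by spaces, and the same word may appear more than once. If the builder expects one word per line (an assumption, check what your builder accepts), a small clean-up pass along these lines may help. The function name normalize_wordlist and the output file name mydict.words are made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: turn spaces and tabs into newlines, drop empty
# lines, and remove duplicate words. LC_ALL=C makes sort compare raw
# bytes, which avoids locale surprises with 8-bit word lists.
normalize_wordlist() {
    tr -s ' \t' '\n\n' | sed '/^$/d' | LC_ALL=C sort -u
}

# Intended use, with the file names from the example above:
#   normalize_wordlist < mydict.wl > mydict.words
```

The result is sorted in byte order, not in the alphabetical order of the language, which is harmless if the builder does not depend on input order.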