Introduction:
These pages discuss the algorithms used by my Tangleword/Boggle solver. Two different methods of generating the solution for a given board are discussed here.
The first, a trie-based method, is by far the faster of the two, is simpler, and is well known. A trie is basically a word-tree, a data structure which makes solving extremely quick. However, the trie itself is time-consuming to build and uses a great deal of storage. Therefore, for the trie-based method to be feasible for a simple CGI program (which is executed starting with no information in memory), the trie itself must be pre-generated by a different program and stored (in some complex format) as a data file that the Tangleword solver uses. In the course of solving, the solver will jump randomly all over the data file.
The second, a heap-based method, is a bit slower but finds the words on the board in order with minimal memory use and no prior computation required. Instead of a complex data file specifying a trie, which is accessed randomly, all that's needed is an alphabetical list of all valid words. The following sections go into more detail.
Trie-based algorithm:
- Build a trie out of all words in the dictionary (very time consuming), or read a pre-built trie from a disk file into memory (still very slow), or work directly out of a disk file which requires jumping randomly all over the file (requires no time for this step but significantly slows solving).
- Traverse the trie and the board at the same time, saving all words found (optimally efficient, and very fast). Duplicate words will be found, and these must be eliminated.
- Sort the words found (which will be found in a useless order) alphabetically and/or by length, and finally print them out.
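The three steps above can be sketched roughly as follows. This is only a minimal in-memory illustration, not the solver's actual code: the names `build_trie` and `solve_with_trie`, the nested-dict trie with a `"$"` end-of-word marker, the board given as a list of equal-length strings, and the default minimum length of 3 are all assumptions made for the example. Special tiles such as "Qu" are not handled.

```python
def build_trie(words):
    """Step 1: build a nested-dict trie; the key "$" marks the end of a word."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node["$"] = True
    return root


def solve_with_trie(board, trie, min_len=3):
    """Steps 2 and 3: walk the board and the trie together, then sort."""
    rows, cols = len(board), len(board[0])
    found = set()  # a set silently drops the duplicates the search produces

    def visit(r, c, node, prefix, used):
        child = node.get(board[r][c])
        if child is None:
            return  # no dictionary word continues with this letter
        prefix += board[r][c]
        if "$" in child and len(prefix) >= min_len:
            found.add(prefix)
        used.add((r, c))
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                # The current cell is in `used`, so dr == dc == 0 is skipped.
                if 0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in used:
                    visit(nr, nc, child, prefix, used)
        used.remove((r, c))

    for r in range(rows):
        for c in range(cols):
            visit(r, c, trie, "", set())

    # Words are found in board-traversal order, so sort by length, then A-Z.
    return sorted(found, key=lambda w: (len(w), w))
```

For example, with the board `["cat", "xsx", "xxx"]` and the word list `["act", "at", "cat", "cats", "dog", "sat"]`, the function returns `["cat", "sat", "cats"]`: "act" is rejected because the C and the T are not adjacent, and "at" is shorter than the minimum length.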
Heap-based algorithm:
- Start solving immediately, traversing the entire alphabetic word-list file sequentially (moderately fast). Print out words as they are found, as no duplicates are generated and the words are found in alphabetic order.
- Sort the words by length if desired.
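To make the shape of this approach concrete, here is a deliberately naive stand-in: it makes one sequential pass over the alphabetical word list and tests each word against the board independently. It is not the heap-based method itself (which is explained on its own page) and it is certainly slower, but it shows the same input and output properties: no precomputed data structure, words produced in alphabetical order, and no duplicates. The names `on_board` and `solve_by_scanning`, the board format, and the default minimum length are assumptions made for the example.

```python
def on_board(board, word):
    """Return True if `word` can be traced through adjacent, unreused cells."""
    rows, cols = len(board), len(board[0])

    def extend(r, c, i, used):
        if board[r][c] != word[i]:
            return False
        if i == len(word) - 1:
            return True
        used.add((r, c))
        ok = any(
            extend(nr, nc, i + 1, used)
            for nr in range(max(0, r - 1), min(rows, r + 2))
            for nc in range(max(0, c - 1), min(cols, c + 2))
            if (nr, nc) not in used
        )
        used.remove((r, c))
        return ok

    return any(extend(r, c, 0, set()) for r in range(rows) for c in range(cols))


def solve_by_scanning(board, sorted_words, min_len=3):
    """Yield board words in list order; `sorted_words` may be an open file."""
    for line in sorted_words:
        word = line.strip()
        if len(word) >= min_len and on_board(board, word):
            yield word  # can be printed at once: already in A-Z order
```

Because `solve_by_scanning` is a generator, each word can be printed as soon as it is found, and passing an open file object reads the word list line by line rather than loading it all into memory.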
CGI considerations:
CGI (Common Gateway Interface) is how the Web server connects to a text-based program: the input to the program comes, for example, from the data sent when you press a "Submit" button on a web page, and the output of the program is directed back to your browser. One of its limitations is that each time the Web server invokes your program/script through CGI, the program starts executing with no information in memory besides the parameters with which it is to be run: in this case, the letters on the board, the minimum word length, and the board size.
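As a rough illustration of what "starting with nothing but the parameters" means, the sketch below reads a GET request the way a CGI program would, from the `QUERY_STRING` environment variable. The parameter names `letters`, `size`, and `minlen`, their defaults, and the function name `read_request` are hypothetical; the real solver's form fields are not given here.

```python
import os
from urllib.parse import parse_qs


def read_request(environ=os.environ):
    """Parse the board and minimum word length from a CGI GET request."""
    # For GET requests the web server places the form data in QUERY_STRING.
    query = parse_qs(environ.get("QUERY_STRING", ""))
    letters = query.get("letters", [""])[0].lower()  # hypothetical field name
    size = int(query.get("size", ["4"])[0])          # hypothetical field name
    min_len = int(query.get("minlen", ["3"])[0])     # hypothetical field name
    # Cut the flat letter string into `size` rows of `size` letters each.
    board = [letters[i * size:(i + 1) * size] for i in range(size)]
    return board, min_len
```

Everything else the program needs, such as the word list or a trie, has to be loaded from disk on every single request, which is why start-up cost matters so much here.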
On the other hand, with a non-CGI program we would be able to take the time to build or copy the trie into memory, and then solve many boards very fast within the same instance of the program. We could create a "solver-daemon" which runs in the background and maintains the trie in memory, and a CGI program to act as an interface to it. But as most ISPs aren't very happy about user-created daemons running in the background and holding onto valuable system resources, this isn't the ideal solution for most cases.
I came up with the heap-based method described in detail here as a best-of-both-worlds method. Almost no memory is used, a simple data file (just an alphabetical list of the words) is traversed linearly, the results are generated in alphabetical order, and solving is just about as fast as, and in some circumstances may be faster than, using a trie.
Contents:
Heap-based Boggle/Tangleword solver algorithm: detailed explanation of the algorithm; this is the interesting part.
See also:
Boggle/Tangleword Solver (the program itself)