08.02.07
How many Boggle boards are there?
I’ve taken a several months break from my Boggle series, mostly because I think everyone got tired of it. I’m going to come back to it, but hopefully at a slower pace this time around.
Last time, we completed a board solver that could score 10,000 boards per second. So what to do with all that power? One option was to compute statistics on random boards. Another is to hunt for the holy grail of Boggle: the highest possible scoring Boggle board. The next few posts will be devoted to the search for this board.
Before undertaking any search, you need to get a feel for your search space. In our case, that’s the set of all 4×4 Boggle boards. How many are there? We can do a few back-of-the-envelope calculations.
To create a board, you roll 16 dice. Each has six possible letters on it, which gives 6^16 possibilities. These dice land in some permutation on the board, which gives another factor of 16!. Finally, a 4×4 board has eight-fold symmetry, which takes us down to 6^16 * 16! / 8 = 7.3e24 ≈ 2 ^ 83.
That’s one upper bound. But it assumed that all 6*16 = 96 symbols on the dice were unique. Obviously they’re not. After a roll, each of the 16 squares will have one of 26 letters on it. Divide by the symmetries, and you get 26 ^ 16 / 8 = 5e21 ≈ 2^72. Much better!
I haven’t been able to come up with any better upper bounds than these. The main flaw in the second approximation is that not all boards can be rolled with the sixteen dice that Hasbro provides. A board of all z’s or qu’s simply can’t occur. If we knew the probability that any sixteen character sequence could be rolled, this would give a true approximation of the number of distinct boards.
The easiest way to do this is with a computer program. It picks a random sequence of sixteen characters, checks whether this board can be rolled, and repeats several thousand times. I believe that checking whether a given board can be rolled is NP-Complete, but in this case the greedy approximation works quite well. I wrote a program (leave a comment if you want to see it) to do this, and processed one million 16-character sequences. Only 84,492 could be represented with the usual dice, or 8.4%. This gives a total of
(26 ^ 16 / 8) * (84,492 / 1,000,000) = 4.6e20 ≈ 2^69.
If you like confidence intervals, my sample size of one million boards gives a 95% confidence interval of [4.545e24, 4.666e24] for the total number of boards. Pretty good.
So, assuming we could enumerate all these boards quickly, how long would it take for our faster solver to find the best board? At 10k boards/sec, we’re looking at 4.5e16 seconds = 1.9 billion years! Clearly we need to do better.
Nathan Buch said,
February 8, 2009 at 1:10 pm
I had a couple thoughts about finding the upper limit of boggle boards. Please let me know if there’s something wrong with this line of thinking!
You could divide the alphabet into 20 consonants and 6 vowels, and for a 4×4 board the total number of possibilities would be the sum of every combination of consonants and vowels, namely:
Total Boards = (20^0 * 6^16) + (20^1 * 6^15) + … + (20^16 + 6^0)
Of course, in English every word must have at least one vowel, so the last term drops out since it represents the all-consonant boards. Therefore, the total is:
Total Boards = 2.8 x 10^20 ~ e^47
At 10,000 boards/second, that would still take almost 900 million years to evaluate. However, at this point one could employ a couple heuristics. For example, by examining the frequency distribution of letters in the English language, it is reasonable to assume that the letters J, X, Q, and Z will not be part of the best Boggle board. If those letters are excluded, then the number of consonants falls to 16, and the new number of boards decreases by a factor of 25, to 1.1 x 10^19 boards (35 million years).
From here, one could go a step further, and remove B, V, and K from the running, to decrease the total to 5.7 x 10^17 boards (1.8 million years). Or, one could examine the letter frequencies again and surmise that the best Boggle board will in all likelihood contain an E. If one E is guaranteed, the total boards changes to:
Total Boards = (20^0 * 6^15) + (20^1 * 6^14) + … + (20^14 + 6^1)
Or, 5.1 x 10^19 (162 million years)
With one E guaranteed and J, X, Q, Z removed, you would have 6.9 x 10^17 boards (2.2 million years), and with one E guaranteed and J, X, Q, Z, B, V and K removed, you would have 4.4 x 10^16 boards (139,000 years).
Of course, rotating a Boggle board 90 degrees will produce a different board with the same score, so by symmetry you could divide all of these totals by 4 (I saw in your posts that you found eight-fold symmetry – how is that?).
Removing or guaranteeing certain letters is pretty unscientific, but taking about a Z or putting in an E is also pretty reasonable. I guess it depends on how thorough one wants to be. Also, even with the most ‘flexibility’ employed (one E, no J, X, Q, Z, B, V or K, and four-fold symmetry), the solution would still take about 35,000 years to solve. But I imagine that’s a bit more palatable than 2 billion years.
– Nathan
danvk.org » Solving Boggle by Taking Option Three said,
August 4, 2009 at 6:19 pm
[...] board, we’d need to show that every other board scores fewer points. In a previous post, I estimated that there were 2^69 possible boggle boards. At 10,000 boards/second, this would take 1.9 billion [...]
John said,
September 1, 2010 at 3:04 pm
Nathan’s comment has a couple errors…
When you have:
Total Boards = (20^0 * 6^16) + (20^1 * 6^15) + … + (20^16 + 6^0)
It should be:
Total Boards = (20 choose 0)(20^0 * 6^16) + (20 choose 1)(20^1 * 6^15) + (20 choose 2)(20^2 * 6^14) … + (20 choose 20)(20^16 + 6^0)
since you need to CHOOSE which of the “n” spots you put your “n” consonants. The total is actually a lot higher.
Also, the eight-fold symmetry comes from reflection, since reflected boards have the same score.