02.11.07
Some Boggle Statistics
With a fast Boggle solver in hand, it’s time for some fun statistics. These are all based on Boggle boards rolled with real Boggle dice. I’m not posting the solver’s code this time, but if you’re interested in seeing it, feel free to holla.
Most common words:
3-letter word | Freq (%) | 4-letter word | Freq (%) | 5-letter word | Freq (%)
---|---|---|---|---|---
toe | 19.258 | teen | 6.718 | eaten | 2.034
tee | 19.074 | tees | 6.564 | enate | 2
ten | 17.944 | tent | 6.02 | sente | 1.954
net | 17.944 | note | 5.976 | setae | 1.944
tea | 17.65 | tone | 5.838 | tense | 1.86
set | 17.51 | teat | 5.804 | tease | 1.856
eta | 17.176 | toes | 5.664 | teeth | 1.788
ate | 17.176 | toea | 5.548 | eater | 1.788
tae | 16.518 | nets | 5.432 | teens | 1.712
eat | 16.518 | test | 5.344 | seton | 1.702
tie | 16.432 | rete | 5.208 | notes | 1.702
het | 15.684 | nett | 5.204 | tents | 1.646
ret | 15.108 | nest | 5.174 | retie | 1.632
eth | 14.938 | tens | 5.172 | steno | 1.624
oes | 14.698 | sent | 5.156 | sheet | 1.618
the | 14.542 | neat | 5.146 | ester | 1.618
eon | 14.474 | etna | 5.144 | oaten | 1.61
one | 14.366 | ante | 5.144 | teats | 1.608
ose | 13.82 | thee | 5.064 | tones | 1.606
see | 13.78 | tote | 5.052 | enter | 1.596
I looked these words up and they all check out. See the Scrabble dictionary if you’re not convinced.
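For anyone who wants to build a table like this from their own solver output, here is a minimal sketch of the tally. It assumes each board's results are available as a list of words; the function name `word_frequencies` and the toy boards are made up for illustration, and this is not the code behind the table above.

```python
from collections import Counter


def word_frequencies(boards_words):
    """Return {word: percent of boards that contain it}.

    boards_words is a hypothetical input: one collection of found
    words per board, as produced by whatever solver you use.
    """
    counts = Counter()
    n_boards = 0
    for words in boards_words:
        n_boards += 1
        counts.update(set(words))  # count each word at most once per board
    if n_boards == 0:
        return {}
    return {word: 100.0 * c / n_boards for word, c in counts.items()}


# Toy input, not real Boggle data.
sample = [["toe", "tee", "teen"], ["toe", "net", "ten"], ["tee", "toe"], ["eat", "ate"]]
freq = word_frequencies(sample)
for word, pct in sorted(freq.items(), key=lambda kv: (-kv[1], kv[0]))[:3]:
    print(f"{word}: {pct:.1f}%")
```

Counting each word once per board is what makes the figure a "percent of boards" rather than a raw occurrence count.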
How many words can we expect to find on each board?
That looks like a log-normal distribution. The mean is 98.53 words per board.
How many points?
That’s also a log-normal distribution with the characteristically long tail. The mean is 140.97 points per board.
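The point totals presumably follow the standard Boggle scoring table: 1 point for three- and four-letter words, 2 for five, 3 for six, 5 for seven, and 11 for eight or more. That is an assumption about how the boards were scored, as is treating every letter as one character (the "Qu" cube is ignored here). A minimal sketch, with illustrative function names:

```python
def word_score(word):
    """Standard Boggle points for one word (assumed scoring rules)."""
    n = len(word)
    if n < 3:
        return 0  # too short to count
    if n <= 4:
        return 1
    if n == 5:
        return 2
    if n == 6:
        return 3
    if n == 7:
        return 5
    return 11  # eight letters or more


def board_score(words):
    """Total points for a board, counting each distinct word once."""
    return sum(word_score(w) for w in set(words))


print(board_score(["toe", "teen", "eaten", "thrashers"]))
```

The two means work out to about 1.43 points per word (140.97 / 98.53), which fits a board where most words are three or four letters long.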
How many words of each length can we expect to find on a board? Here’s a histogram of the number of words of each length on a board:
Those also look like log-normals, with four-letter words being the most common.
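One rough way to sanity-check a log-normal guess: if the counts are log-normal, their logarithms should be roughly normal, so the skewness of the logs should sit near zero while the raw counts stay strongly right-skewed. The sketch below runs that check on synthetic data; the parameters `4.4` and `0.6` are arbitrary and are not fitted to the Boggle samples.

```python
import math
import random
import statistics


def skewness(values):
    """Population skewness: mean of the cubed standardized values."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return sum(((v - mu) / sigma) ** 3 for v in values) / len(values)


# Synthetic stand-in for per-board word counts (made-up parameters).
rng = random.Random(0)
synthetic = [rng.lognormvariate(4.4, 0.6) for _ in range(10_000)]

raw_skew = skewness(synthetic)
# Boards with zero words would have to be dropped before taking logs.
log_skew = skewness([math.log(v) for v in synthetic if v > 0])

print(f"raw skewness: {raw_skew:.2f}")
print(f"log skewness: {log_skew:.2f}")
```

This is only a heuristic. A skewness near zero is consistent with a log-normal but does not prove it, and a proper fit would compare the whole distribution.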
Put another way, what’s the likelihood of finding a word of a given length on a board?
Length | Likelihood
---|---
3 | 99.97994% |
4 | 99.901% |
5 | 98.62% |
6 | 87.56% |
7 | 56.21% |
8 | 21.36% |
9 | 3.94% |
10 | 0.442% |
11 | 0.0362% |
12 | 0.00228% |
13 | 0.0001% |
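Read this way, each likelihood is the share of boards holding at least one word of that length. Here is a minimal sketch of that tally, plus a conversion from a percentage back to a board count. It assumes the table comes from the 5,000,000-board sample (a 50,000-board sample could not resolve 0.0001%); the function names are illustrative.

```python
from collections import Counter


def length_likelihoods(boards_words):
    """Percent of boards with at least one word of each length."""
    hits = Counter()
    n_boards = 0
    for words in boards_words:
        n_boards += 1
        hits.update({len(w) for w in words})  # one hit per length per board
    if n_boards == 0:
        return {}
    return {length: 100.0 * c / n_boards for length, c in sorted(hits.items())}


def boards_in_sample(percent, sample_size=5_000_000):
    """Number of boards a percentage corresponds to (assumed sample size)."""
    return round(percent / 100 * sample_size)


# Toy input, not real Boggle data.
print(length_likelihoods([["toe", "teen"], ["tee"], ["eaten", "toe"]]))
print(boards_in_sample(0.0001), boards_in_sample(0.00228))
```

On that assumption, 0.0001% is about 5 boards out of 5,000,000 with a thirteen-letter word, and 0.00228% is about 114 boards with a twelve-letter word.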
For context, the longest word I’ve ever found in a game was “thrashers” at nine letters.
The most common words were based on a 50,000-board sample. The graphs are based on a 5,000,000-board sample. Feel free to contact me if you’d like the source or the Excel spreadsheet.
Exile from GROGGS said,
October 27, 2010 at 1:20 pm
Interesting. I’m approaching this from the other direction – attempting to calculate some answers, following the observation (playing Scramble) that words seem to crop up on successive boards with surprising regularity, but the first post I’m writing is using these results as they stand. Thanks!