02.11.07

Some Boggle Statistics

Posted in boggle at 10:46 pm by danvk

With a fast boggle solver in hand, it’s time for some fun statistics. These are all based on boggle boards rolled with real boggle dice. I’m going sans-code this time, but if you’re interested in seeing it, feel free to holla.
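That said, the sampling step is simple enough to sketch. The snippet below is not the code behind these numbers, just a minimal illustration of what "rolling" a board means: each of the 16 dice lands in a random cell with a random face up. The function name `roll_board` and the `PLACEHOLDER_DICE` list are made up for this example, and the placeholder letters are deliberately not the real dice.

```python
import random


def roll_board(dice, rng=random):
    """Roll a 4x4 board from 16 dice, each given as a string of its six faces.

    The dice are shuffled into cells first, then one face is picked per die.
    """
    if len(dice) != 16:
        raise ValueError("a 4x4 board needs exactly 16 dice")
    order = list(dice)          # copy so the caller's list is not reordered
    rng.shuffle(order)          # which die lands in which cell
    faces = [rng.choice(die) for die in order]  # which face is up
    return [faces[row * 4:(row + 1) * 4] for row in range(4)]


# Placeholder dice for illustration only, NOT the real boggle letter sets.
PLACEHOLDER_DICE = ["abcdef"] * 16

board = roll_board(PLACEHOLDER_DICE, random.Random(0))
```

Passing a seeded `random.Random` makes a sample reproducible; swapping in the letter sets printed on an actual set of dice is what makes the statistics meaningful.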

Most common words:

3 letters            4 letters            5 letters
Word  Freq (%)       Word  Freq (%)       Word   Freq (%)
toe   19.258         teen  6.718          eaten  2.034
tee   19.074         tees  6.564          enate  2
ten   17.944         tent  6.02           sente  1.954
net   17.944         note  5.976          setae  1.944
tea   17.65          tone  5.838          tense  1.86
set   17.51          teat  5.804          tease  1.856
eta   17.176         toes  5.664          teeth  1.788
ate   17.176         toea  5.548          eater  1.788
tae   16.518         nets  5.432          teens  1.712
eat   16.518         test  5.344          seton  1.702
tie   16.432         rete  5.208          notes  1.702
het   15.684         nett  5.204          tents  1.646
ret   15.108         nest  5.174          retie  1.632
eth   14.938         tens  5.172          steno  1.624
oes   14.698         sent  5.156          sheet  1.618
the   14.542         neat  5.146          ester  1.618
eon   14.474         etna  5.144          oaten  1.61
one   14.366         ante  5.144          teats  1.608
ose   13.82          thee  5.064          tones  1.606
see   13.78          tote  5.052          enter  1.596

I looked these words up and they all check out. See the Scrabble dictionary if you’re not convinced.
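For anyone wanting to reproduce a table like this, here is a rough sketch of the tally. It assumes "Freq (%)" means the percentage of sampled boards on which the word can be found at least once, and that some solver has already produced the words for each board. The name `word_frequencies` is hypothetical.

```python
from collections import Counter


def word_frequencies(boards_words):
    """Return {word: percent of boards on which the word appears}.

    boards_words is an iterable with one collection of words per board.
    """
    counts = Counter()
    n_boards = 0
    for words in boards_words:
        counts.update(set(words))  # count each word at most once per board
        n_boards += 1
    if n_boards == 0:
        return {}
    return {word: 100.0 * c / n_boards for word, c in counts.items()}


# Tiny hand-made sample standing in for solver output on four boards.
sample = [{"toe", "tee"}, {"toe"}, {"ten"}, {"toe", "ten"}]
freqs = word_frequencies(sample)
```

Sorting `freqs` by value, grouped by word length, gives the three columns above.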

How many words can we expect to find on each board?

words.png

That looks like a log-normal distribution. The mean is 98.53 words per board. How many points is a board worth?

scores.png

That’s also a log-normal distribution with the characteristically long tail. The mean is 140.97 points per board.
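The post doesn't spell out the scoring, so as an assumption here is the standard boggle scale (1 point for three- and four-letter words, then 2, 3, 5, and 11 points for five, six, seven, and eight or more letters). The helper names `word_score` and `board_score` are made up for this sketch.

```python
def word_score(word):
    """Points for one word under the standard boggle scoring scale (assumed)."""
    n = len(word)
    if n < 3:
        return 0   # too short to count
    if n <= 4:
        return 1
    if n == 5:
        return 2
    if n == 6:
        return 3
    if n == 7:
        return 5
    return 11      # eight letters or more


def board_score(words):
    """Total points for a board, counting each distinct word once."""
    return sum(word_score(w) for w in set(words))
```

Under this scale a typical word is worth a bit over one point, which is in line with a mean of 140.97 points against a mean of 98.53 words.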

How many words of each length can we expect to find on a board? Here’s a histogram of the number of words of each length on a board:

lens.png

Those also look like log-normals, with four-letter words being most common.
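"Looks like a log-normal" is an eyeball judgment. One way to test it, sketched here under the assumption that you have the raw per-board counts, is to take logs: if the counts are log-normal, their logs are normally distributed with some mean mu and standard deviation sigma, and the mean of the counts should come out near exp(mu + sigma^2 / 2). The function `fit_lognormal` is a hypothetical helper, not part of the original analysis.

```python
import math
import statistics


def fit_lognormal(samples):
    """Estimate log-normal parameters from positive samples.

    Returns (mu, sigma, implied_mean), where mu and sigma describe the
    logs of the samples and implied_mean = exp(mu + sigma**2 / 2).
    Zero counts are skipped because their log is undefined.
    """
    logs = [math.log(x) for x in samples if x > 0]
    mu = statistics.mean(logs)
    sigma = statistics.pstdev(logs)
    implied_mean = math.exp(mu + sigma ** 2 / 2)
    return mu, sigma, implied_mean
```

If `implied_mean` lands far from the plain average of the samples, or a histogram of the logs is visibly lopsided, the log-normal description is only approximate.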

Put another way, what’s the likelihood of finding a word of a given length on a board?

Length  Likelihood
3       99.97994%
4       99.901%
5       98.62%
6       87.56%
7       56.21%
8       21.36%
9       3.94%
10      0.442%
11      0.0362%
12      0.00228%
13      0.0001%
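A table like this can be computed from the same per-board word lists. The sketch below assumes "likelihood" means the share of boards containing at least one word of that length; `length_likelihoods` is a hypothetical name.

```python
from collections import Counter


def length_likelihoods(boards_words):
    """Return {length: percent of boards with at least one word that long}."""
    hits = Counter()
    n_boards = 0
    for words in boards_words:
        hits.update({len(w) for w in words})  # each length once per board
        n_boards += 1
    if n_boards == 0:
        return {}
    return {length: 100.0 * c / n_boards for length, c in sorted(hits.items())}


# Hand-made stand-in for solver output; the last board has no words at all.
sample_boards = [["toe", "teen"], ["tea"], ["eaten", "ten"], []]
likelihoods = length_likelihoods(sample_boards)
```

Boards with no words still count toward the total, which is why even three-letter words fall just short of 100%.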

For context, the longest word I’ve ever found in a game was “thrashers” at nine letters.

The most common words are based on a 50,000-board sample. The graphs are based on a 5,000,000-board sample. Feel free to contact me if you’d like source or the Excel spreadsheet.
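Sample size matters most for the rare rows. As a rough illustration, a percentage can be turned back into an approximate number of boards. This assumes, which the post does not state, that the likelihood table comes from the same 5,000,000 board sample as the graphs; `boards_matching` is a made-up helper.

```python
def boards_matching(percent, sample_size):
    """Approximate number of sampled boards behind a given percentage."""
    return round(percent / 100.0 * sample_size)


# Assumption: the likelihood table shares the 5,000,000 board sample.
thirteen_letter_boards = boards_matching(0.0001, 5_000_000)
twelve_letter_boards = boards_matching(0.00228, 5_000_000)
```

Under that assumption the 13-letter figure rests on only a handful of boards, so its last digit should be taken loosely.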

1 Comment

  1. Exile from GROGGS said,

    October 27, 2010 at 1:20 pm

    Interesting. I’m approaching this from the other direction – attempting to calculate some answers, following the observation (playing Scramble) that words seem to crop up on successive boards with surprising regularity, but the first post I’m writing is using these results as they stand. Thanks!