2025.02.13

Boggle Revisited: New Ideas in 2025

Over the past few weeks I’ve revisited a 15-year-old project of mine: trying to find the globally optimal Boggle board. In the last post, I recapped the work I did in 2009 to find the globally-optimal 3x3 Boggle board.

In this post, I’ll present a few optimizations I found in 2025 that add up to something like a 10x speed boost over the 2009 approach. Between better algorithms, faster CPUs, and the widespread availability of large cloud machines, I’ve now been able to find the globally-optimal 3x4 Boggle board.

With an impressive 1,651 points, here it is:

S L P
I A E
N T R
D E S

This board is chock full of big words, including REPAINTED and STRAINED.

A real Boggle board, of course, is 4x4 or even 5x5. Sadly, those problems still remain out of reach. The final post in this series will look at ideas on how to tackle them.

The code for this post lives in the danvk/hybrid-boggle repo.

New ideas in 2025

I explored many new ideas for speeding up Boggle solving, but there were five that panned out:

  1. Play a slightly different version of Boggle where you can find the same word as many times as you like.
  2. Build the “evaluation tree” used to calculate the max/no-mark bound explicitly in memory.
  3. Implement “pivot” and “lift” operations on this tree to synchronize letter choices across subtrees.
  4. Aggressively compress and de-duplicate the evaluation tree.
  5. Use three letter classes instead of four.

Multi-Boggle

Looking at the best 3x3 board:

P E R
L A T
D E S

There are two different ways to find the word LATE:

P E R   P E\R
L-A-T   L-A-T
D E/S   D E S

In regular Boggle you’d only get points for finding LATE once. But for our purposes, this will wind up being a global constraint that’s hard to enforce. Instead, we just give you two points for it. We’ll call this “Multi-Boggle”. The score of a board in Multi-Boggle is always higher than its score in regular Boggle, so it’s still an upper bound.

If there are no repeat letters on the board, then the score is the same as in regular Boggle. In other words, while you can find LATE twice, you still can’t find LATTE because there’s only one T on the board.

In practical terms, this means that we’re going to focus solely on the max/no-mark bound and forget about the sum/union bound. The max/no-mark bound for a concrete board (one with a single letter on each cell) is its Multi-Boggle score.
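To make this concrete, here’s a minimal Multi-Boggle scorer. This is a sketch, not the repo’s actual code: a prefix set built from whatever word list you pass in stands in for the real dictionary Trie.

```python
def multi_boggle_score(board, words):
    """Score a concrete board, counting a word once per path that spells it."""
    scores = {3: 1, 4: 1, 5: 2, 6: 3, 7: 5}  # standard Boggle scoring; 8+ is 11
    h, w = len(board), len(board[0])
    prefixes = {word[:k] for word in words for k in range(1, len(word) + 1)}
    total = 0

    def dfs(r, c, prefix, used):
        nonlocal total
        prefix += board[r][c]
        if prefix not in prefixes:
            return  # no word starts this way; prune the search
        if prefix in words and len(prefix) >= 3:
            total += scores.get(len(prefix), 11)  # no dedup: every path counts
        used = used | {(r, c)}
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if (dr or dc) and 0 <= nr < h and 0 <= nc < w \
                        and (nr, nc) not in used:
                    dfs(nr, nc, prefix, used)

    for r in range(h):
        for c in range(w):
            dfs(r, c, "", frozenset())
    return total
```

On the PER/LAT/DES board above, scoring with just the word LATE yields 2 points (one per path), while LATTE yields 0 since the board has only one T.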

The Evaluation Tree

Recall from the last post that we started by calculating an upper bound on one large board class:

lnrsy chkmpt lnrsy
aeiou aeiou aeiou
chkmpt lnrsy bdfgjvwxz

and then split it up into five smaller board classes, one with each vowel in the center cell, to reduce the upper bound:

lnrsy chkmpt lnrsy
aeiou a aeiou
chkmpt lnrsy bdfgjvwxz

lnrsy chkmpt lnrsy
aeiou e aeiou
chkmpt lnrsy bdfgjvwxz

...

lnrsy chkmpt lnrsy
aeiou u aeiou
chkmpt lnrsy bdfgjvwxz

The fundamental inefficiency in this approach is that it results in an enormous amount of duplicated work. These five board classes have a lot in common. Every word that doesn’t go through the middle cell is identical. It would be nice if we could avoid repeating that work.

Back in 2009, I implemented the max/no-mark upper bound by recursively searching over the board and the dictionary Trie. This was a natural generalization of the way you score a concrete Boggle board. It didn’t use much memory, but it also didn’t leave much room for improvement.

You can visualize a series of recursive function calls as a tree. The key advance in 2025 is to form this tree explicitly in memory. This is more expensive, but it gives us a lot of options to speed up subsequent steps.

Here’s an example of what one of these trees looks like:

Evaluation tree for a 2x3 Boggle board

This is visualizing a 2x3 board class, with the cells numbered 0-5:

 T I    0 3
AE .    1 .
 R .    2 .

The top “ROOT” indicates that you have to start on one of the four cells. “CH” ovals indicate that you have to make a choice on a cell, rectangles indicate what those choices are (A or E?) and double-outlined rectangles indicate complete words. You can read words by following a path down the tree. From the left we have: TAR, TIE, TIER, AIT, RAT, RET, REI. (It’s news to me that AIT is a word!)

To get a bound from a tree, you take the sum of your children on rectangular nodes and the max of your children on oval (choice) nodes. Double-outlined boxes indicate how many points they’re worth. In this case the bound at the root is 3, coming from the top branch.
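In code, the bound calculation is just this sum/max recurrence. Here’s a sketch with toy node classes (my reconstruction; the actual node layout in hybrid-boggle surely differs):

```python
class SumNode:
    """A rectangle in the diagrams: points found here, plus all children."""
    def __init__(self, points=0, children=()):
        self.points, self.children = points, list(children)

class ChoiceNode:
    """A "CH" oval: exactly one letter can occupy this cell."""
    def __init__(self, cell, options):
        self.cell, self.options = cell, dict(options)  # letter -> subtree

def bound(node):
    if isinstance(node, ChoiceNode):
        # Only one letter can be on the cell: take the best option.
        return max((bound(t) for t in node.options.values()), default=0)
    # Sum node: every child can contribute simultaneously.
    return node.points + sum(bound(c) for c in node.children)
```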

This is a small tree with 30 nodes, but in practice they can be quite large. The tree for the 3x3 board class we’ve been looking at has 520,947 nodes and the 3x4 and 4x4 trees can be much larger.

I actually tried building these trees in 2009, but I abandoned it because I wasn’t seeing a big enough speedup in subsequent steps (scoring boards with a split cell) to justify the cost of building the tree.

What did I miss in 2009? Sadly, I had a TODO that turned out to be critical: rather than pruning out subtrees that don’t lead to points in a second pass, it’s much faster and results in less memory fragmentation if you do it as you build the tree. A 33% speedup becomes a 2x speedup. Maybe if I’d discovered that in 2009, I would have kept going!

The other discovery was that there’s a more important operation to optimize.

“Pivot” and “Lift” operations

After we found an upper bound on the 3x3 board class, the next operation was to split up the middle cell and consider each of those five (smaller) board classes individually. Now that we have a tree, the question becomes: how do you go from the initial tree to the trees for each of those five board classes?

There’s another way to think about this problem. Why is the max/no-mark bound imprecise? Why doesn’t it get us the score of the best board in the class? Its flaw is that you don’t have to make consistent choices across different subtrees. You can see this by zooming in on the “0=t” subtree from the previous graph:

A subtree with choices made in inconsistent orders

The bound on this tree is 3 (sum all the words). On the top branch (with a “T-“ prefix), it makes the most sense to choose “A” for cell 1, so that you can spell “TAR.” But on the bottom branch (with a “TI-“ prefix), it makes more sense to choose “E” so that you can spell “TIE” and “TIER.”

Of course, the cell has to be either A or E. It can’t be both. The problem is that these choices happen far apart in the tree, so they’re not synchronized. If we adjusted the tree so that the first thing you did was make a choice for cell 1, then the subtrees would all be synchronized and the bound would go down:

The same subtree with choice 1 made first

This represents the same tree, except that the choice on cell 1 has been pushed to the left. The bound is now 2, not 3 (you have to pick 1 point from the top branch or 2 points from the bottom branch).

What we need is a “pivot” operation to lift a particular choice node up to the top of the tree. You can work out how to do this for each type of node.

First of all, if a subtree doesn’t involve a choice for cell N, then we don’t have to change it. Easy.

For the other two types of nodes (sum and choice), it helps a lot to draw the lift operation. Here’s a sum node (labeled “ROOT”).

Sum node before lifting

We’d like to pivot the tree so that the choice on cell 1 is at the root. Here’s what that looks like:

Sum node after lifting

The tree has gotten bigger and a little more complicated. That’s typical. We’ll be able to improve this, but more on that in a moment.

Here’s a choice node:

Choice node before lifting

Here’s what it looks like after pivoting the choice on 1 to the root:

Choice node after lifting

Again, the tree has gotten more complex. In particular, notice how the 0=c node with three points has been duplicated. This blowup is the cost of pivoting. The payoff is reduced bounds.
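Putting the three cases together, a lift might look like the sketch below. This is my reconstruction using toy node classes (sum nodes hold points and children; choice nodes map letters to subtrees), not the repo’s implementation.

```python
class SumNode:
    def __init__(self, points=0, children=()):
        self.points, self.children = points, list(children)

class ChoiceNode:
    def __init__(self, cell, options):
        self.cell, self.options = cell, dict(options)  # letter -> subtree

def bound(node):
    if isinstance(node, ChoiceNode):
        return max((bound(t) for t in node.options.values()), default=0)
    return node.points + sum(bound(c) for c in node.children)

def uses(node, cell):
    """Does any descendant make a choice on `cell`?"""
    if isinstance(node, ChoiceNode):
        return node.cell == cell or any(uses(t, cell)
                                        for t in node.options.values())
    return any(uses(c, cell) for c in node.children)

def lift(node, cell, letters):
    """Return {letter: subtree}: the tree re-rooted on the choice for `cell`."""
    if not uses(node, cell):
        return {l: node for l in letters}  # unchanged, shared across letters
    if isinstance(node, ChoiceNode):
        if node.cell == cell:
            return {l: node.options.get(l, SumNode()) for l in letters}
        # A choice on some other cell: lift below it, rebuild it per letter.
        lifted = {o: lift(t, cell, letters) for o, t in node.options.items()}
        return {l: ChoiceNode(node.cell, {o: lifted[o][l] for o in lifted})
                for l in letters}
    # Sum node: lift each child, then re-sum per letter. Duplicating nodes
    # per letter is where the tree grows.
    kids = [lift(c, cell, letters) for c in node.children]
    return {l: SumNode(node.points, [k[l] for k in kids]) for l in letters}
```

On a toy version of the TAR vs. TIE/TIER conflict, lifting cell 1 drops the bound from 3 to 2, matching the diagrams above.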

If you lift the choice for the middle cell of the 3x3 board all the way to the top of the tree, you’ll wind up with this:

Top of tree with choice of middle cell first

There’s a choice node with five sum nodes below it, and the bound is lower. Now if you lift another cell to the top, you’ll get two layers of choice nodes with sum nodes below them:

Top of tree with two choices lifted

I’ve rotated the tree because it’s already getting big. As before, the bound is lower. If you keep doing this, you build up a “pyramid” of choice nodes at the top of the tree. If the bound on any of these nodes drops below the highest score we know about, we can prune it out. This is equivalent to the “stop” condition from the 2009 algorithm; we’re just doing it in tree form.

This “lift” operation is not cheap. The cost of making choices in an unnatural order is that the tree gets larger. Here’s the tree size if you lift all nine cells of the 3x3 board, along with the bound:

Step Nodes Bound
(init) 520,947 9,359
1 702,300 6,979
2 1,315,452 5,334
3 2,527,251 4,069
4 5,158,477 3,047
5 8,395,605 2,318
6 14,889,665 1,774
7 18,719,619 1,373
8 11,205,272 1,037
9 4,143,221 804

The node count goes up slowly, then more rapidly, and then it comes down again as we’re able to prune more subtrees. For this board, there’s a 36x increase in the number of nodes after 7 lifts. At the end of this, there are only 428 concrete boards (out of 5,625,000 boards in the class) that we need to test.

Does 4 million nodes seem like a lot for only 428 Boggle boards? It is. There are a few important tweaks we can make to keep the tree small as we lift choices.

Compression and De-duping

Keeping the tree as small as possible is essential for solving Boggle quickly. There are two inefficient patterns we can identify and fix in our trees to keep them compact.

  1. Collapse chains of sum nodes into a single sum node.
  2. Merge sibling choice nodes.

There’s no point in having trees of sum nodes without choice nodes in between them. We may as well add all their points and children to a single, bigger sum node. Here’s a tree we had after lifting through a sum node earlier:

Tree after lifting through sum node

Here’s what it looks like after collapsing sum nodes:

Tree after compressing sum nodes

The points from the 0=a and 0=b nodes have moved into their parents, which lets us delete those two nodes, which makes the tree smaller.
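As a sketch, the collapse might look like this (toy sum/choice node classes included so the snippet stands on its own; the repo’s version differs in the details):

```python
class SumNode:
    def __init__(self, points=0, children=()):
        self.points, self.children = points, list(children)

class ChoiceNode:
    def __init__(self, cell, options):
        self.cell, self.options = cell, dict(options)  # letter -> subtree

def collapse(node):
    """Fold chains of sum nodes into one bigger sum node. Multi-Boggle is
    what lets us add the points without remembering which words they came
    from."""
    if isinstance(node, ChoiceNode):
        node.options = {l: collapse(t) for l, t in node.options.items()}
        return node
    points, kids = node.points, []
    for child in map(collapse, node.children):
        if isinstance(child, SumNode):
            points += child.points       # absorb the child's points...
            kids.extend(child.children)  # ...and adopt its children
        else:
            kids.append(child)
    return SumNode(points, kids)
```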

Here are the node counts when you add compression after each pivot:

Step Nodes
(init) 520,947
1 669,156
2 1,054,515
3 1,726,735
4 2,675,250
5 2,620,720
6 1,420,925
7 301,499
8 39,667
9 9,621

These are considerably better. After 7 lifts, we’ve gone from nearly 19 million nodes to a mere 300,000. The maximum increase is now only 5-6x. Compression on its own is a 2-3x speedup.

Remember how we switched to playing “Multi-Boggle” earlier? This is where that change is crucial. If we had to track individual words, we couldn’t collapse sum nodes because they’d each reflect different words, and we’d need to keep track of that for the sum/union bound. But with Multi-Boggle, we’re free to collapse TIE and TIER into a single word that’s worth 2 points.

Here’s a tree where we can merge sibling choice nodes:

Tree with duplicate sibling choice nodes

There are two choices for cell 0 that are siblings under the root node. The first is a choice between 0=a and 0=b (0=c nets zero points). The second is a choice between 0=b and 0=c. There’s no reason we should make those choices independently. They’re the same choice, and the tree should reflect that. We can implement that by merging the trees:

Tree with duplicate sibling choice nodes merged

There are fewer nodes now and (exciting!) the bound has gone down because we’ve synchronized two choices that were previously made independently. Subtree merging can be done recursively.
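Here’s what one step of that merge might look like on toy nodes (assumed representation: each choice node maps letters to subtrees; merged options for the same letter get summed together):

```python
class SumNode:
    def __init__(self, points=0, children=()):
        self.points, self.children = points, list(children)

class ChoiceNode:
    def __init__(self, cell, options):
        self.cell, self.options = cell, dict(options)  # letter -> subtree

def bound(node):
    if isinstance(node, ChoiceNode):
        return max((bound(t) for t in node.options.values()), default=0)
    return node.points + sum(bound(c) for c in node.children)

def merge_choices(sum_node):
    """Merge sibling choice nodes on the same cell under one sum node.
    Options for the same letter are summed, which synchronizes the two
    choices and can only lower the bound."""
    merged, others = {}, []
    for child in sum_node.children:
        if not isinstance(child, ChoiceNode):
            others.append(child)
        elif child.cell not in merged:
            merged[child.cell] = child
        else:
            target = merged[child.cell].options
            for letter, subtree in child.options.items():
                if letter in target:
                    target[letter] = SumNode(0, [target[letter], subtree])
                else:
                    target[letter] = subtree
    sum_node.children = others + list(merged.values())
    return sum_node
```

The real merge is applied recursively down the tree; this sketch only handles the direct children of a single sum node.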

The “lift” operation expands the tree because it can duplicate nodes or entire subtrees. Another optimization is to de-duplicate these, and make sure we only ever operate on unique nodes. This can be done by computing a hash for each node. Here’s a tree with structurally identical nodes marked in red:

Tree with duplicate nodes marked in red

You can scan over the tree to find the canonical versions of each red subtree. For example, the red “4 CH” in the middle reads “4 CH - 4=r - 5 CH - 5=e (1)”, which is exactly the same as the line right above it.
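This is classic bottom-up hash-consing. A sketch with toy nodes (since children are canonicalized before their parents, their Python `id`s can serve as the hash key; the repo’s hashing scheme may differ):

```python
class SumNode:
    def __init__(self, points=0, children=()):
        self.points, self.children = points, list(children)

class ChoiceNode:
    def __init__(self, cell, options):
        self.cell, self.options = cell, dict(options)  # letter -> subtree

def dedupe(node, canon):
    """Replace each subtree with a canonical copy, keyed on structure.
    Structurally identical subtrees become one shared node, turning the
    tree into a DAG."""
    if isinstance(node, ChoiceNode):
        node.options = {l: dedupe(t, canon) for l, t in node.options.items()}
        key = ("CH", node.cell,
               tuple(sorted((l, id(t)) for l, t in node.options.items())))
    else:
        node.children = [dedupe(c, canon) for c in node.children]
        key = ("SUM", node.points, tuple(id(c) for c in node.children))
    return canon.setdefault(key, node)
```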

Here are the unique node counts when you de-duplicate after each pivot:

Step Unique Nodes
(init) 98,453
1 117,602
2 215,121
3 318,088
4 592,339
5 754,947
6 481,449
7 125,277
8 27,125
9 9,613

Whereas compression is more effective at reducing node counts after many lifts, de-duplication is better at reducing (unique) node counts initially and after fewer lifts. Only processing each unique node once can potentially save us a lot of time.

One way to think about this is that it allows us to memoize the pivot operation. Another is that it turns the tree into a DAG, similar to how you can compress a Trie by turning it into a DAWG/DAFSA (Directed Acyclic Word Graph). Visualizing it as a DAG doesn’t work very well — there are just too many crossing lines.

Use three letter classes instead of four

The net effect of all these changes is that we’re able to “break” difficult board classes much more efficiently. For example, this 3x4 board class:

lnrsy aeiou bdfgjqvwxz
lnrsy aeiou aeiou
chkmpt lnrsy lnrsy
bdfgjqvwxz aeiou aeiou

takes about 14 seconds to break using the 2009 technique but only 3 seconds to break using the tree techniques described above. Lifting five times reduces the bound from 51,639 to 6,695, at which point we can switch over to the 2009 technique.

Compare that with an easier board:

lnrsy lnrsy bdfgjqvwxz
aeiou bdfgjqvwxz bdfgjqvwxz
lnrsy aeiou chkmpt
chkmpt lnrsy bdfgjqvwxz

This takes 0.16 seconds with the 2009 technique and 0.12 seconds with the tree technique. It’s a win, but much less of a win. (The five lifts on this one reduce the bound from 8,138 to 1,231.)

If trees are most helpful on hard board classes, maybe we should have more of them? If we use three letter buckets instead of four, it significantly reduces the number of board classes we need to consider:

  • Four letter buckets: 4^12/4 ≈ 4.2M board classes
  • Three letter buckets: 3^12/4 ≈ 133k board classes

The board classes with three letter buckets are going to be bigger and harder to break. But with our new tools, these are exactly the sort of boards on which we get the biggest improvement. So long as the average breaking time doesn’t go up by more than a factor of ~32x (4.2M/133k), using three buckets will be a win.
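The class counts are easy to sanity-check. (I’m assuming the division by 4 accounts for the symmetries of the 3x4 grid: identity, 180° rotation, and the two flips.)

```python
# 12 cells, each assigned one of N letter buckets; /4 for board symmetries.
four_buckets = 4 ** 12 // 4    # 4,194,304 ≈ 4.2M classes
three_buckets = 3 ** 12 // 4   # 132,860 ≈ 133k classes
print(f"ratio: {four_buckets / three_buckets:.1f}x")  # ≈ 31.6x
```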

Code Year Buckets Pace (s/class)
2009 4 4.53
2009 3 33.336 (7.4x slower)
2025 4 1.175
2025 3 5.321 (4.5x slower)

So three buckets would have been better with the 2009 algorithm, but it’s an even bigger win in 2025. Since 32 / 4.5 ≈ 7, we’d expect this to be around a 7x speedup.

Why not keep going to two classes, or even just one? The cost is memory and reduced parallelism. “Chunkier” board classes require bigger trees, and RAM is a finite resource. Moreover, the fewer letter buckets we use, the more skewed the distribution of breaking times gets. Some board classes remain trivial to break (ones with all consonants, for example), but others are real beasts. On the full 3x4 run, the fastest board was broken in 0.003s whereas the slowest took 2297s. It’s harder to distribute these uneven tasks evenly across many cores or many machines to get the full speedup you expect from distribution. I think using slightly bigger chunks could still help, say two classes (consonant/vowel) in the corners, three on the edges and five in the middle.

Putting it all together

For each board class, the 2025 algorithm is:

  1. Build the evaluation tree.
  2. “Lift” a few choice cells to the top.
  3. Continue splitting cells without modifying the tree, à la the 2009 approach.

The right number of “lifts” depends on the board and the amount of memory you have available. Harder boards benefit from more lifts, but this takes more memory.

Using this approach, I was able to evaluate all the 3x4 Boggle board classes on a 192-core C4 cloud instance in 8–9 hours, roughly $100 of compute time. The results? There are exactly five boards that score more than 1600 points with the ENABLE2K word list:

  • srepetaldnis (1651)
  • srepetaldnic (1614)
  • srepetaldnib (1613)
  • sresetaldnib (1607)
  • sresetaldnip (1607)

The best one is the same one I found through simulated annealing. The others are 1-2 character variations on it. It would have been more exciting if there were a new, never before seen board. But we shouldn’t be too surprised that simulated annealing found the global optimum. After all, it did for 3x3 Boggle. And Boggle is “smooth” in the sense that similar boards tend to have similar scores. It would be hard for a great board to “hide” far away from any other good boards.

Next Steps

This is an exciting result! After 15 years, it is meaningful progress towards the goal of finding the globally-optimal 4x4 Boggle board.

There are still many optimizations that could be made. My 2025 code only wound up being a 3-4x speedup vs. my 2009 code when I ran it on the 192-core machine. This was because I had to dial back a few of the optimizations because I kept running out of memory. So changes that reduce memory usage would likely be the most impactful.

On the other hand, I don’t think there are any tweaks to my current approach that will yield more than a 10x performance improvement. So while I might be able to break 3x4 Boggle more efficiently, it’s not going to make a big dent in 4x4 Boggle. Remember that 50,000x increase in difficulty from earlier. For 4x4 Boggle, we still need a different approach. (Or $500,000 of compute time!)

In the next and final post, I’ll talk about a few ideas that might help to finally crack 4x4 Boggle. I’ll also share some thoughts on pybind11, picking up an old project, and the experience of working on a hard optimization problem.

You can find all the code for this post in the danvk/hybrid-boggle repo.

Please leave comments! It's what makes writing worthwhile.
