01.30.07
Boggle #3: Succeed by not Being Stupid
Last time, we wrote a program that found all the words on a Boggle board in just over four minutes per board. But what was it spending all that time doing? Let’s add some debugging output to find out:
def solve(x, y, sofar)
puts "#{x},#{y}: ‘#{sofar}’"
…
Here’s what happens:
$ ruby fast.rb a b c d e f g h i j k l m n o p 0,0: '' 0,1: 'a' 0,2: 'ab' 0,3: 'abc' 1,2: 'abcd' 1,1: 'abcdg' 1,0: 'abcdgf' 2,0: 'abcdgfe' 2,1: 'abcdgfei' 2,2: 'abcdgfeij' 1,3: 'abcdgfeijk' 2,3: 'abcdgfeijkh' 3,2: 'abcdgfeijkhl' 3,1: 'abcdgfeijkhlo' 3,0: 'abcdgfeijkhlon' 3,3: 'abcdgfeijkhlo' 3,3: 'abcdgfeijkhl' 3,2: 'abcdgfeijkhlp' 3,1: 'abcdgfeijkhlpo' 3,0: 'abcdgfeijkhlpon' ...
Quick: how many words start with “abcd”? I can’t think of any either. But our program dutifully searches over the entire board for words we know it won’t find. It’s painful to think about all the time our search is wasting exploring paths that can’t possibly start real words. We need to cut off paths like “abcd” at the head. We can do this by creating a list of word stems: sequences of letters that can start legal Boggle words. “a” is on that list. So is “ab”. “abc” probably isn’t, though, and “abcd” certainly isn’t.
We may as well store the list of stems in another hash table. Here’s how:
Words = Hash.new Stems = Hash.new IO.foreach("words") do |line| line =~ /^([a-z]{3,17})\b/ or next wd = $1 Words[wd] = true (1..wd.length-1).each do |len| Stems[wd[0,len]] = true end end
Now we modify the solve routine to only make a recursive call when it could result in new words. This is a trivial change:
solve(x+dx,y+dy,wd) if Stems[wd] # keep going if it'll help!
Now the moment of truth. How much of a win did these four new lines of code buy us?
$ time ruby fast.rb a b c d e f g h i j k l m n o p 18 real 0m7.831s user 0m7.651s sys 0m0.096s $ time ruby fast.rb c a t d l i n e m a r o p e t s 2338 real 0m7.990s user 0m7.818s sys 0m0.094s
WOW! That’s a factor of 30 improvement! If we dig a little deeper, it gets even more interesting:
$ ruby -r profile fast.rb c a t d l i n e m a r o p e t s 2338 % cumulative self self total time seconds seconds calls ms/call ms/call name 40.19 213.80 213.80 172235 1.24 2.62 Range#each 11.24 273.58 59.78 1 59780.00 401400.00 IO#foreach 11.05 332.36 58.78 1554484 0.04 0.04 Hash#[]= 10.96 390.64 58.28 8892 6.55 84.27 Array#each 10.43 446.13 55.49 1381345 0.04 0.04 String#[] 4.25 468.76 22.63 130010 0.17 0.30 Range#=== 2.99 484.64 15.88 247385 0.06 0.06 Fixnum#<=> 2.41 497.44 12.80 244423 0.05 0.05 Fixnum#+ 1.40 504.87 7.43 132259 0.06 0.06 Array#[] 1.37 512.18 7.31 8891 0.82 85.58 Object#solve
The first three methods are called only in dictionary-loading code we just wrote. The vast majority of time is being spent loading the dictionary! There’s not much we can do to improve this, but we can make it less important by changing the problem we’re trying to solve.
Right now we’re just trying to score one board. But if we knew in advance that we were going to score 10 boards, we could load the dictionary once and then score all ten boards using that dictionary. This would mitigate the slow dictionary load. Here’s one way to do it. Input is now taken from STDIN:
# Take whitespace-separated inputs on each line ARGF.each do |line| Found.clear letters = line.chomp.split(' ') (0..15).each { |i| Bd[i/4][i%4] = letters[i] } (0..15).each { |i| solve(i/4, i%4, "") } puts Found.keys.inject(0) { |sum,w| sum + Scores[w.length] } end
There are a few Rubyisms we haven’t seen yet in those lines. ARGF
is a great crutch for Perl programmers who miss typing while(<>) {...}
. It opens each input file left in ARGV
and yields each line. If there’s no input files left, it reads STDIN and yields each line it gets there. Many, many programs do their work in an ARGF loop.
The chomp
method in line.chomp
removes a trailing newline from line
, if it’s there. The split(' ')
is a special case of the split method that splits the line on whitespace into an array. These are also Perlisms. Come to think of it, most of the strange, idiosyncratic features of Ruby are ultimately inspired by Perl.
I generated a file with ten random boards in it:
c a t d l i n e m a r o p e t s a b c d e f g h i j k l m n o p s i e e u e o o c t r k x o n n c n s r e e h m i o r t o i k y t v b i t n p u e e o t n t r e y f h a s e g i o n m o d t a e e e n o l a o s t i a s v m e l s t n t e e n t a e e o c p o b a s y r p h u v i e r e e u p o s qu n g t f y a t b e w r e t e
And now the performance:
$ time ruby multi.rb boards 2338 18 138 121 190 212 272 211 138 87 real 0m8.454s user 0m8.264s sys 0m0.098s
It took 7.75 seconds to process one board earlier in this post, so that means that the solver itself is running at a rate of (8.264-7.75)/9 = 0.057 secs/board. This is fast enough that we need to start inverting our units. We’re getting 17.5 boards/sec, and we’ve improved by a factor of nearly 5000 since our initial attempt!
Steven said,
April 18, 2008 at 6:09 am
Sorry your code doesn’t seem to work anymore
Steven said,
April 18, 2008 at 6:21 am
OK it does, but why do I need to put spaces in between the letters?
danvk.org » Breaking 3×3 Boggle said,
August 8, 2009 at 1:13 pm
[...] from trying each letter on each square. But we’re saved by the same lesson we learned in boggle post #3: the dictionary is exceptionally effective at pruning thorny search trees. If we prune search trees [...]