02.11.07

Comments should be working again

Posted in meta at 5:48 pm by danvk

Turns out my Comment CAPTCHA wasn’t so smart after all. It stopped comment spam by stopping all comments. I’ve switched over to Akismet, so hopefully the comments should be working again.

02.10.07

One Last Boggle Boost

Posted in boggle, programming at 10:24 pm by danvk

Last time, we used Tries to get a 20x speedup of an already-fast Boggle program. This time, we’ll ferret out the largest remaining bottleneck to produce an even faster Boggler.

What are the main operations that we perform in solving a Boggle board? There’s 1) word lookup, 2) checking if we’ve previously found a word, 3) checking if a letter starts a word and 4) recursion. Here’s how much time they take:

Operation Complexity
Word Lookup Memory access (Trie::CompletesWord)
Previously found? Hash lookup/insertion
Starts a word? Memory access (Trie::StartsWord)
Recursion Subroutine call

We’re not going to do better than a memory access or a subroutine call. But who knows how long a hash lookup/insert takes? It’s O(1), but who knows what that constant factor is? It’s certainly more time than a memory access. Let’s see if we can replace that hash lookup/insert with another memory access. We’ll go through a couple of ideas.

Trie annotation
The Trie reduces CompletesWord and StartsWord to memory lookups by storing their results in each node (the is_word and children properties). So why not add another boolean property, found, to each Trie node.

This will clearly reduce the check for whether we’ve found a word to a memory access. The problem is that, if we’ve found a word on one board, we haven’t necessarily found it on the next. So when we’re done solving a board, we need to go back and clean up the Trie. That requires another data structure to track which Trie nodes we’ve marked and an extra loop to reset their found property after every board. That’s at least four memory accesses, not just one.

What’s more, this approach destroys the thread safety of the Trie. Since the Boggler can write to the Trie as well as read it, multiple solvers could easily mess each other up.

This approach works fairly well in practice, but it’s more complex than it seems at first.

Bit Vector
This is inspired by the thread safety argument against Trie annotation. Instead of storing the found bit in the Trie, why not store it somewhere else? Each thread will have a store, so thread safety isn’t an issue. This approach isn’t so different than the hash table approach we used last time, except that it uses a bit vector so it’s faster but uses more memory.

This approach has many drawbacks. First of all, how do you map pointer addresses to locations in a bit vector? There’s a lot of places that a pointer can point to, and there needs to be a bit in the bit vector for all of them. In practice, we’d measure the extent of the Trie in memory and adjust for memory alignment and the number of bits in a word. Unfortunately, you still wind up with a bit vector that’s not that much smaller than the Trie itself.

The bit vector has the same clearing problem after the board has been solved. Using bzero to clear the whole vector every time has a dramatic effect on perf, so you wind up using an extra array to track which bits you’ve set, just like the annotation strategy. This is way more complex than the first idea, but it is thread-safe.

Annotation revisited
This approach took me forever to come up with, but I think it’s the best.

Instead of annotating the Trie with a boolean found value, we can annotate it with a full 32-bit integer. The Boggler keeps track of how many boards it’s solved and stores that value in the Trie node of each word it finds. When it finds a word again, it compares the found value in that node against the board number. If they’re the same, that word has been found on this board. Otherwise it hasn’t. The beauty of this approach is that we don’t need to clean up after ourselves. It’s fine to leave a mess in the Trie, because that mess will never equal the next board number.

We’ll have to do a complete sweep of the Trie to clean out the board numbers when we overflow the board counter, but that happens only once every 2^32 = 4 billion boards, so it’s not a serious concern.

This approach is the simplest, since we don’t need any auxiliary data structures. It’s also the fastest, since we don’t have to clean up after ourselves. But it’s not thread-safe, which is its only major drawback.

The code
Here’s the new stuff in trie.h:

inline void Mark(unsigned mark) { mark_ = mark; }
inline unsigned Mark() const { return mark_; }
unsigned mark_;

We need to add a variable to count the number of boards we’ve processed in boggler.h:

unsigned num_boards_;

And here’s the goodies from boggler.cc:

void ClearMarks(Trie* t) {             // NEW
	t->Mark(0);                        // NEW
	for (int i=0; i<kNumLetters; i++)  // NEW
		if (t->StartsWord(i))          // NEW
			ClearMarks(t->Descend(i)); // NEW
}
// Returns the score from this portion of the search
void Boggler::Solve(int x, int y, int len, Trie* t) {
	prev_[x][y] = true;

	len += (bd_[x][y]=='q'-'a' ? 2 : 1);
	if (t->IsWord()) {
		if (t->Mark() != num_boards_) { // NEW
			score_ += kWordScores[len]; // NEW
			t->Mark(num_boards_);       // NEW
		}
	}

	for (int dx=-1; dx<=1; ++dx) {
		int cx = x + dx;
		if (cx<0 || cx>3) continue;
		for (int dy=-1; dy<=1; ++dy) {
			int cy = y + dy;
			if (cy<0 || cy>3) continue;
			if (prev_[cx][cy]) continue;
			if (t->StartsWord(bd_[cx][cy]))
				Solve(cx, cy, len, t->Descend(bd_[cx][cy]));
		}
	}
	
	prev_[x][y] = false;
}
int Boggler::Score(const char* letters) {
	if (!ParseBoard(letters)) return -1;

	score_ = 0;
	for (int i=0; i<16; ++i)
		Solve(i/4, i%4, 0, dict_->Descend(bd_[i/4][i%4]));

	num_boards_++;         // NEW
	if (!num_boards_) {    // NEW
		ClearMarks(dict_); // NEW
		num_boards_ = 1;   // NEW
	}
	return score_;
}

That’s about a 15 line change total. What does it buy us?

$ ./perf
Loaded 172203 words.
Testing perf on 50000 boards...
50000 boards in 4.82284 secs => 10367.35 bds/sec
Average board score: 141.1734 pts/bd

That’s another 65% speedup from last time. It’s certainly not 20x, but impressive nonetheless.

Now that we’ve got a lightning-fast Boggle solver we can solving interesting problems with it. Until next time, here’s the source.

To recap, here’s how far we’ve come:

Program Speed Change Improvement
Original .004 b/s - Really simple Ruby
Stems 17.5 b/s 4000x Used stems to prune the DFS
C++ 310 b/s 18x Converted Ruby to C++
Tries 6271 b/s 20x Use Tries instead of hashes
Final 10367 b/s 65% Annotate the Trie with board counter

That’s a factor of six million improvement since our first try. It’s interesting to note that the final 600x of improvements all came from replacing O(1) operations with other O(1) operations.

02.06.07

In which we discover that Tries are awesome

Posted in boggle, programming at 2:29 am by danvk

This is the sixth installment in an ongoing series about Boggle. See also 1, 2, 3, 4 and 5.

We’ve talked before about why Tries are the perfect data structure for solving Boggle boards. Today we’ll adapt the C++ program from yesterday’s post to use them.

Here’s a picture of part of a Trie:

bubbles-lines.png

There are three important operations that we can perform quickly on a Trie node:

  1. Check whether the current node completes a word
  2. Check whether a particular character leads to more words
  3. Descend to a child node

So, in addition to some convenience functions, our Trie interface will need to support these operations. Here’s what I came up with:

const int kNumLetters = 26;

class Trie {
  public:
	Trie();
	virtual ~Trie();
	
	// FAST access functions
	inline bool IsWord() { return is_word_; }
	inline bool StartsWord(int i) { return children_[i]; }
	inline Trie* Descend(int i) { return children_[i]; }
	
	// Slower utility functions
	void AddWord(const char* wd);
	int LoadFile(const char* filename);
	bool IsWord(const char* wd) const; // Look up a full word
	unsigned int size() const;         // Total # of words
	
  private:
	Trie* children_[kNumLetters];
	bool is_word_;
};

As you can see, the big three operations are all fast inline accessors. There’s one nuance worth talking about here: why do StartsWord and Descend take integers? In pictures of Tries, the edges are usually labeled with strings, or at least characters. This is partially a performance optimization: by mapping ‘a’..’z’ to 0..25, the program can avoid lots of annoying additions and subtractions. It also forces us to treat the ‘qu’ die specially. From now on, we’ll consider the letter ‘q’ to stand in for “qu”. We’ll have to be careful to filter words containing a ‘q’ not followed by a ‘u’ from the dictionary to make this work. We’ll also change our input interface to reflect this. Since each cell can be represented with exactly one character, there’s no reason to separate them with spaces anymore. Where we used to write “a b c d e f g h i j k l m n o p”, we’ll write “abcdefghijklmnop”. This greatly simplifies the board parsing code. I won’t go into any details, but the implementation can be found in trie.cc.

Now for the Boggler class. We’ll try to avoid changing the public interface. The only change will be the format of the boards, from “a b c …” to “abc…” The Boggler will need a Trie to contain its dictionary, but this allows us to remove the words_ and stems_ hashes. Finally, we’ll need some way to remember which words we’ve found. A hash will do fine here, but there’s some trickiness which we’ll get to below:

#include <ext/hash_set>
#include <inttypes.h>      // for uintptr_t
using namespace __gnu_cxx; // for hash_set
class Trie;

class Boggler {
  public:
	Boggler();
    int LoadDictionary(const char* filename);
	int Score(const char* bd);
	
  private:
	bool ParseBoard(const char* lets);
	void Solve(int x, int y, int len, Trie* t);
	
	hash_set<uintptr_t> found_;
	int bd_[4][4];
 	bool prev_[4][4];
	int score_;
	Trie* dict_;
};

So, what the hell is a uintptr_t? We’ll get to that in a sec. What’s more interesting is that the type signature of the recursive Solve method has changed. We no longer know what letters are in the word we’ve found so far, only its length. This is all we need to know for scoring, but it’s a little disconcerting. If we don’t know the characters in the word we’ve found, how can we tell whether we’ve found it before?

The solution is in the Trie. There’s a one-to-one correspondence between nodes in the Trie and words in the dictionary. If we’ve found a word before, then we’ve necessarily been to this Trie node before. So instead of storing the whole word in a hash, we can just store a pointer to the Trie node in its place. This is exactly the kind of dirty hack that I mentioned last time. You can do this in C/C++, but Ruby makes it rather difficult.

That leads us to uintptr_t. The STL won’t just let us put any old pointer into a hash_set. We need to convert it to something that the STL knows how to hash. On a 32-bit system, we could just use an unsigned int or long. But that won’t always work. The uintptr_t data type is an unsigned long that’s guaranteed to have the same number of bits as a pointer. So there we go.

Maybe a picture will clarify things. This shows the Boggle board on the left with all four arguments to the Solve function at each node. On the right is a part of the Trie, annotated with memory addresses. This may have also been an excuse to play with Photoshop..

triegraph.png

Here’s the juicy bits of the implementation:

// Returns the score from this portion of the search
void Boggler::Solve(int x, int y, int len, Trie* t) {
	prev_[x][y] = true;

	len += (bd_[x][y]=='q'-'a' ? 2 : 1);
	if (t->IsWord()) {
		// Convert the Trie node to an int and look it up
		uintptr_t hash = reinterpret_cast<uintptr_t>(t);
		if (found_.find(hash) == found_.end()) {
			score_ += kWordScores[len];
			found_.insert(hash);
		}
	}

	// Recur in each possible direction
	for (int dx=-1; dx<=1; ++dx) {
		int cx = x + dx;
		if (cx<0 || cx>3) continue;
		for (int dy=-1; dy<=1; ++dy) {
			int cy = y + dy;
			if (cy<0 || cy>3) continue;
			if (prev_[cx][cy]) continue;
			if (t->StartsWord(bd_[cx][cy]))
				Solve(cx, cy, len, t->Descend(bd_[cx][cy]));
		}
	}
	
	prev_[x][y] = false;
}

int Boggler::Score(const char* letters) {
	if (!ParseBoard(letters)) return -1;

	score_ = 0;
	for (int i=0; i<16; ++i)
		Solve(i/4, i%4, 0, dict_->Descend(bd_[i/4][i%4]));
	found_.clear();
	return score_;
}

// Board format: "bcdefghijklmnopq"
bool Boggler::ParseBoard(const char* lets) {
	for (int i=0; i<16; i++) {
		if (!lets[i] || lets[i]<'a' || lets[i]>'z')
		  return false;
		bd_[i/4][i%4] = lets[i] - 'a';
	}
	return true;
}

Now the moment of truth… what kind of performance do we get out of this? I wrapped a small binary called perf that we’ll be using to measure this from now on. It solves 50,000 pre-defined boards and reports on the time used:

$ ./perf
Loaded 172203 words.
Testing perf on 50000 boards...
50000 boards in 7.97206 secs => 6271.91 bds/sec
Average board score: 141.1734 pts/bd

That’s another 20x improvement over the previous post’s Boggler class!

Fast as this board is, there’s still one major, unnecessary bottleneck left in it. Any guesses? We’ll fix it tomorrow, at which point we’ll have a Boggler that’s about as fast as I know how to make it. Then we can start doing the fun stuff with it.

Complete source code for today’s Boggler can be found here.

02.04.07

Moving Boggle to C++

Posted in boggle, programming at 11:55 pm by danvk

Last time, we talked about Tries, the perfect data structure for Boggle. To take full advantage of Tries, we need to move to a lower-level language than Ruby, which leads to today’s task: convert our Ruby code to C++.

Premature optimization is the root of all evil, so our main goal here is to get a working C++ Boggle solver. We’ll use STL strings and hash tables to produce code that’s as close to the last Ruby program as possible. That being said, I’m going to take this opportunity to extract the Boggle-solving code into its own class. Here’s the interface I came up with:

#include <ext/hash_set>
#include <string>
using namespace __gnu_cxx; // for hash_set
using namespace std;       // for string

class Boggler {
 public:
  Boggler();
  int LoadDictionary(const char* filename);
  int Score(const char* letters);
  	
 private:
  bool ParseBoard(const char* lets);
  void Solve(int x, int y, const string& sofar);
  
  hash_set<string> words_;
  hash_set<string> stems_;
  hash_set<string> found_;
  string bd_[4][4];
  bool prev_[4][4];
  int score_;
};

The routines mostly do what their names imply. LoadDictionary populates the words_ and stems_ hashes based on a word list. ParseBoard takes input like “a b c d e f g h i j k l m n o p” and populates the bd_ array. Score parses and scores a board, while Solve is its recursive workhorse.

Before getting to the implementation of this interface, let’s look at a few programs that use it. Here’s one that tests some boards with known scores (try the online boggle solver to test them):

#include "boggler.h"

#include <string>
#include <iostream>
using namespace std;

int main(int argc, char** argv) {
	Boggler b;
	int words = b.LoadDictionary("words");
	cout << "Loaded " << words << " words." << endl;
	cout << "b.Score(abcd...) => " << b.Score("a b c d e f g h i j k l m n o p") << " == 18?" << endl;
	cout << "b.Score(qu...) => " << b.Score("t c e e v w h b t s t u qu a a e") << " == 94?" << endl;
	cout << "b.Score(catd...) => " << b.Score("c a t d l i n e m a r o p e t s") << " == 2338?" << endl;
}

This program is just a quick sanity check. If the Boggler correctly scores those three boards, then it’s doing a whole lot right. Here’s another program that reads boards from standard input and scores each of them. We can use it to test correctness and measure performance:

#include <iostream>
using namespace std;

#include "boggler.h"

int main(int argc, char** argv) {
	Boggler b;
	b.LoadDictionary("words");

	char linebuf[256];
	while (cin.getline(linebuf, 256)) {
		cout << b.Score(linebuf) << endl;
	}
}

Now for the implementation, boggle.cc. The interface requires four functions: LoadDictionary, ParseBoard, Score and Solve. The last is the most interesting, and it can be translated directly from Ruby:

def solve(x, y, sofar)
  Prev[x][y]=true

  wd = sofar + Bd[x][y]
  Found[wd]=true if Words[wd]
  
  Dirs.each do |dx,dy|
    next unless (0..3)===(x+dx)
    next unless (0..3)===(y+dy)
    next if Prev[x+dx][y+dy]
    solve(x+dx,y+dy,wd) if Stems[wd]
  end
  
  Prev[x][y] = false
end
void Boggler::Solve(int x, int y, const string& sofar) {
	prev_[x][y] = true;
	
	string wd = sofar + bd_[x][y];
	if (words_.find(wd) != words_.end()) {
		if (found_.find(wd) == found_.end()) {
			found_.insert(wd);
			score_ += kWordScores[wd.size()];
		}
	}
	
	for (int dx=-1; dx<=1; ++dx) {
		int cx = x + dx;
		if (cx<0 || cx>3) continue;
		for (int dy=-1; dy<=1; ++dy) {
			int cy = y + dy;
			if (cy<0 || cy>3) continue;
			if (prev_[cx][cy]) continue;
			if (stems_.find(wd) != stems_.end())
				Solve(cx, cy, wd);
		}
	}
	
	prev_[x][y] = false;
}

The STL syntax for looking up an element in a hash_set is hideous: words_.find(wd) != words_.end()? I can’t understand why there’s no equivalent of Ruby’s has_key? function. What else do you do with hashes? Here’s Score:

int Boggler::Score(const char* letters) {
	if (!ParseBoard(letters)) return -1;

	score_ = 0;
	for (int i=0; i<16; ++i)
		Solve(i/4, i%4, "");
	found_.clear();
	return score_;
}

Simple enough. The other two functions require some STL-fu, but aren’t especially interesting. You can check out the complete source and Makefile from here.

First the sanity check:

$ make
g++  -g -Wall -O3 -march=i686  -c -o test.o test.cc
g++  -g -Wall -O3 -march=i686  -c -o boggler.o boggler.cc
g++   test.o boggler.o   -o test
g++  -g -Wall -O3 -march=i686  -c -o scorer.o scorer.cc
g++   scorer.o boggler.o   -o scorer
$ ./test
Loaded 172233 words.
b.Score(abcd...) => 18 == 18?
b.Score(qu...) => 94 == 94?
b.Score(catd...) => 2338 == 2338?

So we’ve got correctness. Now the big question: how’s the performance? We’ll get a baseline by scoring zero boards with scorer and then feed it 5,000 boards from a list of 50,000 that I created:

$ time echo '' | ./scorer

-1

real    0m1.431s
user    0m1.294s
sys     0m0.052s
$ time head -5000 test-boards | ./scorer > /dev/null

real    0m17.525s
user    0m17.412s
sys     0m0.107s
$ echo "5000/(17.412-1.294)" | bc -q
310.212185134632088

That’s an 18x improvement over last time, exclusively from moving to C++. There is a Ruby penalty, and 18x is a perfectly believable value for it. I suspect that, if Ruby had a VM, (as it will soon and kind of does already) then this factor could be entirely eliminated. What can’t be eliminated is where we’ll go next. There are some times when it really is worth getting your hands dirty hacking around for performance tweaks. C++ makes those hacks possible, whereas Ruby does not. A simple example: rather than doing all the string addition in Boggler::Solve, we could use a pre-allocated character buffer. I could easily see this gaining a 2-3x performance improvement. But you can’t easily break the string abstraction in Ruby.

Next time we’ll add Tries to the C++ program. You can get the full source for today’s Boggle from here.

02.02.07

Adium 1.0 is out!

Posted in Uncategorized at 10:50 pm by danvk

Go pick up your copy of Adium 1.0, the new release of the greatest IM program ever. It’s Mac-only and it’s beautiful. I’ve been using it with only one complaint for at least 4-5 years (yes, it’s been in beta that long). That one annoyance was non-functioning AIM file transfers, which they claim to have fixed. We’ll see, they’ve claimed that before. Here’s what it looks like:

adium.png

I’m using the “Concise” layout with the “Decay 2.0″ color scheme. I enlarged the font from 9->10 pt, reduced the spacing to 3px, and added a 6px gutter on the left. I think it looks good, very compact and understated, but quite clear and useable.

Update: Despite looking great, file transfer is still AWOL. Shoot.

« Previous Page« Previous entries Next entries »Next Page »