12.26.09
Crossword Word Frequency
In a previous post, I discussed downloading several years’ worth of New York Times Crosswords and categorizing them by day of week. Now, some analysis!
Here are the most common words over the last 12 years, along with the percentage of puzzles in which each occurred:
Percentage | Word | Length |
---|---|---|
6.218% | ERA | 3 |
5.703% | AREA | 4 |
5.413% | ERE | 3 |
5.055% | ELI | 3 |
4.854% | ONE | 3 |
4.585% | ALE | 3 |
4.496% | ORE | 3 |
4.361% | ERIE | 4 |
4.339% | ALOE | 4 |
4.317% | ETA | 3 |
4.317% | ALI | 3 |
4.227% | OLE | 3 |
4.205% | ARE | 3 |
4.138% | ESS | 3 |
4.138% | EDEN | 4 |
4.138% | ATE | 3 |
4.048% | IRE | 3 |
4.048% | ARIA | 4 |
4.004% | ANTE | 4 |
3.936% | ESE | 3 |
3.936% | ENE | 3 |
3.914% | ADO | 3 |
3.869% | ELSE | 4 |
3.825% | NEE | 3 |
3.758% | ACE | 3 |
So “ERA” appears, on average, in about 23 puzzles per year (6.218% of 365 daily puzzles). How does this break down by day of the week?
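The percentages in these tables are document frequencies: the share of puzzles in which a word appears at least once. Here is a minimal sketch of how that could be tabulated, both overall and per weekday. It is an illustration, not the actual tabulation program: the input format (a date plus a list of answer words per puzzle) and the function names are assumptions, and the four tiny puzzles are made up.

```python
from collections import Counter, defaultdict
from datetime import date

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def document_frequency(puzzles):
    """Return {word: percentage of puzzles containing that word}.

    `puzzles` is an iterable of (date, words) pairs (assumed format).
    """
    counts = Counter()
    total = 0
    for _, words in puzzles:
        counts.update(set(words))  # count a word once per puzzle
        total += 1
    return {word: 100.0 * n / total for word, n in counts.items()}


def frequency_by_weekday(puzzles):
    """Group puzzles by weekday name, then tabulate each group separately."""
    groups = defaultdict(list)
    for day, words in puzzles:
        groups[WEEKDAYS[day.weekday()]].append((day, words))
    return {name: document_frequency(group) for name, group in groups.items()}


# Made-up toy data, only to show the shape of the input.
PUZZLES = [
    (date(2009, 1, 12), ["ALOE", "OREO", "ALOE"]),  # a Monday
    (date(2009, 1, 19), ["ALOE", "ERA", "AREA"]),   # a Monday
    (date(2009, 1, 10), ["ETE", "TEN"]),            # a Saturday
    (date(2009, 1, 17), ["ERA", "OTIS"]),           # a Saturday
]

overall = document_frequency(PUZZLES)
by_day = frequency_by_weekday(PUZZLES)

for word, pct in sorted(overall.items(), key=lambda kv: -kv[1])[:3]:
    print(f"{pct:.3f}% | {word} | {len(word)} |")
```

Counting each word once per puzzle (the `set(words)` step) is what makes this a percentage of puzzles rather than a raw occurrence count.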
Monday:
Percentage | Word | Length |
---|---|---|
9.404% | ALOE | 4 |
8.777% | AREA | 4 |
7.837% | ERIE | 4 |
6.426% | ONE | 3 |
6.426% | IDEA | 4 |
6.426% | ARIA | 4 |
6.270% | ONCE | 4 |
6.270% | EDEN | 4 |
6.113% | ERA | 3 |
6.113% | ELSE | 4 |
6.113% | ASEA | 4 |
5.799% | ERE | 3 |
5.643% | ORE | 3 |
5.643% | ETAL | 4 |
5.643% | ARE | 3 |
5.643% | ANTE | 4 |
5.486% | OREO | 4 |
5.486% | ALEE | 4 |
5.329% | TREE | 4 |
5.329% | ESS | 3 |
5.329% | ELI | 3 |
5.329% | ACRE | 4 |
5.172% | TSAR | 4 |
5.172% | ANTI | 4 |
5.016% | ORAL | 4 |
Four-letter words are more common here, and the percentages are much higher: “ALOE” shows up in 9.4% of Monday puzzles versus 4.3% overall. There’s less variety in the fill of Monday puzzles. “ALOE” and “ARIA” are classic crossword words, not to mention “OREO”.
Saturday:
Percentage | Word | Length |
---|---|---|
3.286% | ERA | 3 |
2.973% | ONE | 3 |
2.973% | ETE | 3 |
2.817% | TEN | 3 |
2.817% | EVE | 3 |
2.817% | ETA | 3 |
2.660% | IRE | 3 |
2.660% | ERR | 3 |
2.660% | ERE | 3 |
2.504% | OTIS | 4 |
2.504% | OLE | 3 |
2.504% | ENE | 3 |
2.504% | ELL | 3 |
2.504% | ELI | 3 |
2.504% | ARE | 3 |
2.504% | ARA | 3 |
2.504% | ALA | 3 |
2.504% | ACE | 3 |
2.347% | RTE | 3 |
2.347% | ICE | 3 |
2.347% | ATE | 3 |
2.347% | ALE | 3 |
2.191% | TSE | 3 |
2.191% | TERSE | 5 |
2.191% | SRI | 3 |
Lots of three-letter words and much lower percentages: even the top word, “ERA”, appears in only 3.3% of Saturday puzzles. “OTIS” is surprising to me, but I don’t do many Saturday puzzles, so who am I to say?
It would be really interesting to combine this with document frequency numbers for general English text. That would surface words which are much more common in crosswords than in ordinary writing, i.e., crosswordese.
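A rough sketch of that comparison: divide each word's crossword percentage by its percentage in a general English corpus and rank by the ratio. The crossword figures below come from the overall table; the English figures are invented placeholders (no real corpus was consulted), and the function name and the `floor` parameter are assumptions for illustration.

```python
def crosswordese_scores(crossword_pct, english_pct, floor=0.001):
    """Rank words by how over-represented they are in crosswords.

    Both arguments map word -> percentage of documents containing it.
    `floor` avoids dividing by zero for words missing from the English data.
    """
    scores = {}
    for word, pct in crossword_pct.items():
        baseline = max(english_pct.get(word, 0.0), floor)
        scores[word] = pct / baseline
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Percentages from the overall table above.
crossword_pct = {"ERA": 6.218, "ONE": 4.854, "ALOE": 4.339}

# PLACEHOLDER values, invented for illustration only.
english_pct = {"ERA": 2.0, "ONE": 50.0, "ALOE": 0.1}

ranked = crosswordese_scores(crossword_pct, english_pct)
for word, ratio in ranked:
    print(f"{word}: {ratio:.2f}x")
```

With real English document frequencies plugged in, words at the top of this ranking would be the crosswordese candidates.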
I’d include everything necessary to reproduce this here, but the puzzles are not free. See this directory for the program I used to tabulate the statistics and complete word counts, both overall and for each day of the week. The first puzzle in my collection was 1996-10-23 and the last was 2009-01-19.
Mom said,
December 26, 2009 at 11:24 am
I enjoyed reading this Dan. Hope Rex Parker gets ahold of it!
Pam D'Angelo (Ben's mom) said,
January 7, 2010 at 1:41 pm
What fun to see the numbers behind what I do every day. Thanks for an enjoyable, enlightening post.