Another reason I like working at Google

Another great post from Peter Norvig makes me proud to be a Googler. In this article (he wrote most of the code on a plane ride), he shows in 20 lines how to write a spelling corrector similar to the "Did you mean: ....?" feature on the Google search page. Just like his earlier Sudoku solver, the code is clear and concise without being cryptic. By the end of the article you feel like you could have written the code yourself. One thing I like about his articles is that I often learn something new about Python, like the:
collections.defaultdict(lambda:1)
snippet in this article. He also goes into some of the theory, Bayesian probability in this case. His links were useful too, like the Spelling Error Corpus and Google's trillion-word N-gram corpus. Peter is the director of research, so you might expect him to be smarter than the average bear, but there are an amazing number of very bright lights here at Google whom I have the honor of working with daily.
If you are interested in learning a little more Python and some very useful programming techniques (Bayes' Theorem and probability for the spelling corrector; constraint propagation and search for Sudoku), you really should read his articles.
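
For anyone wondering what the defaultdict snippet buys you, here is a small sketch of my own (a toy illustration, not Peter's code). The helper names `words` and `train` and the tiny corpus are made up for this example. The idea, as I understand it, is that every word starts with a count of 1, so a word the model has never seen still gets a small nonzero count instead of raising a KeyError:

```python
import collections
import re

def words(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())

def train(features):
    """Count word occurrences; any word not yet seen starts at 1."""
    model = collections.defaultdict(lambda: 1)
    for word in features:
        model[word] += 1
    return model

# Toy corpus, invented for this illustration.
counts = train(words("the quick brown fox jumps over the lazy dog the end"))

# "the" occurs 3 times on top of the default of 1, so counts["the"] is 4.
# Looking up the unseen "teh" returns the default 1 (and stores it in the dict).
candidates = ["the", "teh", "thee"]
best = max(candidates, key=lambda w: counts[w])
print(best)  # the
```

Picking the candidate with the highest count is the crudest possible stand-in for "choose the most probable correction"; the article goes much further and explains the probability theory behind it.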

Comments

Mark Papadakis said…
Indeed, a very interesting essay.
Based on the observations laid out there and hints available at [ http://googleresearch.blogspot.com/2006/08/all-our-n-gram-are-belong-to-you.html ] and [ http://people.csail.mit.edu/milch/papers/gvs.pdf ], one can build something that would partially match the efficiency of the Google spell checker by tapping into the n-gram collections released by Google. The PDF (Searching the Web by Voice) illustrates a methodology that seems to be a great fit for this problem domain as well.
Anonymous said…
Great essay - It was really interesting to read.
