Wednesday, September 16, 2009

Digitizing Books One Word at a Time

UPDATE: Google buys reCAPTCHA | official announcement

Original Blog
Continuing my curiosity about CAPTCHAs, I came across this amazing initiative here.

This project involves digitizing books using solved CAPTCHAs. Over 200 million CAPTCHAs (words) are decoded by real people every day, so this novel idea puts that effort to work converting printed books into digital ones. In the past we have used OCR (optical character recognition) software, but OCR alone is not accurate enough. Combining OCR and human intelligence through CAPTCHAs is a killer idea!
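To make the idea concrete, here is a minimal sketch of how answers from many people could be combined. It assumes each CAPTCHA pairs a control word (whose answer is already known) with a word the OCR could not read, and that an unread word is accepted once enough people who passed the control word agree on it. The class name, the word ids and the threshold of 3 votes are my own assumptions for illustration, not details taken from the project.

```python
from collections import Counter, defaultdict

# Hypothetical threshold: how many matching human answers we require
# before trusting a transcription. Not a figure from the project itself.
VOTES_NEEDED = 3


class WordDigitizer:
    """Toy model of pairing a known word with a word OCR could not read."""

    def __init__(self, votes_needed=VOTES_NEEDED):
        self.votes_needed = votes_needed
        self.votes = defaultdict(Counter)  # unknown word id -> answer counts
        self.resolved = {}                 # unknown word id -> accepted text

    def submit(self, control_answer, control_truth, unknown_id, unknown_answer):
        """Record one solved CAPTCHA; return True if the user passes."""
        if control_answer.strip().lower() != control_truth.lower():
            return False  # failed the known word, so ignore the other answer
        answer = unknown_answer.strip().lower()
        if answer and unknown_id not in self.resolved:
            self.votes[unknown_id][answer] += 1
            text, count = self.votes[unknown_id].most_common(1)[0]
            if count >= self.votes_needed:
                self.resolved[unknown_id] = text  # enough people agree
        return True


# Four people solve a CAPTCHA that contains the same unread word.
digitizer = WordDigitizer()
for guess in ["morning", "morning", "mourning", "morning"]:
    digitizer.submit("upon", "upon", "scan-17", guess)
print(digitizer.resolved)  # {'scan-17': 'morning'}
```

The control word is what keeps bots and careless typists from polluting the votes: an answer for the unread word only counts when the same person got the known word right.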

Google made a similar effort, called 'Google Image Labeler', to improve its image search results. But the reCAPTCHA project is taking social human intelligence to the next level.

What other ways could human effort be put to meaningful use? A few ideas I can think of: music tagging, video tagging, and using spelling mistakes as feedback for spell checkers. What else? Any suggestions?

Related Blog: CAPTCHAS that make me feel illiterate

Wednesday, September 9, 2009

Kai-Fu Lee and doing web business in China

Last week, Google China VP Dr. Kai-Fu Lee left Google to start his own venture capital firm to back China-based high-tech innovation. Last year, as a student at Carnegie Mellon University, I had a chance to listen to his interesting lecture on Google's China strategy. I highly recommend watching the full lecture to anyone interested in understanding China's technology landscape and how it differs. He shared some very interesting observations, such as:

In China, people do not have access to Wikipedia!
It is believed that to be successful in China one needs to take a long-term view, and Google fully understands this:
"We will take long term view in China, China has 5000 years of history and Google has 5000 years of patience in China" - Google CEO, Eric Schmidt



Finally, the way Chinese users view a search page is far different from the way Americans do. Here are the results of an eye-tracking study of a Google search page!

Here is the full lecture -