Google on Wednesday acquired reCaptcha, a company that provides security questions that help prevent website fraud and spam on more than 100,000 websites, and also assists with the digitisation of books and newspapers. Terms of the deal were not released.
The acquisition serves two purposes for Google – bolstering its line of security products, and also providing a new asset in its effort to digitise books and newspapers.
Captchas work by distorting a series of letters so that they are still legible to humans, but cannot be easily deciphered by automated malicious programmes.
“The words in many of the captchas provided by reCaptcha come from scanned archival newspapers and old books,” said Luis von Ahn, co-founder of reCaptcha, and Will Cathcart, Google product manager, in a blog post.
“Computers find it hard to recognise these words because the ink and paper have degraded over time, but by typing them in as a captcha, crowds teach computers to read the scanned text.”
Bringing this technology in-house will help Google with its book-scanning efforts, an increasingly ambitious project that is bringing it into conflict with European regulators.
“It’s basically a crowd-sourcing technology masked as a security tool,” said Adam O’Donnell, director of emerging technologies at Cloudmark, a security firm based in San Francisco.
Mr O’Donnell said Google could use the technology for several crowd-sourcing efforts.
“The most obvious one is to extend their digitisation of their books,” he said. “Another could be to classify images picked up via Google Street View.”
There are several providers of captcha technology, but reCaptcha has emerged as the most popular. ReCaptcha makes money by charging companies such as the New York Times to identify words that their digitisation tools have struggled to recognise.
Google and reCaptcha had not previously been working together, and reCaptcha’s broad reach could accelerate Google’s efforts.
ReCaptcha was spun off from Carnegie Mellon University, where Mr von Ahn is an assistant professor of computer science.
“At first each of the big firms grew their own technology because it wasn’t that hard,” said Mr O’Donnell.
“But reCaptcha is the go-to technology for many small firms.”
Google has its own captcha product, and earlier this year was reported to be working on its own new version. Google said it was too early to discuss plans for integration.
Captchas rely on optical character recognition technology, the same technology used for Google’s book digitisation projects. Mr O’Donnell said Google already has a skilled team of computer vision engineers.
With this team, he said, Google could improve reCaptcha by using new fonts and difficult words from its growing archive of scanned documents. “What a captcha does is try to figure out the problems that are hard to do using existing computer vision technologies,” he said. “I think they’ll be able to grow a better captcha as a result.”