The idea that a machine could someday match a human being at one of the highest-level cognitive tasks, processing language, has tantalized computer scientists since the days of Alan Turing. Cpedia, a new project from the folks who brought us the ridiculous search engine Cuil, shows how far machine language has to go.
Cpedia claims to be an automated encyclopedia that assembles its entries by gathering relevant snippets from sites all over the Web. (For a full review, see this Technologizer post by Harry McCracken.) What it actually produces is mostly gibberish that barely qualifies as language, let alone coherent thought.
As a human being and a writer by trade, I find this immensely gratifying. We are brilliant at understanding language, but we lack the meta-understanding that would let us convert language into a set of rules or algorithms that would make human-quality machine language possible. The commercial state of the art in natural language understanding today is Google Translate, which can turn arbitrary text into something mostly comprehensible, while leaving no doubt that the translation is the work of either a machine or someone considerably less than fluent in the target language. Google Translate chokes on the chestnut “Time flies like an arrow; fruit flies like a banana.” Rendering it from English into German or French, it misinterprets the second “like” as a preposition rather than a verb. Even when presented with the less contrived “Fruit flies like bananas,” it cannot correctly parse the meaning of “like.”
This stuff is really hard; natural language understanding remains one of the great challenges of computer science. Google, at least, has been honest in its claims about what machine translation can and cannot do. I just wish others, such as the promoters of Cpedia, would be a bit more modest in their ambitions.