Hector Levesque thinks his computer is stupid—and that yours is, too. Siri and Google’s voice searches may be able to understand canned sentences like “What movies are showing near me at seven o’clock?,” but what about questions—“Can an alligator run the hundred-metre hurdles?”—that nobody has heard before? Any ordinary adult can figure that one out. (No. Alligators can’t hurdle.) But if you type the question into Google, you get information about Florida Gators track and field. Other search engines, like Wolfram Alpha, can’t answer the question, either. Watson, the computer system that won “Jeopardy!,” likely wouldn’t do much better.
In a terrific paper just presented at the premier international conference on artificial intelligence, Levesque, a University of Toronto computer scientist who studies these questions, has taken just about everyone in the field of A.I. to task. He argues that his colleagues have forgotten about the “intelligence” part of artificial intelligence.
Levesque starts with a critique of Alan Turing’s famous “Turing test,” in which a human, through a question-and-answer session, tries to distinguish machines from people. You’d think that if a machine could pass the test, we could safely conclude that the machine was intelligent. But Levesque argues that the Turing test is almost meaningless, because it is far too easy to game. Every year, a number of machines compete in the challenge for real, seeking something called the Loebner Prize. But the winners aren’t genuinely intelligent; instead, they tend to be more like parlor tricks, and they’re almost inherently deceitful. If a person asks a machine “How tall are you?” and the machine wants to win the Turing test, it has no choice but to confabulate. It has turned out, in fact, that the winners tend to use bluster and misdirection far more than anything approximating true intelligence. One program worked by pretending to be paranoid; others have done well by tossing off one-liners that distract interlocutors. The fakery involved in most efforts at beating the Turing test is emblematic: the real mission of A.I. ought to be building intelligence, not building software that is specifically tuned toward gaming some sort of arbitrary test.
To try to get the field back on track, Levesque is encouraging artificial-intelligence researchers to consider a different test that is much harder to game, building on work he did with Leora Morgenstern and Ernest Davis (a collaborator of mine). Together, they have created a set of challenges called the Winograd Schemas, named for Terry Winograd, a pioneering artificial-intelligence researcher at Stanford. In the early nineteen-seventies, Winograd asked what it would take to build a machine that could answer a question like this:
The town councillors refused to give the angry demonstrators a permit because they feared violence. Who feared violence?
a) The town councillors
b) The angry demonstrators
Levesque, Davis, and Morgenstern have developed a set of similar problems, designed to be easy for an intelligent person but hard for a machine merely running Google searches. Some are more or less Google-proof simply because they are about made-up people, who, by definition, have few Google hits:
Joan made sure to thank Susan for all the help she had given. Who had given the help?
(To make things harder to game, an alternative formulation substitutes “received” for “given.”)
One can’t simply count the number of Web pages in which people named Joan or Susan gave other people help. Instead, answering this question demands a fairly deep understanding of the subtleties of human language and the nature of social interaction.
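For readers who think in code, the logic of the paired formulation can be sketched in a few lines of Python. This is only an illustration, not anything taken from Levesque’s paper: the names `Schema`, `score`, `first_candidate`, and `last_candidate` are invented here, and the answer key is the commonsense reading of the Joan-and-Susan sentence, supplied as an assumption. The point it tries to show is that any strategy that looks only at the names, and not at the word that changes, gives the same answer to both versions and so can do no better than chance.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterator, List, Tuple

# A resolver sees the sentence, the question, and the two candidates,
# and returns the candidate it believes the pronoun refers to.
Resolver = Callable[[str, str, Tuple[str, str]], str]


@dataclass(frozen=True)
class Schema:
    """Hypothetical container for one schema and its alternative wording."""
    template: str                # sentence with a "{word}" slot
    question: str                # question with the same slot
    candidates: Tuple[str, str]
    answers: Dict[str, str]      # special word -> correct candidate

    def variants(self) -> Iterator[Tuple[str, str, str]]:
        """Yield (sentence, question, correct answer) for each special word."""
        for word, answer in self.answers.items():
            yield (
                self.template.format(word=word),
                self.question.format(word=word),
                answer,
            )


SCHEMAS: List[Schema] = [
    Schema(
        template="Joan made sure to thank Susan for all the help she had {word}.",
        question="Who had {word} the help?",
        candidates=("Joan", "Susan"),
        # Assumed answer key: the commonsense reading of each version.
        answers={"given": "Susan", "received": "Joan"},
    ),
]


def first_candidate(sentence: str, question: str, candidates: Tuple[str, str]) -> str:
    """Baseline that ignores the sentence and always picks the first name."""
    return candidates[0]


def last_candidate(sentence: str, question: str, candidates: Tuple[str, str]) -> str:
    """Baseline that ignores the sentence and always picks the second name."""
    return candidates[1]


def score(resolver: Resolver, schemas: List[Schema]) -> float:
    """Fraction of all sentence versions the resolver answers correctly."""
    results = [
        resolver(sentence, question, schema.candidates) == answer
        for schema in schemas
        for sentence, question, answer in schema.variants()
    ]
    return sum(results) / len(results)


if __name__ == "__main__":
    for name, resolver in [("first", first_candidate), ("last", last_candidate)]:
        print(name, score(resolver, SCHEMAS))
```

Both baselines get exactly one of the two versions right. Doing better requires a resolver that actually uses the difference between “given” and “received,” which is the kind of understanding the schemas are meant to probe.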
Others are Google-proof for the reason that the alligator question is: alligators are real, but the particular fact in question isn’t one that people usually comment on. For example:
The large ball crashed right through the table because it was made of Styrofoam. What was made of Styrofoam? (The alternative formulation replaces Styrofoam with steel.)
a) The large ball
b) The table
Sam tried to paint a picture of shepherds with sheep, but they ended up looking more like golfers. What looked like golfers?
a) The shepherds
b) The sheep
These examples, which hinge on the linguistic phenomenon known as anaphora, are hard both because they require common sense—which still eludes machines—and because they get at things people don’t bother to mention on Web pages, and that don’t end up in giant data sets.
More broadly, they are instances of what I like to call the Long-Tail Problem: common questions can often be answered simply by trawling the Web, but rare questions can still stymie all the resources of a whole Web full of Big Data. Most A.I. programs are in trouble if what they’re looking for is not spelled out explicitly on a Web page. This is part of the reason for Watson’s most famous gaffe—mistaking Toronto for a city in the United States.
The same problem comes up in image search, in two ways: many kinds of pictures are rare, and many kinds of labels are rare. There are millions of pictures of cats labelled “cat”; but a Google Image search for “scuba diver with a chocolate cigarette” yields almost nothing of relevance (dozens of pictures of cigars, pin-up girls, beaches, and chocolate cakes), even though any human could readily summon a mental image of an appropriately adorned diver. Or take the phrase “right-handed man.” The Web is filled with pictures of right-handed men engaged in unmistakably right-handed actions (like throwing a baseball), which any human working in a photo archive could rapidly sort out. But very few of those pictures are labelled as such. A search for “right-handed-man” instead returns a grab bag of sports stars, guitars, golf clubs, key chains, and coffee mugs. Some are relevant, but most are not.
Levesque saves his most damning criticism for the end of his paper. It’s not just that contemporary A.I. hasn’t solved these kinds of problems yet; it’s that contemporary A.I. has largely forgotten about them. In Levesque’s view, the field of artificial intelligence has fallen into a trap of “serial silver bulletism,” always looking to the next big thing, whether it’s expert systems or Big Data, but never painstakingly analyzing all of the subtle and deep knowledge that ordinary human beings possess. That’s a gargantuan task, “more like scaling a mountain than shoveling a driveway,” as Levesque writes. But it’s what the field needs to do.
In short, Levesque has called on his colleagues to stop bluffing. As he puts it, “There is a lot to be gained by recognizing more fully what our own research does not address, and being willing to admit that other … approaches may be needed.” Or, to put it another way, trying to rival human intelligence, without thinking about all the intricacies of the human mind at its best, is like asking an alligator to run the hundred-metre hurdles.