DeepMind Created a Test to Measure an AI’s Ability to Reason
GENERAL INTELLIGENCE. AI has gotten pretty good at completing specific tasks, but it’s still a long way from general intelligence, the kind of all-around smarts that would let AI navigate the world the way humans, or even animals, do.
One of the key elements of general intelligence is abstract reasoning — the ability to think beyond the “here and now” to see more nuanced patterns and relationships and to engage in complex thought. On Wednesday, researchers at DeepMind — a Google subsidiary focused on artificial intelligence — published a paper detailing their attempt to measure various AIs’ abstract reasoning capabilities, and to do so, they looked to the same tests we use to measure our own.
HUMAN IQ. In humans, we measure abstract reasoning using fairly straightforward visual IQ tests. One popular test, called Raven’s Progressive Matrices, presents several rows of images in which the final row is missing its last image. The test taker must pick, from a set of candidate answers, the image that completes the pattern established by the completed rows.
The test doesn’t outright tell the test taker what to look for in the images — maybe the progression has to do with the number of objects within each image, their color, or their placement. It’s up to them to figure that out for themselves using their ability to reason abstractly.
To apply this test to AIs, the DeepMind researchers created a program that could generate unique matrix problems. Then, they trained various AI systems to solve these matrix problems.
Finally, they tested the systems. In some cases, they used test problems with the same abstract factors as the training set — like both training and testing the AI on problems that required it to consider the number of shapes in each image. In other cases, they used test problems incorporating different abstract factors than those in the training set. For example, they might train the AI on problems that required it to consider the number of shapes in each image, but then test it on ones that required it to consider the shapes’ positions to figure out the right answer.
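The article doesn’t describe how DeepMind’s generator works internally. Purely as an illustration of the two testing regimes, here is a minimal Python sketch under made-up assumptions: each image (“panel”) is a handful of numeric attributes (the hypothetical `number`, `position`, and `color`), and each problem’s rule is a simple progression along one of them. The names `make_problem` and `make_split` are hypothetical too; this is a toy, not the researchers’ actual code.

```python
import random
from dataclasses import dataclass

# Hypothetical attributes a panel can vary along, each a small integer
# (think: shape count, a position index, a shade level).
ATTRIBUTES = ("number", "position", "color")
MAX_VALUE = 9


@dataclass
class Problem:
    attribute: str     # the attribute governed by the rule
    panels: list       # 8 context panels (a 3x3 grid minus the last cell)
    candidates: list   # 4 possible answers for the missing cell
    answer_index: int  # index of the correct candidate


def make_problem(attribute, rng):
    """Toy 3x3 problem: `attribute` increases by the same step along every
    row, while each other attribute is fixed within a row."""
    step = rng.randint(1, 2)
    grid = []
    for _ in range(3):
        start = rng.randint(0, MAX_VALUE - 2 * step)
        base = {a: rng.randint(0, MAX_VALUE) for a in ATTRIBUTES}
        for col in range(3):
            panel = dict(base)
            panel[attribute] = start + col * step
            grid.append(panel)
    answer = grid[-1]
    # Distractors: copies of the answer with the ruled attribute nudged.
    candidates = [answer]
    offsets = [d for d in range(-3, 4) if d != 0]
    rng.shuffle(offsets)
    for d in offsets:
        value = answer[attribute] + d
        if 0 <= value <= MAX_VALUE and len(candidates) < 4:
            wrong = dict(answer)
            wrong[attribute] = value
            candidates.append(wrong)
    rng.shuffle(candidates)
    return Problem(attribute, grid[:-1], candidates, candidates.index(answer))


def make_split(train_attrs, test_attrs, n, seed=0):
    """Build n training and n test problems whose rules use the given attributes."""
    rng = random.Random(seed)
    train = [make_problem(rng.choice(train_attrs), rng) for _ in range(n)]
    test = [make_problem(rng.choice(test_attrs), rng) for _ in range(n)]
    return train, test
```

Calling `make_split(("number",), ("number",), 1000)` mimics the regime where training and testing share the same abstract factor; `make_split(("number",), ("position",), 1000)` mimics the harder regime, where a model trained only on shape counts is tested on positions.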
BETTER LUCK NEXT TIME. The results of the test weren’t great. When the training problems and test problems focused on the same abstract factors, the systems fared OK, correctly answering the problems 75 percent of the time. However, the AIs performed very poorly when the test set differed from the training set, even when the difference was minor (for example, training on matrices that featured dark-colored objects and testing on matrices that featured light-colored objects).
Ultimately, the team’s AI IQ test shows that even some of today’s most advanced AIs can’t figure out problems we haven’t trained them to solve. That means we’re probably still a long way from general AI. But at least we now have a straightforward way to monitor our progress.