AI Can Now Pass School Tests but Still Falls Short on the Turing Test

From winning at Go to passing eighth-grade multiple-choice tests, AI is making rapid advances. But its creativity still leaves much to be desired.

On September 4, 2019, Peter Clark, along with several other researchers, published “From ‘F’ to ‘A’ on the N.Y. Regents Science Exams: An Overview of the Aristo Project.” The Aristo project named in the title is hailed for the rapid improvement it has demonstrated when tested the way eighth-grade students in New York State are tested on their knowledge of science.

The researchers concluded that this is an important milestone for AI: “Although Aristo only answers multiple choice questions without diagrams, and operates only in the domain of science, it nevertheless represents an important milestone towards systems that can read and understand. The momentum on this task has been remarkable, with accuracy moving from roughly 60% to over 90% in just three years.”

The Aristo project is powered by the financial resources and vision of Paul G. Allen, the founder of the Allen Institute for Artificial Intelligence (AI2). As the institute’s site explains, several kinds of solvers go into making AI capable of passing a multiple-choice test.

Aristo’s most recent solvers include:

The Information Retrieval, PMI, and ACME solvers that look for answers in a large corpus using statistical word correlations. These solvers are effective for “lookup” questions where an answer is explicit in text.
The Tuple Inference, Multee, and Qualitative Reasoning solvers that attempt to answer questions by reasoning, where two or more pieces of evidence need to be combined to derive an answer.
The AristoBERT and AristoRoBERTa solvers that apply the recent BERT-based language-models to science questions. These systems are trained to apply relevant background knowledge to the question, and use a small training curriculum to improve their performance. Their high performance reflects the rapid progress made by the NLP field as a whole.
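
To make the idea of “statistical word correlations” concrete, here is a minimal sketch of answer selection by pointwise mutual information (PMI). It is a toy illustration, not Aristo’s implementation: the four-sentence corpus, the sentence-level co-occurrence window, and the function names (build_counts, pmi, score_option, answer) are all assumptions made for this example.

```python
import math
from collections import Counter
from itertools import product

# Hypothetical toy corpus; each string is treated as one co-occurrence window.
CORPUS = [
    "plants use sunlight to make food by photosynthesis",
    "photosynthesis in plants needs sunlight and water",
    "animals eat food to get energy",
    "rocks are formed by heat and pressure",
]

def build_counts(corpus):
    """Count how many windows contain each word and each unordered word pair."""
    word_counts, pair_counts = Counter(), Counter()
    for sentence in corpus:
        words = sorted(set(sentence.split()))
        word_counts.update(words)
        for i, a in enumerate(words):
            for b in words[i + 1:]:
                pair_counts[(a, b)] += 1
    return word_counts, pair_counts, len(corpus)

def pmi(a, b, word_counts, pair_counts, n):
    """PMI = log(p(a, b) / (p(a) * p(b))); 0.0 if the pair never co-occurs."""
    joint = pair_counts[tuple(sorted((a, b)))]
    if joint == 0:
        return 0.0
    return math.log((joint / n) / ((word_counts[a] / n) * (word_counts[b] / n)))

def score_option(question, option, counts):
    """Average PMI over all (question word, option word) pairs."""
    pairs = list(product(question.split(), option.split()))
    if not pairs:
        return 0.0
    return sum(pmi(a, b, *counts) for a, b in pairs) / len(pairs)

def answer(question, options, corpus=CORPUS):
    """Pick the option whose words correlate most strongly with the question."""
    counts = build_counts(corpus)
    return max(options, key=lambda opt: score_option(question, opt, counts))

if __name__ == "__main__":
    print(answer("plants make food", ["photosynthesis", "pressure"]))
```

In this toy setup the question words co-occur with “photosynthesis” but never with “pressure,” so the first option scores higher. A solver of this kind can only succeed when the corpus states the answer more or less directly, which is why such methods suit “lookup” questions.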

While Aristo’s progress is indeed impressive, and no doubt there are some eighth graders who wish they could find a way to carry the AI along with them to the test, it is still far from capable of passing a Turing test. In fact, the Allen Institute for Artificial Intelligence admitted that it was deliberately testing its AI in a different way when it set out to develop it in 2016.

The explanation was given in an article entitled “Moving Beyond the Turing Test with the Allen AI Science Challenge.” Admitting that the test would not be “a full test of machine intelligence,” the institute still considered it worthwhile for showing “several capabilities strongly associated with intelligence – capabilities that our machines need if they are to reliably perform the smart activities we desire of them in the future – including language understanding, reasoning, and use of commonsense knowledge.”

There’s also the practical consideration that makes testing with ready-made tests so appealing: “In addition, from a practical point of view, exams are accessible, measurable, understandable, and compelling.” Come to think of it, that’s why some educators love having standardized tests, while others decry them for the very fact that they give the false impression they are measuring intelligence when all they can measure is performance of a very specific nature.

When it comes to more creative intelligence in which the answer is not simply out there to be found or even intuited, AI still has quite a way to go. We can see that in its attempts to create a script.

Making movies with AI
Benjamin (formerly known as Jetson) is the self-chosen name of “the world’s first automated screenwriter,” described as “a self-improving LSTM RNN [long short-term memory recurrent neural network] machine intelligence trained on human screenplays.”

Benjamin has his/its own Facebook page. Benjamin also used to have a site under that name, but now he/it shares the credit on a more generally named one, which offers links to all three of the films based on scripts generated by AI that were made within just two days to qualify for Sci-Fi London’s 48hr Film Challenge.

Benjamin’s first foray into film was the script for “Sunspring.” However, even that required a bit of prompting from Ross Goodwin, “creative technologist, artist, hacker, data scientist,” as well as the work of the filmmaker Oscar Sharp, and three human actors.

The film was posted to YouTube, where you can watch all nine minutes of it. See if you share the assessment expressed by the writer Neil Gaiman, whose tweet appears on the Benjamin site: “Watch a short SF film gloriously fail the Turing Test.”
