Analogical reasoning, the ability to solve an unfamiliar problem by drawing on a structurally similar one, has long been regarded as a distinctively human cognitive capacity. A recent study by UCLA psychologists challenges that assumption.
The study reveals that GPT-3, an AI language model developed by OpenAI, exhibits reasoning capabilities almost on par with college undergraduates. GPT-3’s performance in solving problems similar to intelligence tests and standardized exams like the SAT is particularly notable. Published in the journal Nature Human Behaviour, these findings prompt us to question whether GPT-3’s analogical reasoning stems from its extensive language training or an entirely novel cognitive process.
OpenAI has not disclosed the inner workings of GPT-3, so the mechanism behind its analogical reasoning remains unknown, a question that intrigues the UCLA researchers. And while GPT-3 performs impressively on certain reasoning tasks, it is not without limitations: it struggles with tasks humans find trivial, such as using tools to carry out physical actions. To test GPT-3's capabilities, the team designed problems inspired by Raven's Progressive Matrices and converted the image-based puzzles into a text format GPT-3 could interpret, which also ensured the problems were entirely new to the AI.
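To make the conversion concrete, here is a minimal sketch of what a Raven's-style matrix problem might look like once rendered as text. This is an illustrative toy, not the UCLA team's actual encoding: the matrix, the governing rule, and the solver function are all invented for this example.

```python
# Illustrative sketch (hypothetical, not the study's actual materials):
# a 3x3 Raven's-style matrix rendered as plain text. Each cell holds a
# list of digits, and the final cell is left blank for the model to fill in.
problem = """
[1 4 7] [2 5 8] [3 6 9]
[2 3 5] [3 4 6] [4 5 7]
[1 2 4] [2 3 5] [?]
"""

# In this toy problem, the rule is that every digit increases by 1 from
# one cell to the next within a row. A rule-based solver just applies
# that increment to the second cell of the final row.
def solve_last_cell(second_cell):
    return [d + 1 for d in second_cell]

print(solve_last_cell([2, 3, 5]))  # [3, 4, 6]
```

A language model is given only the text of the problem, with no stated rule; solving it requires inferring the pattern from the completed rows, which is what makes the task a test of analogical reasoning rather than memorization.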
When the researchers compared GPT-3's performance with that of 40 UCLA undergraduates, the results were remarkable: GPT-3 not only matched human performance but also made similar mistakes. With an accuracy of 80%, it beat the average human score while remaining within the range of the top human performers. The team then tested GPT-3 on unpublished SAT analogy questions, where it again surpassed the human average. GPT-3 struggled somewhat when asked to draw analogies from short stories, although the newer GPT-4 model showed improvement on this task.
The UCLA researchers are not content with comparisons alone; they are developing a computer model inspired by human cognition and continually benchmarking it against commercial AI models to pinpoint where GPT-3 falls short, for example on tasks that involve reasoning about physical space or tool use. Understanding what happens inside AI models will be crucial for gaining deeper insight into their cognitive processes, and could significantly shape the future trajectory of AI.
Taylor Webb, the study's first author, concludes that access to the backend of GPT models would greatly benefit both AI and cognitive researchers. For now, researchers can analyze only inputs and outputs, which lacks the depth required for conclusive studies and further advances.