A team of computer scientists at Google DeepMind in the U.K., working with a colleague from the University of Wisconsin-Madison and another from Université de Lyon, has developed a computer program that combines a pretrained large language model (LLM) with an automated "evaluator" to produce solutions to problems in the form of computer code.
In their paper published in the journal Nature, the group describes their ideas, how they were implemented and the types of output produced by the new system.
Researchers throughout the scientific community have taken note of what people are doing with LLMs such as ChatGPT, and many have suggested that LLMs could help speed up the process of scientific discovery. But they have also noted that for this to happen, a method is needed to prevent confabulations: answers that sound reasonable but are wrong. What is needed is output that can be verified. To address this problem, the team working in the U.K. used what they call an automated evaluator to assess the answers given by an LLM.
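A minimal sketch, not the authors' actual system, of the propose-and-verify loop the article describes: an LLM drafts candidate programs, an automated evaluator runs them against checkable criteria, and only candidates that pass verification are kept. The names `llm_propose_program`, `run_candidate`, and `reference` are hypothetical placeholders for an LLM call, a sandboxed execution step, and a known-correct check.

```python
import random
from typing import Callable, List, Tuple

def evaluate(candidate_source: str, test_inputs: List[int],
             run_candidate: Callable[[str, int], int],
             reference: Callable[[int], int]) -> float:
    """Score a candidate program by executing it on known test cases.

    Because the score comes from running the code, a confabulated
    (plausible-looking but wrong) program is discarded, not trusted.
    """
    passed = 0
    for x in test_inputs:
        try:
            if run_candidate(candidate_source, x) == reference(x):
                passed += 1
        except Exception:
            pass  # a crashing candidate earns no credit
    return passed / len(test_inputs)

def search(llm_propose_program: Callable[[List[str]], str],
           run_candidate: Callable[[str, int], int],
           reference: Callable[[int], int],
           iterations: int = 100) -> Tuple[str, float]:
    """Repeatedly ask the LLM for programs, seeding it with recent attempts."""
    test_inputs = [random.randint(0, 1000) for _ in range(50)]
    best_source, best_score = "", 0.0
    population: List[str] = []  # prior candidates fed back into the prompt
    for _ in range(iterations):
        candidate = llm_propose_program(population)  # LLM drafts new code
        score = evaluate(candidate, test_inputs, run_candidate, reference)
        if score > best_score:
            best_source, best_score = candidate, score
        population = (population + [candidate])[-10:]
    return best_source, best_score
```

The key design point this sketch illustrates is that the evaluator, not the LLM, decides what counts as a solution, so the system's final output is code whose behavior has been checked rather than prose that merely sounds correct.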