by Shelly Fan at Singularity Hub: This week, a new study in Nature offers an unconventional idea: Using a second AI tool as a kind of “truth police” to detect when the primary chatbot is hallucinating. The tool, also a large language model, was able to catch inaccurate AI-generated answers. A third AI then evaluated the “truth police’s” efficacy.
The strategy is “fighting fire with fire,” Karin Verspoor, an AI researcher and dean of the School of Computing Technologies at RMIT University in Australia, who was not involved in the study, wrote in an accompanying article.
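In rough code terms, the setup looks something like the sketch below. The functions here are toy stand-ins, not the study’s actual models, prompts, or tools; they only show the shape of the pipeline: one model answers, a second model flags answers it can’t back up, and a third step checks how often those flags are right.

```python
# Illustrative sketch of the "AI checking AI" setup described above.
# Every function here is a toy stand-in, not the study's actual code.

def primary_model(question: str) -> str:
    """Stand-in for the chatbot being checked (a real LLM in the study)."""
    canned = {"What is the capital of France?": "Paris"}
    # Anything outside its "knowledge" gets a confident but made-up answer.
    return canned.get(question, "The capital is Atlantis.")

def checker_model(question: str, answer: str) -> float:
    """Stand-in for the second model, the 'truth police': returns an
    estimated probability that the answer is reliable."""
    return 0.9 if answer == "Paris" else 0.1

def evaluate_checker(flags: list, ground_truth: list) -> float:
    """Stand-in for the third step: score how often the checker's
    hallucination flags agree with verified labels."""
    correct = sum(f == g for f, g in zip(flags, ground_truth))
    return correct / len(flags)

questions = ["What is the capital of France?", "Who rules the lost city of Atlantis?"]
answers = [primary_model(q) for q in questions]
# Flag an answer as a likely hallucination when the checker's confidence is low.
flags = [checker_model(q, a) < 0.5 for q, a in zip(questions, answers)]
print(evaluate_checker(flags, ground_truth=[False, True]))  # fraction of correct flags
```

In the study itself, each of these roles is played by an AI system rather than a few lines of canned logic.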
An AI’s Internal World
Large language models are complex AI systems built on multilayer networks that loosely mimic the brain. To train a network for a given task, such as responding in text like a person, the model takes in massive amounts of data scraped from online sources: articles, books, Reddit and YouTube comments, and Instagram or TikTok captions.
This data helps the models “dial in” on how language works. But they’re completely oblivious to “truth.” Their answers are based on statistical predictions, learned from those examples, of how words and sentences connect and what is most likely to come next.
“By design, LLMs are not trained to produce truths, per se, but plausible strings of words,” study author Sebastian Farquhar, a computer scientist at the University of Oxford, told Science.
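A toy example makes the point concrete. The miniature “model” below simply counts which word followed which in a few lines of made-up training text, which is far cruder than a real LLM, but it fails in the same way: it continues a prompt with the statistically most common phrase it has seen, whether or not that phrase is true.

```python
# Toy illustration: a model that only learns which words tend to follow
# which will produce plausible text with no regard for truth.
from collections import Counter, defaultdict

training_text = (
    "the moon is made of rock . "
    "the moon is made of cheese . "
    "the moon is made of cheese . "
)

# Count word-to-next-word transitions (a crude bigram "language model").
follows = defaultdict(Counter)
words = training_text.split()
for current, nxt in zip(words, words[1:]):
    follows[current][nxt] += 1

def most_likely_next(word: str) -> str:
    """Return the statistically most common continuation seen in training."""
    return follows[word].most_common(1)[0][0]

# The model continues with "cheese" simply because that ending was more
# common in its training text, not because it is true.
sentence = ["the", "moon", "is", "made", "of"]
sentence.append(most_likely_next(sentence[-1]))
print(" ".join(sentence))  # -> "the moon is made of cheese"
```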