When AIs collaborate, accuracy in medicine improves
The growing use of artificial intelligence (AI) in healthcare has raised questions about how trustworthy these systems really are. According to a new study published in the journal PLOS Medicine, AI systems are more accurate when they "think together" than when they work alone.
The study tested a group, or "council", of five AI models based on GPT-4 on 325 publicly available questions from the three steps of the US Medical Licensing Examination (USMLE), which cover everything from basic medical science to clinical decision-making. Working together, the AI council answered 97% of Step 1 questions correctly, 93% for Step 2, and 94% for Step 3, outperforming a single AI model.
Normally, AI systems can give different answers to the same question, and some responses may be wrong or simply made up. Instead of treating this variability as a weakness, the researchers designed a system in which the AIs discuss their answers, compare their reasoning, and reconsider their choices when they disagree. A facilitator program guides the discussion and asks the group to try again until it reaches a shared answer.
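To make the idea concrete, here is a minimal sketch of what such a facilitator loop could look like. This is not the study's actual code: the `ask_model` function, the round limit, and the majority-vote fallback are all illustrative assumptions, and the model calls are simulated here rather than sent to a real GPT-4 API.

```python
import random
from collections import Counter

MAX_ROUNDS = 5      # hypothetical cap on discussion rounds
COUNCIL_SIZE = 5    # mirrors the study's five-model council

def ask_model(model_id: int, question: str, peer_answers=None) -> str:
    """Toy stand-in for one council member answering a multiple-choice item.

    When the peers' previous answers are supplied, this simulated 'model'
    reconsiders by leaning toward the majority view, which mimics the
    discussion step. A real system would send the question (plus the peers'
    reasoning) to an actual language model here.
    """
    choices = ["A", "B", "C", "D"]
    if peer_answers:
        majority = Counter(peer_answers).most_common(1)[0][0]
        return majority if random.random() < 0.8 else random.choice(choices)
    return random.choice(choices)  # independent first-round answer

def run_council(question: str) -> str:
    """Facilitator loop: keep discussing until every member gives the same answer."""
    answers = [ask_model(i, question) for i in range(COUNCIL_SIZE)]
    for _ in range(MAX_ROUNDS):
        if len(set(answers)) == 1:   # consensus reached
            return answers[0]
        # Disagreement: share everyone's answers and ask each model to reconsider.
        answers = [ask_model(i, question, answers) for i in range(COUNCIL_SIZE)]
    # No consensus within the round limit: fall back to a majority vote.
    return Counter(answers).most_common(1)[0][0]

if __name__ == "__main__":
    print(run_council("A 54-year-old man presents with ..."))
```

The design choice the article describes is that disagreement triggers another round of discussion rather than being settled immediately by a vote; in this sketch, the vote is only a last-resort tiebreaker, which is an assumption, not a detail confirmed by the study.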
When the AIs initially disagreed, the discussion process led to the correct answer 83% of the time and corrected more than half of the errors that a simple majority vote would have made.
The researchers believe this shows that collaboration helps AI self-correct, making it more reliable. While this approach has not yet been tested in real hospitals, it could one day lead to safer and more trustworthy AI tools for healthcare, education, and other fields where accuracy really matters.