Artificial Intelligence for CT and MRI Protocoling: A Meta-Analysis of Traditional Machine Learning, BERT, and Large Language Models

    Ethan Sacoransky, Yash Chauhan, Scott J Adams
    AJR Am J Roentgenol. 2025 Oct 29. doi: 10.2214/AJR.25.33759. Online ahead of print.

    Abstract

    Background: Examination protocoling is a resource-intensive task. Various artificial intelligence (AI) approaches have been investigated to automate this process. 

    Objective: To evaluate the performance of traditional machine-learning (ML) models, bidirectional encoder representations from transformers (BERT) models, and large language models (LLMs) for automated CT and MRI protocoling. 

    Evidence Acquisition: MEDLINE, Embase, Scopus, Web of Science, IEEE Xplore, and Google Scholar databases were searched through July 2025 for studies reporting performance of an AI-based technique in assigning protocols for CT or MRI requisitions. Accuracy results were separately extracted for all models tested in each study and pooled using random-effects meta-analysis. AI approaches were compared using Welch t tests. Common sources of error were qualitatively summarized. 
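The random-effects pooling described above can be sketched with the standard DerSimonian-Laird estimator, which down-weights studies as between-study heterogeneity (tau squared) grows. The per-study accuracies and sample sizes below are hypothetical placeholders, not values from the included studies; the abstract does not specify which random-effects estimator was used, so this is an illustrative sketch only.

```python
import math

def pool_random_effects(props, ns, z=1.96):
    """DerSimonian-Laird random-effects pooling of proportions.

    props: per-study accuracy proportions; ns: per-study requisition counts.
    Returns (pooled estimate, 95% CI lower bound, 95% CI upper bound).
    """
    k = len(props)
    # Within-study variance of a simple proportion
    v = [p * (1 - p) / n for p, n in zip(props, ns)]
    w = [1 / vi for vi in v]                               # fixed-effect weights
    fixed = sum(wi * pi for wi, pi in zip(w, props)) / sum(w)
    # Cochran's Q and the between-study variance tau^2
    q = sum(wi * (pi - fixed) ** 2 for wi, pi in zip(w, props))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)
    # Random-effects weights fold tau^2 into each study's variance
    wr = [1 / (vi + tau2) for vi in v]
    pooled = sum(wi * pi for wi, pi in zip(wr, props)) / sum(wr)
    se = math.sqrt(1 / sum(wr))
    return pooled, pooled - z * se, pooled + z * se

# Hypothetical per-study accuracies and sample sizes (not from this meta-analysis)
accs = [0.81, 0.87, 0.90, 0.78, 0.85]
sizes = [1200, 450, 3000, 800, 150]
est, lo, hi = pool_random_effects(accs, sizes)
print(f"pooled accuracy {est:.1%} (95% CI: {lo:.1%}-{hi:.1%})")
```

Relative to a fixed-effect pool, the tau-squared term widens the confidence interval and pulls weights toward equality when study results disagree, which matters here given the wide range of cohort sizes across the 23 studies.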

    Evidence Synthesis: The final analysis included 23 studies, comprising 1,196,259 imaging requisitions. Requisition subspecialties included body imaging (n=4), musculoskeletal imaging (n=3), neuroradiology (n=6), thoracic imaging (n=1), and multiple subspecialties (n=9). Sixteen studies evaluated traditional ML models, eight evaluated BERT models, and five evaluated LLMs. Task-specific model fine-tuning was performed in three studies for traditional ML models, all studies for BERT models, and one study for LLMs. Overall pooled protocoling accuracy was 85% (95% CI: 83-87%). Pooled accuracy was 83% (95% CI: 80-85%) for traditional ML models, 87% (95% CI: 85-89%) for BERT models, and 86% (95% CI: 83-89%) for LLMs; these pooled accuracies were not significantly different between any pairwise combination of the three AI approaches (all p > .05). Among 30 distinct models (14 traditional ML models, nine BERT models, seven LLMs), the ten top-performing models comprised two traditional ML models, six BERT models (including the top-performing model [BioBERT; accuracy, 93%]), and two LLMs. Common sources of error included ambiguous requisition text, data imbalance yielding incorrect protocol assignments for low-volume protocols, presence of multiple clinically reasonable protocols for given requisitions, and difficulty handling requisitions containing terms strongly associated with disparate protocols. 
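The pairwise comparisons reported above used Welch t tests, which, unlike Student's t test, do not assume equal variances between groups. A minimal sketch of the statistic and its Welch-Satterthwaite degrees of freedom, applied to hypothetical group summaries (means, standard deviations, and study counts chosen for illustration, not taken from the paper):

```python
import math

def welch_t(m1, s1, n1, m2, s2, n2):
    """Welch's t statistic and approximate degrees of freedom for two
    groups summarized by mean, standard deviation, and group size."""
    se1, se2 = s1 ** 2 / n1, s2 ** 2 / n2
    t = (m1 - m2) / math.sqrt(se1 + se2)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))
    return t, df

# Hypothetical summaries: traditional ML (16 studies) vs. BERT (8 studies)
t, df = welch_t(0.83, 0.06, 16, 0.87, 0.05, 8)
# |t| below the ~2.1 two-sided critical value at this df -> not significant,
# consistent with the all-p > .05 result reported in the abstract
print(f"t = {t:.2f}, df = {df:.1f}")
```

The unequal-variance correction matters here because the three model groups contain different numbers of studies with different spreads in accuracy.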

    Conclusion: The top-performing AI models for automated CT and MRI protocoling were predominantly fine-tuned BERT models. 

    Clinical Impact: AI tools show strong potential to help streamline radiologist workflows, possibly through hybrid AI-radiologist approaches. Fine-tuned LLMs warrant further exploration.