J Imaging Inform Med. 2025 Aug 15. doi: 10.1007/s10278-025-01637-w. Online ahead of print.
Abstract
This study aimed to evaluate the performance of a fine-tuned large language model (LLM) in identifying pretreatment pancreatic cancer from computed tomography (CT) radiology reports and to compare it with that of human readers. This retrospective study included 2690, 886, and 378 CT reports in the training, validation, and test datasets, respectively. The clinical indication, imaging findings, and imaging diagnosis sections of each radiology report (used as input data) were reviewed and categorized into group 0 (no pancreatic cancer), group 1 (after treatment for pancreatic cancer), or group 2 (pretreatment pancreatic cancer present) (used as reference data). A pretrained Japanese Bidirectional Encoder Representations from Transformers (BERT) model was fine-tuned on the training and validation datasets. Because of group imbalance, group 1 data were undersampled and group 2 data were oversampled in the training dataset. The model that performed best on the validation dataset was then evaluated on the test dataset. Additionally, three readers (readers 1, 2, and 3) classified the reports in the test dataset. The fine-tuned LLM and readers 1, 2, and 3 demonstrated an overall accuracy of 0.942, 0.984, 0.979, and 0.947; sensitivities for groups 0/1/2 of 0.944/0.960/0.921, 0.976/1.000/0.976, 0.984/0.984/0.968, and 1.000/1.000/0.841; and total classification times of 49 s, 2689 s, 3496 s, and 4887 s, respectively. The fine-tuned LLM effectively identified patients with pretreatment pancreatic cancer from CT radiology reports, with performance comparable to that of the readers in a substantially shorter time.
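The abstract describes fine-tuning a pretrained Japanese BERT model for three-class report classification, with undersampling of group 1 and oversampling of group 2 to counter class imbalance, validation-based checkpoint selection, and evaluation by overall accuracy and per-group sensitivity. A minimal sketch of such a pipeline using the Hugging Face Transformers library follows; the checkpoint name (cl-tohoku/bert-base-japanese), the resampling targets, and all hyperparameters are assumptions, since the abstract does not specify them.

    # A minimal sketch, assuming a Hugging Face Transformers setup; the checkpoint,
    # resampling targets, and hyperparameters are illustrative, not from the paper.
    import random
    from datasets import Dataset
    from sklearn.metrics import recall_score
    from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                              Trainer, TrainingArguments)

    MODEL_NAME = "cl-tohoku/bert-base-japanese"  # assumed pretrained checkpoint

    def rebalance(records, seed=42):
        """Undersample group 1 and oversample group 2 (the common target size is
        an assumption; the abstract states only that both steps were applied)."""
        rng = random.Random(seed)
        groups = {g: [r for r in records if r["label"] == g] for g in (0, 1, 2)}
        target = len(groups[0])                                   # assumed target
        g1 = rng.sample(groups[1], min(target, len(groups[1])))   # undersample
        g2 = rng.choices(groups[2], k=target)                     # oversample
        return groups[0] + g1 + g2

    def compute_metrics(eval_pred):
        """Overall accuracy and per-group sensitivity (recall), mirroring the
        metrics reported in the abstract."""
        logits, labels = eval_pred
        preds = logits.argmax(axis=-1)
        sens = recall_score(labels, preds, average=None, labels=[0, 1, 2])
        return {"accuracy": float((preds == labels).mean()),
                "sens_g0": sens[0], "sens_g1": sens[1], "sens_g2": sens[2]}

    # Placeholder data; replace with the actual de-identified report sections.
    # Input text = concatenated clinical indication, findings, and diagnosis.
    train_records = [{"text": "placeholder report text", "label": g}
                     for g in (0, 0, 0, 0, 1, 1, 2)]
    val_records = [{"text": "placeholder report text", "label": g}
                   for g in (0, 1, 2)]

    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME,
                                                               num_labels=3)

    def tokenize(batch):
        return tokenizer(batch["text"], truncation=True, max_length=512)

    train_ds = Dataset.from_list(rebalance(train_records)).map(tokenize, batched=True)
    val_ds = Dataset.from_list(val_records).map(tokenize, batched=True)

    args = TrainingArguments(
        output_dir="bert-ja-pancreatic-ct",
        num_train_epochs=3,               # assumed
        per_device_train_batch_size=16,   # assumed
        eval_strategy="epoch",
        save_strategy="epoch",
        load_best_model_at_end=True,      # keep the best validation checkpoint
        metric_for_best_model="accuracy",
    )

    trainer = Trainer(model=model, args=args, train_dataset=train_ds,
                      eval_dataset=val_ds, tokenizer=tokenizer,
                      compute_metrics=compute_metrics)
    trainer.train()
    print(trainer.evaluate(val_ds))       # validation metrics for model selection

In this sketch, load_best_model_at_end plays the role of the validation-based selection the abstract describes; the selected model would then be evaluated once on the held-out test reports for comparison with the three readers.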