• Data Extraction and Curation from Radiology Reports for Pancreatic Cyst Surveillance Using Large Language Models

    Ankur P Choubey, Emanuel Eguia, Alexander Hollingsworth, Subrata Chatterjee, Michael I D'Angelica, William R Jarnagin, Alice C Wei, Mark A Schattner, Richard K G Do, Kevin C Soares; MSKCC Pancreas Cyst Collaborative
    J Am Coll Surg. 2025 Jul 10:10.1097/XCS.0000000000001478. doi: 10.1097/XCS.0000000000001478. Online ahead of print.

    Abstract

    Introduction: Manual curation of radiographic features in pancreatic cyst registries for data abstraction and longitudinal evaluation is time-consuming and limits widespread implementation. We examined the feasibility and accuracy of using large language models (LLMs) to extract clinical variables from radiology reports.

     Methods: A single-center retrospective study included patients under surveillance for pancreatic cysts. Nine radiographic elements used to monitor cyst progression were included: cyst size, main pancreatic duct (MPD) size (continuous variable), number of lesions, MPD dilation ≥5 mm (categorical), branch duct dilation, presence of solid component, calcific lesion, pancreatic atrophy, and pancreatitis. GPT models on the OpenAI GPT-4 platform were used to extract the elements of interest with a zero-shot approach, in which prompting alone drives the annotation without any training data. A manually annotated institutional cyst database was used as the ground truth (GT) for comparison.
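The zero-shot setup described in the Methods can be pictured with a minimal sketch. The abstract does not give the study's prompt, output schema, or API calls, so everything here is an assumption made for illustration: the field names, the prompt wording, and the choice of JSON as the reply format are all hypothetical. The sketch only builds the prompt and parses a reply; sending the prompt to a model is left out.

```python
import json

# Hypothetical field names for the nine elements listed in the Methods.
# The study's actual schema is not given in the abstract.
FIELDS = {
    "cyst_size_mm": "largest cyst diameter in mm (number or null)",
    "mpd_size_mm": "main pancreatic duct diameter in mm (number or null)",
    "num_lesions": "number of cystic lesions (integer or null)",
    "mpd_dilation_ge_5mm": "true, false, or null",
    "branch_duct_dilation": "true, false, or null",
    "solid_component": "true, false, or null",
    "calcific_lesion": "true, false, or null",
    "pancreatic_atrophy": "true, false, or null",
    "pancreatitis": "true, false, or null",
}


def build_prompt(report_text):
    """Build a zero-shot prompt: instructions only, no labeled examples."""
    field_lines = [f'- "{name}": {desc}' for name, desc in FIELDS.items()]
    return (
        "Extract the following fields from the radiology report below. "
        "Reply with a single JSON object and nothing else. "
        "Use null when the report does not mention a field.\n"
        + "\n".join(field_lines)
        + "\n\nReport:\n"
        + report_text
    )


def parse_reply(reply_text):
    """Parse the model's JSON reply; fields missing from the reply become None."""
    data = json.loads(reply_text)
    return {name: data.get(name) for name in FIELDS}
```

Each parsed record could then be compared field by field against the manually annotated ground truth.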

     Results: Overall, 3198 longitudinal scans from 991 patients were included. GPT successfully extracted the selected radiographic elements with high accuracy. Among categorical variables, accuracy ranged from 97% for solid component to 99% for calcific lesions. Among continuous variables, accuracy varied from 92% for cyst size to 97% for MPD size. However, Cohen's kappa was higher for cyst size (0.92) than for MPD size (0.82). The lowest accuracy (81%) was noted in the multi-class variable for number of cysts.
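Accuracy and Cohen's kappa can rank variables differently, as they do for cyst size and MPD size above. The sketch below uses the standard definitions of both metrics on made-up labels; it is not the authors' analysis code, and the abstract does not say how agreement was computed for the continuous variables. One general reason the two metrics diverge is class imbalance: kappa discounts the agreement expected by chance, which is large when most cases fall in one category.

```python
from collections import Counter


def accuracy(truth, pred):
    """Fraction of cases where the extracted label equals the ground truth."""
    return sum(t == p for t, p in zip(truth, pred)) / len(truth)


def cohen_kappa(truth, pred):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e) for two label sequences."""
    n = len(truth)
    p_o = accuracy(truth, pred)  # observed agreement
    truth_counts = Counter(truth)
    pred_counts = Counter(pred)
    # Agreement expected by chance, from the marginal label frequencies.
    p_e = sum(truth_counts[c] * pred_counts[c] for c in truth_counts) / (n * n)
    if p_e == 1.0:
        return float("nan")  # kappa is undefined when only one label occurs
    return (p_o - p_e) / (1.0 - p_e)


# Made-up imbalanced example: 18 negatives, 2 positives, one positive missed.
truth = [0] * 18 + [1] * 2
pred = [0] * 18 + [1, 0]
print(accuracy(truth, pred))               # 0.95
print(round(cohen_kappa(truth, pred), 3))  # 0.643
```

In this toy case a single missed positive leaves accuracy at 95% but pulls kappa down to about 0.64, because chance agreement alone is already 86%.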

     Conclusion: LLMs can accurately extract and curate data from radiology reports for pancreatic cyst surveillance and can be reliably used to assemble longitudinal databases. Future application of this work may potentiate the development of artificial intelligence-based surveillance models.