• Large Language Model-Based Automated Tumor, Node, Metastasis Staging and Resectability Assessment for Pancreatic Cancer in Radiology Reports With Detection of Incomplete Documentation

    Shota Fujisawa, Ryo Kurokawa, Akifumi Hagiwara, Mariko Kurokawa, Yusuke Asari, Sousuke Hatano, Daisuke Matsumoto, Ako Negishi, Wataru Gonoi, Osamu Abe

    Abstract

    Objectives: To assess the performance of the large language model (LLM) Claude 3.7 Sonnet in automatically extracting tumor, node, metastasis (TNM) classification and resectability status, as well as in identifying missing information required for these assessments, from free-text Japanese radiology reports of pancreatic cancer. 

    Methods: We conducted a retrospective study of 101 Japanese radiology reports from pancreatic cancer staging computed tomography (CT) examinations. Using a zero-shot approach, Claude 3.7 Sonnet was prompted with definitions from the 8th edition of the Japanese Pancreatic Cancer Classification to generate TNM stage and resectability categories. The model's outputs were compared with a reference standard established by a radiologist's interpretation of the same reports. Performance metrics included categorical accuracy and Cohen's kappa coefficients. Detailed error analysis was also performed to characterize common sources of misclassification.
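
The two performance metrics named in the Methods can be illustrated with a minimal sketch. It assumes unweighted Cohen's kappa (the abstract does not say whether weighting was used), and the function names and toy labels are hypothetical, not taken from the study data.

```python
from collections import Counter


def accuracy(reference, predicted):
    """Fraction of reports where the model category equals the reference category."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(r == p for r, p in zip(reference, predicted)) / len(reference)


def cohens_kappa(reference, predicted):
    """Unweighted Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    n = len(reference)
    p_o = accuracy(reference, predicted)  # observed agreement
    ref_counts = Counter(reference)
    pred_counts = Counter(predicted)
    # Expected chance agreement from the marginal category frequencies.
    p_e = sum(ref_counts[c] * pred_counts[c] for c in ref_counts) / (n * n)
    if p_e == 1.0:
        return 1.0  # degenerate case: both raters used one identical category
    return (p_o - p_e) / (1 - p_e)


# Toy T-category labels, invented for illustration only.
reference = ["T1", "T2", "T3", "T3", "T4", "T4"]
predicted = ["T1", "T2", "T3", "T4", "T4", "T4"]

toy_accuracy = accuracy(reference, predicted)
toy_kappa = cohens_kappa(reference, predicted)
print(f"accuracy = {toy_accuracy:.3f}, kappa = {toy_kappa:.3f}")
```

Kappa is reported alongside accuracy because it discounts the agreement expected by chance, which matters when one category dominates (for example, most reports being M0).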

    Results: Claude 3.7 Sonnet achieved accuracies of 84.1% for the T category, 92.1% for the N category, 98.0% for the M category, and 87.1% for resectability. Cohen's kappa values indicated substantial agreement for T (κ = 0.745) and almost perfect agreement for N (κ = 0.858), M (κ = 0.956), and resectability (κ = 0.812). The lower accuracy in T classification was mainly attributable to misinterpretation of nuanced vascular involvement. The model effectively detected missing information for TNM classification but showed limitations in identifying omissions relevant to resectability assessment.
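
The agreement labels in the Results ("substantial", "almost perfect") match the widely used Landis and Koch bands. The sketch below assumes those bands are the ones intended, which the abstract does not state explicitly; the function name is hypothetical, and the kappa values are copied from the Results.

```python
def landis_koch_label(kappa):
    """Map a kappa value to its Landis and Koch agreement label (assumed convention)."""
    if kappa < 0:
        return "poor"
    # Upper bound of each band, in ascending order.
    bands = [
        (0.20, "slight"),
        (0.40, "fair"),
        (0.60, "moderate"),
        (0.80, "substantial"),
        (1.00, "almost perfect"),
    ]
    for upper, label in bands:
        if kappa <= upper:
            return label
    raise ValueError("kappa cannot exceed 1")


# Kappa values as reported in the Results.
reported = {"T": 0.745, "N": 0.858, "M": 0.956, "resectability": 0.812}
labels = {category: landis_koch_label(k) for category, k in reported.items()}
for category, label in labels.items():
    print(f"{category}: kappa = {reported[category]} ({label})")
```

Under this convention, resectability (κ = 0.812) falls just above the 0.80 boundary between substantial and almost perfect agreement.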

     Conclusion: Claude 3.7 Sonnet demonstrated high accuracy in extracting structured pancreatic cancer staging information from unstructured Japanese radiology reports without task-specific training. While challenges remain in interpreting nuanced descriptions of vascular invasion and resectability, the model reliably identified most staging elements and omissions. These findings highlight the potential of LLMs as tools for semi-automated generation of structured data from routine free-text reports, which could improve reporting consistency, workflow efficiency, and secondary data utilization in oncology care.