Prediction of Human Papillomavirus (HPV) Status in Oropharyngeal Squamous Cell Carcinoma Based on Radiomics and Machine Learning Algorithms: A Multi-Cohort Study
Background: Human Papillomavirus (HPV) status has significant implications for prognostic evaluation and clinical decision-making in patients with Oropharyngeal Squamous Cell Carcinoma. Radiomics, a relatively new method, offers a possible route to non-invasive diagnosis. The aim of this study was to examine whether Computed Tomography (CT) radiomics combined with machine-learning classifiers can predict HPV status in patients with Oropharyngeal Squamous Cell Carcinoma, and whether the resulting models hold up on external data, using imaging data from multi-institutional and multi-national cohorts.
Materials and methods: A total of 651 patients from three multi-institutional and multi-national cohorts were included in this retrospective study: the OPC-Radiomics cohort (n=497), the MAASTRO cohort (n=74), and the SNPH cohort (n=80). The OPC-Radiomics cohort was randomly split into a training cohort and a validation cohort at a ratio of 2:1. The MAASTRO and SNPH cohorts served as independent external testing cohorts. From the Computed Tomography images of the primary tumors, 1316 quantitative features were extracted. After feature selection with Logistic Regression and Recursive Feature Elimination, 10 different machine-learning classifiers were trained and compared across the cohorts.
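The workflow described above (2:1 split, Recursive Feature Elimination driven by Logistic Regression, then a classifier) can be sketched with scikit-learn as follows. This is only an illustrative sketch under stated assumptions, not the study's implementation: the data are synthetic, and the stratified split, the feature scaling step, the number of retained features (10), the elimination step size, and the Random Forest hyperparameters are all assumptions that the text does not specify.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a radiomics feature matrix (NOT study data).
# The study used 497 patients x 1316 features; smaller sizes keep this fast.
rng = np.random.default_rng(0)
n_patients, n_features = 300, 40
X = rng.normal(size=(n_patients, n_features))
# Hypothetical binary label (e.g. HPV-positive = 1) driven by two features plus noise.
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n_patients) > 0).astype(int)

# 2:1 split into training and validation sets.
# Stratifying on the label is an assumption; the text only says "randomly split".
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=1 / 3, random_state=0, stratify=y
)

model = make_pipeline(
    StandardScaler(),  # assumed preprocessing step
    # RFE repeatedly drops the features with the smallest logistic-regression
    # coefficients until n_features_to_select remain (10 is an assumed value).
    RFE(LogisticRegression(max_iter=1000), n_features_to_select=10, step=5),
    RandomForestClassifier(n_estimators=200, random_state=0),
)
model.fit(X_train, y_train)

# AUC on the held-out validation split, using predicted class-1 probabilities.
val_auc = roc_auc_score(y_val, model.predict_proba(X_val)[:, 1])
print(f"validation AUC on synthetic data: {val_auc:.2f}")
```

Placing the selector inside the pipeline means feature selection is fitted on the training split only, so the validation split does not leak into the choice of features. The same fitted pipeline could then be applied unchanged to external cohorts.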
Results: Among the 10 machine-learning classifiers compared, the Random Forest-based model performed best, with areas under the Receiver Operating Characteristic (ROC) curve (AUCs) of 0.97, 0.72, 0.63, and 0.78 in the training cohort, validation cohort, testing cohort 1 (MAASTRO cohort), and testing cohort 2 (SNPH cohort), respectively.
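For readers unfamiliar with the metric reported above, the AUC equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. A minimal pure-Python sketch of that definition follows; the function name `roc_auc` and the toy labels and scores are hypothetical and are not taken from the study.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: iterable of 0/1 class labels (1 = positive class).
    scores: iterable of model scores, higher meaning "more likely positive".
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")

    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0   # positive case ranked above negative case
            elif p == n:
                correct += 0.5   # a tie counts as half a correct ranking
    return correct / (len(pos) * len(neg))


# Toy example (hypothetical scores, not study data).
toy_labels = [1, 1, 0, 0, 1, 0]
toy_scores = [0.9, 0.6, 0.7, 0.2, 0.8, 0.4]
toy_auc = roc_auc(toy_labels, toy_scores)  # 8 of 9 pairs are ranked correctly
print(f"toy AUC: {toy_auc:.3f}")
```

Under this reading, an AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which gives a scale for comparing the values reported for the four cohorts.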
Conclusion: The Random Forest-based radiomics model was effective in differentiating Human Papillomavirus status in Oropharyngeal Squamous Cell Carcinoma across a multi-national population, which suggests that this non-invasive method could be widely applied in clinical practice.