Application of 10-fold cross-validation in the evaluation of generalization ability of prediction models and the realization in R
Liang Zichao1, Li Zhiwei1, Lai Keng2, Lin Zhuochen1, Li Tiegang2, Zhang Jinxin1.
1. Department of Medical Statistics, School of Public Health, Sun Yat-sen University, Guangzhou 510080, China;
2. Prevention Institute for Tuberculosis of Guangzhou, Guangzhou 510095, China
Abstract: Objective To introduce the basic principle of 10-fold cross-validation and to demonstrate, with a worked example in R, its use in evaluating the generalization ability of prediction models. Methods A logistic regression model was fitted to predict the treatment outcome of patients with drug-resistant tuberculosis, and 10-fold cross-validation was used to evaluate it. The evaluation metrics obtained from 5-fold, 10-fold, and leave-one-out cross-validation were also compared. Results The way the data are divided into training and test sets influences the evaluation metrics. Among the cross-validation schemes compared, 10-fold cross-validation gave evaluation metrics with the highest stability and efficiency. Conclusion For evaluating the performance of machine learning models, 10-fold cross-validation shows strong overall performance and allows the generalization ability of different models to be measured objectively.
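The splitting scheme named in the abstract can be illustrated with a minimal sketch. It is written in Python rather than R and is not the authors' code. The function names `k_fold_indices` and `cross_validate`, the `fit_and_score` callback, and the sample size of 103 are all hypothetical and chosen only for illustration. The sketch assumes the usual definition of k-fold cross-validation: the samples are shuffled once, divided into k disjoint folds of near-equal size, and each fold serves exactly once as the test set while the remaining folds form the training set.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Split indices 0..n_samples-1 into k disjoint folds of near-equal size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices round-robin so fold sizes differ by at most one.
    return [indices[i::k] for i in range(k)]


def cross_validate(n_samples, fit_and_score, k=10, seed=0):
    """Call fit_and_score(train_idx, test_idx) once per fold; return all results."""
    folds = k_fold_indices(n_samples, k, seed)
    results = []
    for i, test_idx in enumerate(folds):
        # Every fold except the i-th contributes to the training set.
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        results.append(fit_and_score(train_idx, test_idx))
    return results


# Hypothetical stand-in for "fit a model, then score it": it only reports
# how many samples ended up in the training and test sets of each fold.
split_sizes = cross_validate(103, lambda train, test: (len(train), len(test)))
```

Setting `k=5` gives the 5-fold scheme, and setting `k` equal to `n_samples` gives leave-one-out cross-validation, since every fold then holds a single sample. In practice `fit_and_score` would fit the model on the training indices and return a metric computed on the test indices.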
Liang Zichao, Li Zhiwei, Lai Keng, Lin Zhuochen, Li Tiegang, Zhang Jinxin. Application of 10-fold cross-validation in the evaluation of generalization ability of prediction models and the realization in R[J]. Chinese Journal of Hospital Statistics (中国医院统计), 2020, 27(4): 289-292.