Simulation study on linear regression analysis strategy under extreme value conditions
Hu Weiwei, Li Yemian, Yan Hong, Chen Fangyao
Department of Epidemiology and Health Statistics, School of Public Health, Xi'an Jiaotong University, Xi'an 710061, China
Abstract Objective To compare the performance of four commonly used strategies for handling extreme values in multivariable regression analysis under different extreme value conditions, and to provide a reference for choosing among them. Methods Monte Carlo simulation was used to generate data containing extreme values under different conditions, and each handling strategy was applied to the simulated data in a multivariable regression analysis. The evaluation indexes were the type I error rate (α), the type II error rate (β), the root mean square error (RMSE) of the coefficient estimates, and the model R² and adjusted R². Results Directly deleting extreme values performed well when both the proportion of extreme values and the number of observations containing extreme values were small, but its performance deteriorated steadily as the proportion and number of extreme values increased. Robust regression performed very well under all extreme value conditions when the sample size was large, but less well when the sample size was small. Converting extreme values to missing values and then applying multiple imputation performed acceptably only when the sample size was small and the proportion of extreme values was low. Data transformation performed very poorly under all extreme value conditions. Conclusion Robust regression is the preferred strategy when the sample size is large, but should be used with caution when the sample size is small. Converting extreme values to missing values followed by multiple imputation is suitable for data sets with a small proportion of extreme values. Direct deletion is suitable only for data sets in which both the number of observations containing extreme values and the proportion of extreme values are small, whereas data transformation is unsuitable under most extreme value conditions.
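The simulation design described in Methods can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration, not a detail reported in the abstract: the two-predictor model and its coefficients, the contamination scheme (an upward shift of 10 error standard deviations in a fixed share of the responses), the residual-based deletion rule, and the use of a Huber M-estimator as the robust regression. Only the RMSE of the coefficient estimates is computed, and the multiple imputation and data transformation strategies are left out to keep the example dependent on NumPy alone.

```python
import numpy as np

TRUE_BETA = np.array([1.0, 2.0, -1.0])  # hypothetical intercept and two slopes


def simulate(n, prop_extreme, rng, shift=10.0):
    """Draw y = X @ beta + e, then shift a share of y upward (assumed contamination)."""
    X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
    y = X @ TRUE_BETA + rng.normal(size=n)
    k = int(round(prop_extreme * n))
    idx = rng.choice(n, size=k, replace=False)
    y[idx] += shift
    return X, y


def mad_scale(r):
    """Robust scale estimate: median absolute deviation, scaled for normal errors."""
    return 1.4826 * np.median(np.abs(r - np.median(r)))


def ols(X, y):
    """Ordinary least squares with no handling of extreme values."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


def delete_then_ols(X, y, cutoff=3.0):
    """Delete points with large robustly standardised OLS residuals, then refit."""
    r = y - X @ ols(X, y)
    keep = np.abs(r - np.median(r)) <= cutoff * mad_scale(r)
    return ols(X[keep], y[keep])


def huber(X, y, c=1.345, n_iter=50, tol=1e-8):
    """Huber M-estimator fitted by iteratively reweighted least squares."""
    b = ols(X, y)
    for _ in range(n_iter):
        r = y - X @ b
        u = np.abs(r) / max(mad_scale(r), 1e-12)
        w = c / np.maximum(u, c)  # weight 1 inside the threshold, c/u outside
        sw = np.sqrt(w)
        b_new = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
        converged = np.max(np.abs(b_new - b)) < tol
        b = b_new
        if converged:
            break
    return b


def run_study(n=100, prop_extreme=0.10, reps=200, seed=0):
    """Return the RMSE of the coefficient estimates for each strategy."""
    rng = np.random.default_rng(seed)
    methods = {"ols": ols, "delete": delete_then_ols, "huber": huber}
    sq_err = {name: [] for name in methods}
    for _ in range(reps):
        X, y = simulate(n, prop_extreme, rng)
        for name, fit in methods.items():
            sq_err[name].append((fit(X, y) - TRUE_BETA) ** 2)
    return {name: float(np.sqrt(np.mean(e))) for name, e in sq_err.items()}


if __name__ == "__main__":
    for prop in (0.0, 0.05, 0.20):
        print(prop, run_study(prop_extreme=prop))
```

Varying `n` and `prop_extreme` in `run_study` mimics the idea of comparing strategies across sample sizes and extreme value proportions. The numbers it prints depend on the assumed settings and should not be read as a reproduction of the study's results.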
Received: 06 May 2022
|
|
|
|