
Prediction Intervals of Response Variables based on Quantiles in High Dimensional Regression Analyses

Bibliographic Details
Published in: IOP conference series. Earth and environmental science, 2018-11, Vol.187 (1), p.012045
Main Authors: Rahardiantoro, Septian; Notodiputro, Khairil Anwar; Kurnia, Anang
Format: Article
Language: English
Description
Summary: In regression analyses it is of interest to obtain prediction intervals for the response variables. However, such intervals are not straightforward to construct when the number of explanatory variables exceeds the number of observations, since the least squares method cannot be used in this case. This paper discusses the problems of constructing prediction intervals in high-dimensional regression models, in which the number of explanatory variables is greater than the number of observations. A quantile approach is proposed to construct such intervals, and it is evaluated by means of simulation. In this approach, pairs of quantiles corresponding to a given probability are specified and then evaluated to obtain the shortest interval. Because the number of explanatory variables was large, several variable selection techniques were employed: best subset regression, LASSO (Least Absolute Shrinkage and Selection Operator) regression, and model averaging. The simulation data were generated under two scenarios. The first was designed for models with symmetric error distributions, and the second for models with non-symmetric error distributions. The simulation results showed that, with symmetric error distributions, all of the regression methods above produced similar prediction intervals, except LASSO regression. With non-symmetric error distributions, however, there was evidence that model averaging provided the best prediction intervals compared with best subset and LASSO regression, although with a wide range of intervals. This shows that model averaging can be used to predict the response variables in high-dimensional regression analyses even when the data are non-symmetrically distributed.
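The quantile step described in the summary (specify pairs of quantiles for a given probability, then keep the shortest interval) can be illustrated with a minimal Python sketch. This is not the authors' code: the function name `shortest_quantile_interval`, the grid of lower-tail probabilities, and the use of simulated exponential values as a stand-in for non-symmetric errors are all assumptions made for illustration.

```python
import numpy as np


def shortest_quantile_interval(values, coverage=0.95, n_grid=51):
    """Shortest interval among quantile pairs (a, a + coverage).

    Hypothetical helper, not taken from the paper: `a` runs over a grid of
    lower-tail probabilities in [0, 1 - coverage].
    """
    values = np.asarray(values, dtype=float)
    lower_probs = np.linspace(0.0, 1.0 - coverage, n_grid)
    # Guard against floating-point sums slightly above 1.
    upper_probs = np.minimum(lower_probs + coverage, 1.0)
    lowers = np.quantile(values, lower_probs)
    uppers = np.quantile(values, upper_probs)
    best = np.argmin(uppers - lowers)  # index of the shortest candidate
    return lowers[best], uppers[best]


# Illustration on simulated, right-skewed values (an assumption, not the
# paper's simulation design).
rng = np.random.default_rng(0)
skewed = rng.exponential(size=5000)
low, high = shortest_quantile_interval(skewed, coverage=0.95)
equal_low, equal_high = np.quantile(skewed, [0.025, 0.975])
print("shortest interval length:", high - low)
print("equal-tailed interval length:", equal_high - equal_low)
```

For a symmetric distribution the shortest pair is close to the equal-tailed pair, whereas for a skewed one it shifts toward the denser tail. This matches the summary's distinction between the two simulation scenarios.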
ISSN: 1755-1307
1755-1315
DOI: 10.1088/1755-1315/187/1/012045