James Shaw
2011-Sep-09 15:51 UTC
[R] Problems using quantile regression (rq) to model GLD random variables in R
Everyone:

I am working on a simulation of the efficiencies of regression estimators when applied to a specific form of highly skewed data. The outcome variable (y) is simulated from a generalized lambda distribution (GLD) to reflect the characteristics (mean, variance, skewness, kurtosis) of an observed variable. The regressor of interest (x) is simply a binary indicator of group membership. The relevant R code is as follows:

    library(gld)       # provides rgl() and gl.check.lambda()

    x <- numeric(400)  # allocate x before filling it
    x[1:200] <- 0
    x[201:400] <- 1

    # Group 0: GLD parameters in the "rs" parameterization
    params <- c(2.684864244, .0144182, 26.01913, 711.0554)
    y0 <- rgl(200, params, param = "rs")

    # Group 1
    params <- c(-0.113741088, .0523072, 15.98282, 426.4456)
    y1 <- rgl(200, params, param = "rs")

    y <- c(y0, y1)

I have verified that the GLD parameters in each case are valid using gl.check.lambda (in the gld package). While I experienced no difficulty when using OLS to fit models to y, the quantile regression estimator and robust (e.g., M) regression estimators yielded minute (or missing) variance estimates and infinitely large t statistics for the coefficient for x.

The problem appears to be related to the number of duplicate observations in my simulated data. As I understand it, the GLD is a transformation of the uniform distribution. Given the parameters specified above, I end up with many duplicate observations that happen to be equal to the true median value. This lack of variation around the median appears to be causing problems for the quantile regression estimator (as implemented using rq) and the robust regression estimator.

I am unaware of a viable alternative to the GLD that can be readily implemented in R. In the absence of an alternative distribution, I am wondering whether jittering could be used as a practical (and hopefully valid) solution to my dilemma. That is, add a small residual drawn from U(-0.5, 0.5) to each GLD observation and model the composite variable as a function of x. This would be expected to preserve the mean and median of y over repeated simulations, and the added variance would be expected to be negligible.
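The duplicate values described above can be reproduced without the gld package. The sketch below is an illustration only: it assumes that drawing from the GLD in the "rs" (Ramberg-Schmeiser) parameterization amounts to pushing uniform draws through the quantile function Q(u) = lambda1 + (u^lambda3 - (1 - u)^lambda4) / lambda2. The helper name `qgl_rs`, the seed, and the 1e-3 tolerance are hypothetical choices, not part of the original post.

```r
# Quantile function of the GLD in the Ramberg-Schmeiser (RS) parameterization.
# Hypothetical helper written in base R so the sketch needs no extra packages.
qgl_rs <- function(u, lambda) {
  lambda[1] + (u^lambda[3] - (1 - u)^lambda[4]) / lambda[2]
}

# Group-0 parameters from the post.
params0 <- c(2.684864244, 0.0144182, 26.01913, 711.0554)

set.seed(1)  # arbitrary seed, for reproducibility only

# Inverse-transform sampling: uniform draws pushed through Q(u).
y0 <- qgl_rs(runif(200), params0)

# With exponents this large, u^lambda3 and (1 - u)^lambda4 are both tiny over
# much of (0, 1), so Q(u) collapses onto lambda1 for many draws.
frac_near <- mean(abs(y0 - params0[1]) < 1e-3)  # share within 1e-3 of lambda1
n_exact   <- sum(duplicated(y0))                # exact floating-point duplicates

cat("share of draws within 1e-3 of lambda1:", frac_near, "\n")
cat("number of exact duplicates:", n_exact, "\n")
```

If the assumption about how the draws are generated holds, the concentration of values at lambda1 is a property of the chosen parameters rather than of the random number generator.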
When using this procedure, I derive reasonable variance estimates and get results that make intuitive sense (i.e., the efficiency of the M estimator > quantile regression > OLS). I have seen a similar jittering procedure applied in a paper on the modeling of quantiles of count data (Machado and Santos Silva. JASA. 2005; 100: 1226).

I would appreciate others' thoughts regarding the validity of the proposed jittering procedure or suggestions for alternative approaches I could use to deal with my problem.

Many thanks!

Regards,

Jim

P.S.: Although I do not think it has any bearing on my problem, here is the quantile regression code I am using:

    library(quantreg)  # provides rq()
    fit1 <- summary(rq(y ~ x, tau = .5, ci = FALSE), se = "ker")
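The jittering step proposed in the post can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not a validation of the procedure: the outcome is generated with a hypothetical helper `qgl_rs` that applies the Ramberg-Schmeiser quantile function to uniform draws (assumed here to mimic `rgl(..., param = "rs")`), and the seed is arbitrary. The sketch only checks that the jitter removes exact ties and that the noise term has mean near 0 and variance near 1/12; it does not show that the median of y is preserved.

```r
# Hypothetical base-R stand-in for drawing from the RS-parameterized GLD.
qgl_rs <- function(u, lambda) {
  lambda[1] + (u^lambda[3] - (1 - u)^lambda[4]) / lambda[2]
}

set.seed(2)  # arbitrary seed, for reproducibility only

# Binary group indicator and simulated outcome, using the post's parameters.
x <- rep(c(0, 1), each = 200)
y <- c(qgl_rs(runif(200), c(2.684864244, 0.0144182, 26.01913, 711.0554)),
       qgl_rs(runif(200), c(-0.113741088, 0.0523072, 15.98282, 426.4456)))

# Jitter: add independent U(-0.5, 0.5) noise to every observation.
noise <- runif(length(y), min = -0.5, max = 0.5)
y_jit <- y + noise

ties_before <- sum(duplicated(y))      # exact duplicates in the raw outcome
ties_after  <- sum(duplicated(y_jit))  # exact duplicates after jittering

# U(-0.5, 0.5) has mean 0 and variance 1/12 in theory.
noise_mean <- mean(noise)
noise_var  <- var(noise)

cat("exact duplicates before/after jitter:", ties_before, "/", ties_after, "\n")
cat("noise mean:", noise_mean, " noise variance:", noise_var,
    " (theory: 0 and", 1 / 12, ")\n")
cat("variance of y:", var(y), " variance of jittered y:", var(y_jit), "\n")
```

The jittered outcome `y_jit` could then be passed to `rq(y_jit ~ x, tau = .5)` in place of `y`; whether the added variance of 1/12 is negligible depends on how it compares with the variance of y itself, which the last line prints.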