nooldor
2014-Jan-30 21:24 UTC
[R] Universal regression program - multiple regressions on set of data
Hi,

Here is a description of my problem. I have a data frame that contains 110 variables (columns) with 595 observations each. Some of the variables will be my dependent variables (Y), and some will be independent variables (X, Z, Q). I need to estimate robust regression models of the form Y ~ X + Z + Q.

I want to create 4 subsets from this data frame (the first column contains dates and should be skipped):

- Y <- columns 2:31 [30 variables/columns]. Let's call them y1, y2, y3 ... y30. Each of them has its own specific, unordered name; here I just call them y1, y2 and so on. I think of them as vectors, e.g. y1 is the vector of the variable called (in this example) "y1", with 595 observations, some of which can be NA.
- X <- columns 32:50 [19 variables/columns]. Let's call them x1, x2, x3 ... x19. Each of them has its own specific, unordered name; here I just call them x1, x2 and so on. I think of them as vectors, e.g. x1 is the vector of the variable called (in this example) "x1", with 154 observations, some of which can be NA.
- Z <- columns 51:80 [30 variables/columns]. Let's call them z1, z2, z3 ... z30, analogously to X.
- Q <- columns 81:110 [30 variables/columns]. Let's call them q1, q2, q3 ... q30, analogously to X.

In the regressions below, y1 corresponds to z1 and q1 (and so on).

The goal: I want to write code that will generate 30 x 19 robust regressions (package MASS: rlm), like this:

y1 ~ x1 + z1 + q1
y1 ~ x2 + z1 + q1
...
y1 ~ x19 + z1 + q1

y2 ~ x1 + z2 + q2
y2 ~ x2 + z2 + q2
...
y2 ~ x19 + z2 + q2

...

y30 ~ x1 + z30 + q30
y30 ~ x2 + z30 + q30
...
y30 ~ x19 + z30 + q30

(As described above, y2 means the 2nd vector in the Y subset, x19 the 19th vector in the X subset, z2 the 2nd vector in the Z subset, and so on.)

So the first vector of the Y subset should be regressed on the first vector of the Z subset and the first vector of the Q subset, but with a "changing" vector from the X subset ...
and so on for all 30 vectors in the Y subset.

While running each of those rlm regressions, the program should extract the residuals of each regression and check whether the p-value of ArchTest() (package FinTS), i.e. ArchTest(resid, lags = 5, demean = FALSE), is lower than 0.05. If yes, it should estimate a GARCH(1,1) model as described here:
http://stats.stackexchange.com/questions/45482/how-to-estimate-garch-in-r-exogenous-variables-in-mean-equation

Then the program should check the same equation again with ArchTest, and if the p-value is again lower than 0.05, it should apply a GARCH(1,2) model, and so on (applying GARCH(1,3), GARCH(1,4), etc.) until the ArchTest p-value is greater than 0.05. Once the ArchTest p-value is finally greater than 0.05, the program should go to the next equation and repeat the procedure.

In the end I would like to have one data frame as the result that contains the coefficients of all 30 x 19 regressions (there will be 30 x 19 x 4 coefficients) and their p-values.

I was thinking about solving it like this: create 4 lists with the names of each subset and use lapply, but I still lack the skills in R to do it myself, especially the GARCH part. Therefore I am asking you for help.

Best regards and thanks in advance!
T.S.
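A minimal sketch of the 30 x 19 rlm grid described above, without the GARCH step. It assumes the column layout from the question (dates in column 1, then 30 Y, 19 X, 30 Z and 30 Q columns) and uses hypothetical simulated data `dat`; the helper `fit_one` and the `*_names` vectors are illustrative names, not part of any package. Since `summary()` of an rlm fit reports only estimates, standard errors and t values, the p-values below use a normal approximation, which is an assumption rather than something MASS provides.

```r
library(MASS)  # rlm(); MASS is a recommended package shipped with R

# Hypothetical toy data with the layout from the question: column 1 holds
# dates, followed by 30 Y, 19 X, 30 Z and 30 Q columns (110 columns total).
set.seed(1)
n <- 595
dat <- data.frame(date = seq(as.Date("2000-01-01"), by = "day", length.out = n),
                  matrix(rnorm(n * 109), nrow = n))

# Column names of the four subsets (the real names can be anything).
y_names <- names(dat)[2:31]
x_names <- names(dat)[32:50]
z_names <- names(dat)[51:80]
q_names <- names(dat)[81:110]

# Hypothetical helper: fit y_i ~ x_j + z_i + q_i with rlm and return
# one row per coefficient.
fit_one <- function(i, j) {
  f <- reformulate(c(x_names[j], z_names[i], q_names[i]), response = y_names[i])
  fit <- rlm(f, data = dat, maxit = 50)  # rows with NA are dropped by na.omit
  tab <- summary(fit)$coefficients       # columns: Value, Std. Error, t value
  data.frame(y = y_names[i], x = x_names[j], term = rownames(tab),
             estimate  = tab[, "Value"],
             std_error = tab[, "Std. Error"],
             t_value   = tab[, "t value"],
             # rlm reports no p-values; a normal approximation is assumed here
             p_value   = 2 * pnorm(-abs(tab[, "t value"])),
             row.names = NULL)
}

# All 30 x 19 combinations; j (the X column) varies fastest, matching the
# order y1~x1, y1~x2, ..., y1~x19, y2~x1, ...
grid <- expand.grid(j = seq_along(x_names), i = seq_along(y_names))
results <- do.call(rbind, Map(fit_one, grid$i, grid$j))
nrow(results)  # 30 * 19 * 4 = 2280
```

`Map()` plays the role of the lapply idea from the question, iterating over (i, j) index pairs instead of name lists. If the real column names are not syntactic R names (spaces, leading digits), they would need to be wrapped in backticks before being passed to `reformulate()`.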
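A minimal sketch of the "test, then raise the GARCH order" loop for a single equation y ~ x + z + q. It assumes the rugarch package, whose `ugarchspec()` accepts exogenous regressors in the mean equation via `external.regressors`; whether this matches the approach in the linked stats.stackexchange answer is an assumption. The helper names `arch_p` and `fit_with_garch`, the cap `max_k`, and the simulated data `d` are hypothetical. Three further choices are assumptions as well: after a GARCH fit, ArchTest is re-run on the standardized residuals; rugarch's `garchOrder` is given as c(ARCH order, GARCH order), so `c(1, k)` raises the GARCH order; and the loop stops at `max_k` instead of running indefinitely.

```r
library(MASS)     # rlm()
library(FinTS)    # ArchTest()
library(rugarch)  # ugarchspec(), ugarchfit(), convergence()

# ARCH LM test p-value with the settings from the question.
arch_p <- function(e) ArchTest(as.numeric(e), lags = 5, demean = FALSE)$p.value

# Hypothetical helper for ONE equation: fit rlm; if its residuals still show
# ARCH effects, refit the mean equation jointly with a GARCH(1,k) variance
# for k = 1, 2, ... until the test passes or max_k is reached.
fit_with_garch <- function(d, max_k = 5, alpha = 0.05) {
  d <- na.omit(d)  # rugarch does not accept missing values
  rob <- rlm(y ~ x + z + q, data = d)
  if (arch_p(residuals(rob)) >= alpha) {
    return(list(model = "rlm", k = NA, coef = coef(rob)))
  }
  regs <- as.matrix(d[, c("x", "z", "q")])  # exogenous regressors in the mean
  for (k in seq_len(max_k)) {
    spec <- ugarchspec(
      variance.model = list(model = "sGARCH", garchOrder = c(1, k)),
      mean.model = list(armaOrder = c(0, 0), include.mean = TRUE,
                        external.regressors = regs),
      distribution.model = "norm")
    fit <- ugarchfit(spec, data = d$y, solver = "hybrid")
    if (convergence(fit) != 0) {  # non-zero code means the solver failed
      warning("GARCH(1,", k, ") did not converge")
      return(list(model = "failed", k = k, coef = NULL))
    }
    # Re-test the standardized residuals of the GARCH fit.
    if (arch_p(residuals(fit, standardize = TRUE)) >= alpha) break
  }
  # If max_k is reached without passing the test, the last fit is returned.
  list(model = "garch", k = k, coef = coef(fit))
}

# Usage on hypothetical simulated data for a single equation.
set.seed(2)
n <- 595
d <- data.frame(x = rnorm(n), z = rnorm(n), q = rnorm(n))
d$y <- 0.5 * d$x - 0.3 * d$z + 0.2 * d$q + rnorm(n)
res <- fit_with_garch(d)
res$model
```

Note that once the GARCH step kicks in, the mean equation is estimated by Gaussian maximum likelihood rather than robustly, and the coefficient vector also contains variance parameters (omega, alpha, beta terms). So the "4 coefficients per regression" count only holds for equations that pass the first ArchTest. To cover all 30 x 19 combinations, `d` would be built from the i-th Y, Z and Q columns and the j-th X column and the helper called inside the same Map/lapply loop as the plain rlm fits.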