Hi all,
I have a data set containing variables LOSS, GDP, HPI and UE.
(I have attached it in case it is required).
Having renamed the variables as l,g,h and u, I wish to run a Lasso
Regression with l as the dependent variable and all the other 3 as the
independent variables.
data=read.table("data.txt", header=T)
l=data$LOSS
h=data$HPI
u=data$UE
g=data$GDP
matrix=data.frame(l,g,h,u)
lasso=lars(matrix,l)
But R is throwing an error (shown below) at this:
Error in rep(1, n) : invalid 'times' argument
Can you kindly suggest where I went wrong?
[Just wanted to mention that I am getting the same error when instead of
the matrix of predictor variables, I am using only a single variable, say,
g : lasso=lars(g,l)]
Appreciate any help.
Thanks,
Preetam
--
Preetam Pal
(+91)-9432212774
M-Stat 2nd Year, Room No. N-114
Statistics Division, C.V.Raman Hall
Indian Statistical Institute, B.H.O.S.
Kolkata.
-------------- next part --------------
Time LOSS HPI GDP UE
12/31/2007 0.015415228 636.1200000 1009642.0000 4.700000000
3/31/2008 0.019658948 639.5000000 1026274.0263 4.700000000
6/30/2008 0.018117609 628.8700000 1025106.4643 5.000000000
9/30/2008 0.016485332 613.7100000 1015979.4019 5.500000000
12/31/2008 0.016916510 607.7400000 987442.00000 6.300000000
3/31/2009 0.018278273 614.5200000 960015.76986 7.500000000
6/30/2009 0.017538295 602.5100000 950120.43148 8.300000000
9/30/2009 0.015227553 588.6100000 954884.72533 8.700000000
12/31/2009 0.015072857 559.7242622 906007.98097 8.764291953
3/31/2010 0.014273862 555.4567968 907017.35752 8.787717985
6/30/2010 0.011576847 509.7149441 812018.56940 8.666473433
9/30/2010 0.009446999 516.8190575 820322.06572 8.432603577
12/31/2010 0.009395241 512.3891743 825190.66157 8.329473700
3/31/2011 0.008447415 503.5479191 844360.87717 8.064076793
6/30/2011 0.006644175 470.4847206 641167.88195 8.295522051
9/30/2011 0.005096770 472.5451205 642125.56548 8.225821087
12/31/2011 0.006574772 476.6826107 643891.13735 7.919498716
3/31/2012 0.006658841 470.3882067 661123.50413 7.794685722
6/30/2012 0.006045554 473.5204989 700309.99021 7.949811968
9/30/2012 0.005367697 486.1438646 705789.73419 8.455191725
On May 4, 2013, at 6:09 AM, Preetam Pal wrote:
> matrix=data.frame(l,g,h,u)
> lasso=lars(matrix,l)
>
> But R is throwing an error (shown below) at this:
>
> Error in rep(1, n) : invalid 'times' argument

I get a different error using package:lars version 1.1, but the problem is likely the same. You created an object named `matrix` which is not a matrix. You apparently expected `lars` to recognize your intent. It didn't. (You also included your response variable in your set of predictors. `lars` will run this without error, but treats it like a tautology.) Try offering the types of R objects that `lars` is documented to accept.

--
David Winsemius
Alameda, CA, USA
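The advice in the reply above can be sketched as follows. This is only an illustration, not David's own code: the inline data frame stands in for `read.table("data.txt", header=T)` and holds the first eight rows of the attachment, and the names `x` and `y` are chosen here for clarity. The `lars` call is guarded so the snippet still runs where the package is not installed.

```r
# Stand-in for read.table("data.txt", header = TRUE): first eight rows of the attachment.
data <- data.frame(
  LOSS = c(0.015415228, 0.019658948, 0.018117609, 0.016485332,
           0.016916510, 0.018278273, 0.017538295, 0.015227553),
  HPI  = c(636.12, 639.50, 628.87, 613.71, 607.74, 614.52, 602.51, 588.61),
  GDP  = c(1009642.0000, 1026274.0263, 1025106.4643, 1015979.4019,
           987442.00000, 960015.76986, 950120.43148, 954884.72533),
  UE   = c(4.7, 4.7, 5.0, 5.5, 6.3, 7.5, 8.3, 8.7)
)

# A data frame is not a matrix, whatever the variable holding it is called.
is.matrix(data)

# Predictors only, as a numeric matrix; the response is kept out of x.
x <- as.matrix(data[, c("GDP", "HPI", "UE")])
y <- data$LOSS

# Check the inputs before handing them to lars().
stopifnot(is.matrix(x), is.numeric(x), nrow(x) == length(y))

# Fit only if the lars package is available.
if (requireNamespace("lars", quietly = TRUE)) {
  lasso <- lars::lars(x, y, type = "lasso")
}
```

Avoiding `matrix` as a variable name also keeps the code from shadowing the base function of the same name.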
Thanks David for the paper; I understand the theory. But my question is about R only: does the vector of coefficients that lars() returns apply to the original variable y or to (y - y_bar)? I have also set intercept=T in my lars() model. I need this information to calculate the residuals.

To illustrate my point: I run
lasso=lars(x,y,intercept=T)
and R gives me the coefficient vector beta. Does this mean the model is y = x*beta, or is it (transformed y) = beta*(transformed x)?

I guess R first transforms the variables, finds the optimum beta and then readjusts the estimates to fit the original x and y variables. I am a bit confused because, in that case, R should have returned an intercept (some function of x_bar and y_bar), which it clearly does not. I am not able to find any documentation on this.

Appreciate your help on this.

Thanks,
Preetam

On Sun, May 5, 2013 at 12:55 AM, David Winsemius <dwinsemius@comcast.net> wrote:
> On May 4, 2013, at 10:13 AM, Preetam Pal wrote:
>
> > Hi,
> > I rectified my error (thanks David for pointing it out)
> > Now I have been able to run the code:
> >
> > data=read.table("data.txt", header=T)
> > l=data$LOSS
> > h=data$HPI
> > u=data$UE
> > g=data$GDP
> >
> > matrix=cbind(g,h,u)
> > lasso=lars(matrix,l)
> >
> > The final set of coefficients for the regression is the last row of
> > coef(lasso). Am I right?
> > Plus what happens to the intercept estimate? It is not available in
> > coef(lasso).
>
> Please read the cited documentation ... top of page 3:
> http://www-stat.stanford.edu/~hastie/Papers/LARS/LeastAngle_2002.pdf
>
> "By location and scale transformations we can always assume that the
> covariates have been standardized to have mean 0 and unit length, and that
> the response has mean 0,"
>
> Hence no need for an Intercept.
>
> --
> David.
>
> > Any help is welcome.
> >
> > Thanks,
> > Preetam
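On the intercept and residuals question, here is a minimal sketch of one way to check it empirically. It rests on assumptions not confirmed anywhere in this thread: that the fitted `lars` object stores the response mean in `$mu` and the predictor column means in `$meanx`, and that `coef()` returns coefficients on the original scale of x. The toy data are made up purely for illustration. If those assumptions hold, the implied intercept is `mu - sum(meanx * beta)`, and the manual fitted values should agree with `predict()`.

```r
# Toy data (made up); columns named after the thread's predictors.
set.seed(1)
x <- cbind(g = rnorm(20), h = rnorm(20), u = rnorm(20))
y <- drop(0.5 + x %*% c(1, 0, -2) + rnorm(20, sd = 0.1))

if (requireNamespace("lars", quietly = TRUE)) {
  fit <- lars::lars(x, y, type = "lasso", intercept = TRUE)

  beta      <- coef(fit)            # one row of coefficients per step
  beta_last <- beta[nrow(beta), ]   # final step of the path

  # Assumption: fit$mu is mean(y) and fit$meanx holds the column means of x.
  intercept <- fit$mu - sum(fit$meanx * beta_last)

  # Fitted values and residuals on the original scale of y.
  fitted_manual <- intercept + drop(x %*% beta_last)
  res           <- y - fitted_manual

  # Cross-check against the package's own predictions at the final step.
  fitted_pred <- predict(fit, newx = x, s = nrow(beta), mode = "step")$fit
  print(all.equal(fitted_manual, fitted_pred, check.attributes = FALSE))
}
```

In practice, `y - predict(fit, newx = x, s = ..., mode = "step")$fit` gives the residuals without computing the intercept by hand.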