Thanks, David, for the paper; I understand the theory.
But my question is specifically about R: does the vector of coefficients that
lars() returns apply to the original variable y or to the centred variable
(y - y_bar)? I have also set intercept=T in my lars() call.
I need this information to calculate the residuals.
To illustrate my point:
I fit lasso = lars(x, y, intercept = T)
and R gives me the coefficient vector beta.
Does this mean the model is y = x * beta,
or is it (transformed y) = beta * (transformed x)?
My guess is that R first transforms the variables, finds the optimal beta, and
then readjusts the estimates to fit the original x and y. I am a bit
confused, because in that case R should have returned something (a
function of x_bar and y_bar) as the intercept, which it clearly does
not. I have not been able to find any documentation on this.
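To make this concrete, here are the two residual calculations I have in mind
(a rough sketch; I may well be wrong about what lars() does internally):

  beta <- coef(lasso)[nrow(coef(lasso)), ]     # coefficients at the last step of the path
  res1 <- y - drop(x %*% beta)                 # if beta applies to the original x and y
  res2 <- (y - mean(y)) - drop(scale(x, scale = FALSE) %*% beta)   # if beta applies to the centred variables

Which of these (if either) is what R intends?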
Appreciate your help on this.
Thanks,
Preetam
On Sun, May 5, 2013 at 12:55 AM, David Winsemius <dwinsemius@comcast.net> wrote:
>
> On May 4, 2013, at 10:13 AM, Preetam Pal wrote:
>
> > Hi,
> > I rectified my error (thanks David for pointing it out)
> > Now I have been able to run the code:
> >
> > data=read.table("data.txt", header=T)
> > > l=data$LOSS
> > > h=data$HPI
> > > u=data$UE
> > > g=data$GDP
> > >
> > > matrix=cbind(g,h,u)
> > > lasso=lars(matrix,l)
> > >
> >
> > The final set of coefficients for the regression is the last row of
> > coef(lasso). Am I right?
> > Plus what happens to the intercept estimate? It is not available in
> > coef(lasso).
>
> Please read the cited documentation ... top of page 3:
> http://www-stat.stanford.edu/~hastie/Papers/LARS/LeastAngle_2002.pdf
>
> " By location and scale transformations we can always assume that the
> covariates have been standardized to have mean 0 and unit length, and that
> the response has mean 0,"
>
> Hence no need for an Intercept.
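>
> If you do need fitted values or residuals on the original scale of y, one
> approach (a rough sketch only, assuming the default intercept = TRUE and
> normalize = TRUE, in which case the reported coefficients are on the scale
> of the original x) is to rebuild the implied intercept from the column
> means; predict() on the lars object should give you the same fits:
>
>   beta  <- coef(lasso)[nrow(coef(lasso)), ]            # coefficients at the last step, original x scale
>   alpha <- mean(l) - drop(colMeans(matrix) %*% beta)   # implied intercept
>   resid <- l - (alpha + drop(matrix %*% beta))         # residuals on the original scale of l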
>
> --
> David.
> >
> > Any help is welcome.
> >
> > Thanks,
> > Preetam
> >
> >
> > On Sat, May 4, 2013 at 9:52 PM, David Winsemius <dwinsemius@comcast.net> wrote:
> >
> > On May 4, 2013, at 6:09 AM, Preetam Pal wrote:
> >
> > > Hi all,
> > > I have a data set containing variables LOSS, GDP, HPI and UE.
> > > (I have attached it in case it is required).
> > >
> > > Having renamed the variables as l, g, h and u, I wish to run a Lasso
> > > regression with l as the dependent variable and all the other 3 as the
> > > independent variables.
> > >
> > > data=read.table("data.txt", header=T)
> > > l=data$LOSS
> > > h=data$HPI
> > > u=data$UE
> > > g=data$GDP
> > >
> > > matrix=data.frame(l,g,h,u)
> > > lasso=lars(matrix,l)
> > >
> > > But R is throwing an error (shown below) at this:
> > >
> > > Error in rep(1, n) : invalid 'times' argument
> >
> > I get a different error using package:lars version 1.1, but the problem
> > is likely the same. You created an object named `matrix` which is not a
> > matrix. You apparently expected `lars` to recognize your intent. It didn't.
> > (You also included your response variable in your set of predictors.
> > `lars` will run this without error, but it treats that as a tautology.) Try
> > offering the types of R objects that `lars` is documented to accept.
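> >
> > For example, something along these lines (untested sketch; I am assuming
> > the column names in your data.txt):
> >
> >   x <- as.matrix(data[, c("GDP", "HPI", "UE")])   # numeric predictor matrix, response excluded
> >   lasso <- lars(x, data$LOSS)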
> >
> > >
> > > Can you kindly suggest where I went wrong?
> > >
> > > [Just wanted to mention that I am getting the same error when, instead of
> > > the matrix of predictor variables, I am using only a single variable, say,
> > > g: lasso=lars(g,l)]
> > >
> > > Appreciate any help.
> > >
> >
>
>
> David Winsemius
> Alameda, CA, USA
>
>
--
Preetam Pal
(+91)-9432212774
M-Stat 2nd Year, Room No. N-114
Statistics Division, C.V.Raman Hall
Indian Statistical Institute, B.H.O.S.
Kolkata.