kevinegan31 at gmail.com
2021-Feb-23 07:21 UTC
[R] Different Lambdas and Coefficients between cv.glmnet and intercept = FALSE
Hello,

I'm currently reviewing how to correctly implement `glmnet` and am having a hard time understanding why the results seem to be different between each method when `intercept = TRUE/FALSE` as I thought it should just drop the intercept from the model. However, it seems to be acting a bit different and I'm not sure how.

For a given lambda, if both `X` and `y` are scaled, it appears we can identify the same results:
```
library(glmnet)
data(QuickStartExample)
lambda_grid <- 10 ^ seq(10, -2, length = 100)
With_Intercept <- glmnet(scale(x), c(scale(y)))
Without_Intercept <- glmnet(scale(x), c(scale(y)), intercept = FALSE)
# Extract coefficients at a single value of lambda
cbind(coef(With_Intercept, s = 0.01),
      coef(Without_Intercept, s = 0.01))[-1, ]
```
While this is good, it's not clear to me how to put these back into their original scale. Further, this is for a given value of lambda. When using `cv.glmnet`, I'd like to identify the optimal lambda such that:
```
With_Intercept <- cv.glmnet(scale(x), c(scale(y)), lambda = lambda_grid)
Without_Intercept <- cv.glmnet(scale(x), c(scale(y)), lambda = lambda_grid,
                               intercept = FALSE)
cbind(coef(With_Intercept, s = With_Intercept$lambda.min, exact = TRUE,
           x = scale(x), y = scale(y)),
      coef(Without_Intercept, s = Without_Intercept$lambda.min, exact = TRUE,
           x = scale(x), y = scale(y)))[-1, ]
```
If I use `With_Intercept$lambda.min` to identify the `Without_Intercept` model, I get the same coefficients, but this doesn't necessarily give me confidence in what is the right model to use. Further, I'm not sure how to put the coefficients back into the right scale.

I've tried to compare all of the possible combinations between standardising, scaling, and leaving the variables as they are, but I'm still struggling with the best method and how to ensure I'm implementing `glmnet` correctly.

If anyone has advice on how to proceed and interpret these methods or get consistent results I would appreciate it. I've been reading the Introduction to Statistical Learning, Elements of Statistical Learning, Statistical Learning and Sparsity, as well as the `glmnet` vignette but am still a bit unclear.

Thanks,

Kevin
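
On putting the coefficients back on the original scale, the following is a minimal sketch rather than a definitive recipe. It assumes simulated data in place of `QuickStartExample`, and the names `b_std`, `b_orig` and `a_orig` are made up for illustration. Because `scale()` centres both `x` and `y`, the implied intercept on the scaled data is zero, which is why the fits with and without an intercept agree there. The back-transformation itself is plain algebra: multiply each slope by `sd(y) / sd(x_j)` and recover the intercept from the means.

```r
library(glmnet)

set.seed(1)                      # simulated data (assumption), not QuickStartExample
n <- 100; p <- 5
x <- matrix(rnorm(n * p, mean = 3, sd = 2), n, p)
y <- drop(1 + x %*% c(2, -1, 0, 0, 0.5) + rnorm(n))

xs <- scale(x)                   # centred columns with unit sd
ys <- drop(scale(y))
x_mean <- attr(xs, "scaled:center")
x_sd   <- attr(xs, "scaled:scale")

fit <- glmnet(xs, ys, intercept = FALSE)
b_std <- as.matrix(coef(fit, s = 0.01))[-1, 1]   # drop the intercept row

# Undo the scaling: slopes first, then the intercept implied by the means
b_orig <- b_std * sd(y) / x_sd
a_orig <- mean(y) - sum(b_orig * x_mean)

# Predictions on the scaled data, mapped back to the units of y ...
pred_from_scaled <- as.vector(predict(fit, newx = xs, s = 0.01)) * sd(y) + mean(y)
# ... match predictions made directly from the back-transformed coefficients
pred_from_orig <- as.vector(a_orig + x %*% b_orig)
all.equal(pred_from_scaled, pred_from_orig)
```

Note that this only rescales a fit made on the scaled data. It does not by itself say which lambda corresponds to a fit made on the raw data, since the penalty depends on the scale of `y`.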
Bert Gunter
2021-Feb-23 22:14 UTC
[R] Different Lambdas and Coefficients between cv.glmnet and intercept = FALSE
Please note, per the posting guide linked below:

"*Questions about statistics:* The R mailing lists are primarily intended for questions and discussion about the R software. However, questions about statistical methodology are sometimes posted. If the question is well-asked and of interest to someone on the list, it *may* elicit an informative up-to-date answer. See also the Usenet groups sci.stat.consult (applied statistics and consulting) and sci.stat.math (mathematical stat and probability)."
-- also stats.stackexchange.com

Also:

"For questions about functions in standard packages distributed with R (see the FAQ Add-on packages in R <https://cran.r-project.org/doc/FAQ/R-FAQ.html#Add-on-packages-in-R>), ask questions on R-help. If the question relates to a *contributed package*, e.g., one downloaded from CRAN, try contacting the package maintainer first. You can also use find("functionname") and packageDescription("packagename") to find this information. *Only* send such questions to R-help or R-devel if you get no reply or need further assistance. This applies to both requests for help and to bug reports."
-- see also ?maintainer

Your query seems to be mostly statistical in nature and certainly about a non-standard package (glmnet), so if you do not get a useful response here within a few days -- you might despite the above -- try the above.

Bert Gunter

"The trouble with having an open mind is that people keep coming along and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)

On Tue, Feb 23, 2021 at 1:07 PM <kevinegan31 at gmail.com> wrote:

> Hello,
>
> I'm currently reviewing how to correctly implement `glmnet` and am having
> a hard time understanding why the results seem to be different between each
> method when `intercept = TRUE/FALSE` as I thought it should just drop the
> intercept from the model. However, it seems to be acting a bit different
> and I'm not sure how.
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
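
On the differing `lambda.min` values from the two `cv.glmnet` calls in the original question: `cv.glmnet` assigns observations to folds at random unless `foldid` is supplied, so two separate calls are not directly comparable. The sketch below fixes the folds for both fits. It assumes simulated data in place of `QuickStartExample`, and the names `cv_with` and `cv_without` are illustrative only. Even with shared folds the two values need not coincide, because each training subset of the scaled data is no longer exactly centred, so the intercept can matter within a fold.

```r
library(glmnet)

set.seed(1)                      # simulated data (assumption), not QuickStartExample
n <- 100; p <- 5
x <- matrix(rnorm(n * p), n, p)
y <- drop(x %*% c(2, -1, 0, 0, 0.5) + rnorm(n))
xs <- scale(x)
ys <- drop(scale(y))

lambda_grid <- 10 ^ seq(10, -2, length.out = 100)   # grid from the original post
foldid <- sample(rep(1:10, length.out = n))         # one fold assignment, reused

cv_with    <- cv.glmnet(xs, ys, lambda = lambda_grid, foldid = foldid)
cv_without <- cv.glmnet(xs, ys, lambda = lambda_grid, foldid = foldid,
                        intercept = FALSE)

# Compare the selected penalties under identical folds
c(with = cv_with$lambda.min, without = cv_without$lambda.min)
```

To compare coefficients at one common penalty, pass the same `s` to both `coef()` calls, as the original post does with `With_Intercept$lambda.min`.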