Hi Simon, Yes, if I uses coefficients() I get the same results for lm() and lm.ridge(). So that's consistent, at least. Interestingly, the "wrong" number I get from lm.ridge()$coef agrees with the value from SPSS to 5dp, which is an interesting coincidence if these numbers have no particular external meaning in lm.ridge(). Kind regards, Nick ----- Original Message ----- From: "Simon Bonner" <sbonner6 at uwo.ca> To: "Nick Brown" <nick.brown at free.fr>, r-devel at r-project.org Sent: Thursday, 4 May, 2017 7:07:33 PM Subject: RE: [Rd] lm() gives different results to lm.ridge() and SPSS Hi Nick, I think that the problem here is your use of $coef to extract the coefficients of the ridge regression. The help for lm.ridge states that coef is a "matrix of coefficients, one row for each value of lambda. Note that these are not on the original scale and are for use by the coef method." I ran a small test with simulated data, code is copied below, and indeed the output from lm.ridge differs depending on whether the coefficients are accessed via $coef or via the coefficients() function. The latter does produce results that match the output from lm. I hope that helps. Cheers, Simon ## Load packages library(MASS) ## Set seed set.seed(8888) ## Set parameters n <- 100 beta <- c(1,0,1) sigma <- .5 rho <- .75 ## Simulate correlated covariates Sigma <- matrix(c(1,rho,rho,1),ncol=2) X <- mvrnorm(n,c(0,0),Sigma=Sigma) ## Simulate data mu <- beta[1] + X %*% beta[-1] y <- rnorm(n,mu,sigma) ## Fit model with lm() fit1 <- lm(y ~ X) ## Fit model with lm.ridge() fit2 <- lm.ridge(y ~ X) ## Compare coefficients cbind(fit1$coefficients,c(NA,fit2$coef),coefficients(fit2)) [,1] [,2] [,3] (Intercept) 0.99276001 NA 0.99276001 X1 -0.03980772 -0.04282391 -0.03980772 X2 1.11167179 1.06200476 1.11167179 -- Simon Bonner Assistant Professor of Environmetrics/ Director MMASc Department of Statistical and Actuarial Sciences/Department of Biology University of Western Ontario Office: Western Science Centre rm 276 Email: sbonner6 at uwo.ca | Telephone: 519-661-2111 x88205 | Fax: 519-661-3813 Twitter: @bonnerstatslab | Website: http://simon.bonners.ca/bonner-lab/wpblog/> -----Original Message----- > From: R-devel [mailto:r-devel-bounces at r-project.org] On Behalf Of Nick > Brown > Sent: May 4, 2017 10:29 AM > To: r-devel at r-project.org > Subject: [Rd] lm() gives different results to lm.ridge() and SPSS > > Hallo, > > I hope I am posting to the right place. I was advised to try this list by Ben Bolker > (https://twitter.com/bolkerb/status/859909918446497795). I also posted this > question to StackOverflow > (http://stackoverflow.com/questions/43771269/lm-gives-different-results- > from-lm-ridgelambda-0). I am a relative newcomer to R, but I wrote my first > program in 1975 and have been paid to program in about 15 different > languages, so I have some general background knowledge. > > > I have a regression from which I extract the coefficients like this: > lm(y ~ x1 * x2, data=ds)$coef > That gives: x1=0.40, x2=0.37, x1*x2=0.09 > > > > When I do the same regression in SPSS, I get: > beta(x1)=0.40, beta(x2)=0.37, beta(x1*x2)=0.14. > So the main effects are in agreement, but there is quite a difference in the > coefficient for the interaction. > > > X1 and X2 are correlated about .75 (yes, yes, I know - this model wasn't my > idea, but it got published), so there is quite possibly something going on with > collinearity. So I thought I'd try lm.ridge() to see if I can get an idea of where > the problems are occurring. > > > The starting point is to run lm.ridge() with lambda=0 (i.e., no ridge penalty) and > check we get the same results as with lm(): > lm.ridge(y ~ x1 * x2, lambda=0, data=ds)$coef > x1=0.40, x2=0.37, x1*x2=0.14 > So lm.ridge() agrees with SPSS, but not with lm(). (Of course, lambda=0 is the > default, so it can be omitted; I can alternate between including or deleting > ".ridge" in the function call, and watch the coefficient for the interaction > change.) > > > > What seems slightly strange to me here is that I assumed that lm.ridge() just > piggybacks on lm() anyway, so in the specific case where lambda=0 and there > is no "ridging" to do, I'd expect exactly the same results. > > > Unfortunately there are 34,000 cases in the dataset, so a "minimal" reprex will > not be easy to make, but I can share the data via Dropbox or something if that > would help. > > > > I appreciate that when there is strong collinearity then all bets are off in terms > of what the betas mean, but I would really expect lm() and lm.ridge() to give > the same results. (I would be happy to ignore SPSS, but for the moment it's > part of the majority!) > > > > Thanks for reading, > Nick > > > [[alternative HTML version deleted]] > > ______________________________________________ > R-devel at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel[[alternative HTML version deleted]]
peter dalgaard
2017-May-05 08:02 UTC
[Rd] lm() gives different results to lm.ridge() and SPSS
I asked you before, but in case you missed it: Are you looking at the right place in SPSS output? The UNstandardized coefficients should be comparable to R, i.e. the "B" column, not "Beta". -pd> On 5 May 2017, at 01:58 , Nick Brown <nick.brown at free.fr> wrote: > > Hi Simon, > > Yes, if I uses coefficients() I get the same results for lm() and lm.ridge(). So that's consistent, at least. > > Interestingly, the "wrong" number I get from lm.ridge()$coef agrees with the value from SPSS to 5dp, which is an interesting coincidence if these numbers have no particular external meaning in lm.ridge(). > > Kind regards, > Nick > > ----- Original Message ----- > > From: "Simon Bonner" <sbonner6 at uwo.ca> > To: "Nick Brown" <nick.brown at free.fr>, r-devel at r-project.org > Sent: Thursday, 4 May, 2017 7:07:33 PM > Subject: RE: [Rd] lm() gives different results to lm.ridge() and SPSS > > Hi Nick, > > I think that the problem here is your use of $coef to extract the coefficients of the ridge regression. The help for lm.ridge states that coef is a "matrix of coefficients, one row for each value of lambda. Note that these are not on the original scale and are for use by the coef method." > > I ran a small test with simulated data, code is copied below, and indeed the output from lm.ridge differs depending on whether the coefficients are accessed via $coef or via the coefficients() function. The latter does produce results that match the output from lm. > > I hope that helps. > > Cheers, > > Simon > > ## Load packages > library(MASS) > > ## Set seed > set.seed(8888) > > ## Set parameters > n <- 100 > beta <- c(1,0,1) > sigma <- .5 > rho <- .75 > > ## Simulate correlated covariates > Sigma <- matrix(c(1,rho,rho,1),ncol=2) > X <- mvrnorm(n,c(0,0),Sigma=Sigma) > > ## Simulate data > mu <- beta[1] + X %*% beta[-1] > y <- rnorm(n,mu,sigma) > > ## Fit model with lm() > fit1 <- lm(y ~ X) > > ## Fit model with lm.ridge() > fit2 <- lm.ridge(y ~ X) > > ## Compare coefficients > cbind(fit1$coefficients,c(NA,fit2$coef),coefficients(fit2)) > > [,1] [,2] [,3] > (Intercept) 0.99276001 NA 0.99276001 > X1 -0.03980772 -0.04282391 -0.03980772 > X2 1.11167179 1.06200476 1.11167179 > > -- > > Simon Bonner > Assistant Professor of Environmetrics/ Director MMASc > Department of Statistical and Actuarial Sciences/Department of Biology > University of Western Ontario > > Office: Western Science Centre rm 276 > > Email: sbonner6 at uwo.ca | Telephone: 519-661-2111 x88205 | Fax: 519-661-3813 > Twitter: @bonnerstatslab | Website: http://simon.bonners.ca/bonner-lab/wpblog/ > >> -----Original Message----- >> From: R-devel [mailto:r-devel-bounces at r-project.org] On Behalf Of Nick >> Brown >> Sent: May 4, 2017 10:29 AM >> To: r-devel at r-project.org >> Subject: [Rd] lm() gives different results to lm.ridge() and SPSS >> >> Hallo, >> >> I hope I am posting to the right place. I was advised to try this list by Ben Bolker >> (https://twitter.com/bolkerb/status/859909918446497795). I also posted this >> question to StackOverflow >> (http://stackoverflow.com/questions/43771269/lm-gives-different-results- >> from-lm-ridgelambda-0). I am a relative newcomer to R, but I wrote my first >> program in 1975 and have been paid to program in about 15 different >> languages, so I have some general background knowledge. >> >> >> I have a regression from which I extract the coefficients like this: >> lm(y ~ x1 * x2, data=ds)$coef >> That gives: x1=0.40, x2=0.37, x1*x2=0.09 >> >> >> >> When I do the same regression in SPSS, I get: >> beta(x1)=0.40, beta(x2)=0.37, beta(x1*x2)=0.14. >> So the main effects are in agreement, but there is quite a difference in the >> coefficient for the interaction. >> >> >> X1 and X2 are correlated about .75 (yes, yes, I know - this model wasn't my >> idea, but it got published), so there is quite possibly something going on with >> collinearity. So I thought I'd try lm.ridge() to see if I can get an idea of where >> the problems are occurring. >> >> >> The starting point is to run lm.ridge() with lambda=0 (i.e., no ridge penalty) and >> check we get the same results as with lm(): >> lm.ridge(y ~ x1 * x2, lambda=0, data=ds)$coef >> x1=0.40, x2=0.37, x1*x2=0.14 >> So lm.ridge() agrees with SPSS, but not with lm(). (Of course, lambda=0 is the >> default, so it can be omitted; I can alternate between including or deleting >> ".ridge" in the function call, and watch the coefficient for the interaction >> change.) >> >> >> >> What seems slightly strange to me here is that I assumed that lm.ridge() just >> piggybacks on lm() anyway, so in the specific case where lambda=0 and there >> is no "ridging" to do, I'd expect exactly the same results. >> >> >> Unfortunately there are 34,000 cases in the dataset, so a "minimal" reprex will >> not be easy to make, but I can share the data via Dropbox or something if that >> would help. >> >> >> >> I appreciate that when there is strong collinearity then all bets are off in terms >> of what the betas mean, but I would really expect lm() and lm.ridge() to give >> the same results. (I would be happy to ignore SPSS, but for the moment it's >> part of the majority!) >> >> >> >> Thanks for reading, >> Nick >> >> >> [[alternative HTML version deleted]] >> >> ______________________________________________ >> R-devel at r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-devel > > > [[alternative HTML version deleted]] > > ______________________________________________ > R-devel at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel-- Peter Dalgaard, Professor, Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Office: A 4.23 Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com
Hi, Here is (I hope) all the relevant output from R.> mean(s1$ZDEPRESSION, na.rm=T) [1] -1.041546e-16 > mean(s1$ZDIVERSITY_PA, na.rm=T) [1] -9.660583e-16 > mean(s1$ZMEAN_PA, na.rm=T) [1] -5.430282e-15 > lm.ridge(ZDEPRESSION ~ ZMEAN_PA * ZDIVERSITY_PA, data=s1)$coef ZMEAN_PA ZDIVERSITY_PA ZMEAN_PA:ZDIVERSITY_PA-0.3962254 -0.3636026 -0.1425772 ## This is what I thought was the problem originally. :-)> coefficients(lm(ZDEPRESSION ~ ZMEAN_PA * ZDIVERSITY_PA, data=s1)) (Intercept) ZMEAN_PA ZDIVERSITY_PA ZMEAN_PA:ZDIVERSITY_PA0.07342198 -0.39650356 -0.36569488 -0.09435788 > coefficients(lm.ridge(ZDEPRESSION ~ ZMEAN_PA * ZDIVERSITY_PA, data=s1)) ZMEAN_PA ZDIVERSITY_PA ZMEAN_PA:ZDIVERSITY_PA 0.07342198 -0.39650356 -0.36569488 -0.09435788 The equivalent from SPSS is attached. The unstandardized coefficients in SPSS look nothing like those in R. The standardized coefficients in SPSS match the lm.ridge()$coef numbers very closely indeed, suggesting that the same algorithm may be in use. I have put the dataset file, which is the untouched original I received from the authors, in this Dropbox folder: https://www.dropbox.com/sh/xsebjy55ius1ysb/AADwYUyV1bl6-iAw7ACuF1_La?dl=0. You can read it into R with this code (one variable needs to be standardized and centered; everything else is already in the file): s1 <- read.csv("Emodiversity_Study1.csv", stringsAsFactors=FALSE) s1$ZDEPRESSION <- scale(s1$DEPRESSION) Hey, maybe R is fine and I've stumbled on a bug in SPSS? If so, I'm sure IBM will want to fix it quickly (ha ha ha). Nick ----- Original Message ----- From: "peter dalgaard" <pdalgd at gmail.com> To: "Nick Brown" <nick.brown at free.fr> Cc: "Simon Bonner" <sbonner6 at uwo.ca>, r-devel at r-project.org Sent: Friday, 5 May, 2017 10:02:10 AM Subject: Re: [Rd] lm() gives different results to lm.ridge() and SPSS I asked you before, but in case you missed it: Are you looking at the right place in SPSS output? The UNstandardized coefficients should be comparable to R, i.e. the "B" column, not "Beta". -pd> On 5 May 2017, at 01:58 , Nick Brown <nick.brown at free.fr> wrote: > > Hi Simon, > > Yes, if I uses coefficients() I get the same results for lm() and lm.ridge(). So that's consistent, at least. > > Interestingly, the "wrong" number I get from lm.ridge()$coef agrees with the value from SPSS to 5dp, which is an interesting coincidence if these numbers have no particular external meaning in lm.ridge(). > > Kind regards, > Nick > > ----- Original Message ----- > > From: "Simon Bonner" <sbonner6 at uwo.ca> > To: "Nick Brown" <nick.brown at free.fr>, r-devel at r-project.org > Sent: Thursday, 4 May, 2017 7:07:33 PM > Subject: RE: [Rd] lm() gives different results to lm.ridge() and SPSS > > Hi Nick, > > I think that the problem here is your use of $coef to extract the coefficients of the ridge regression. The help for lm.ridge states that coef is a "matrix of coefficients, one row for each value of lambda. Note that these are not on the original scale and are for use by the coef method." > > I ran a small test with simulated data, code is copied below, and indeed the output from lm.ridge differs depending on whether the coefficients are accessed via $coef or via the coefficients() function. The latter does produce results that match the output from lm. > > I hope that helps. > > Cheers, > > Simon > > ## Load packages > library(MASS) > > ## Set seed > set.seed(8888) > > ## Set parameters > n <- 100 > beta <- c(1,0,1) > sigma <- .5 > rho <- .75 > > ## Simulate correlated covariates > Sigma <- matrix(c(1,rho,rho,1),ncol=2) > X <- mvrnorm(n,c(0,0),Sigma=Sigma) > > ## Simulate data > mu <- beta[1] + X %*% beta[-1] > y <- rnorm(n,mu,sigma) > > ## Fit model with lm() > fit1 <- lm(y ~ X) > > ## Fit model with lm.ridge() > fit2 <- lm.ridge(y ~ X) > > ## Compare coefficients > cbind(fit1$coefficients,c(NA,fit2$coef),coefficients(fit2)) > > [,1] [,2] [,3] > (Intercept) 0.99276001 NA 0.99276001 > X1 -0.03980772 -0.04282391 -0.03980772 > X2 1.11167179 1.06200476 1.11167179 > > -- > > Simon Bonner > Assistant Professor of Environmetrics/ Director MMASc > Department of Statistical and Actuarial Sciences/Department of Biology > University of Western Ontario > > Office: Western Science Centre rm 276 > > Email: sbonner6 at uwo.ca | Telephone: 519-661-2111 x88205 | Fax: 519-661-3813 > Twitter: @bonnerstatslab | Website: http://simon.bonners.ca/bonner-lab/wpblog/ > >> -----Original Message----- >> From: R-devel [mailto:r-devel-bounces at r-project.org] On Behalf Of Nick >> Brown >> Sent: May 4, 2017 10:29 AM >> To: r-devel at r-project.org >> Subject: [Rd] lm() gives different results to lm.ridge() and SPSS >> >> Hallo, >> >> I hope I am posting to the right place. I was advised to try this list by Ben Bolker >> (https://twitter.com/bolkerb/status/859909918446497795). I also posted this >> question to StackOverflow >> (http://stackoverflow.com/questions/43771269/lm-gives-different-results- >> from-lm-ridgelambda-0). I am a relative newcomer to R, but I wrote my first >> program in 1975 and have been paid to program in about 15 different >> languages, so I have some general background knowledge. >> >> >> I have a regression from which I extract the coefficients like this: >> lm(y ~ x1 * x2, data=ds)$coef >> That gives: x1=0.40, x2=0.37, x1*x2=0.09 >> >> >> >> When I do the same regression in SPSS, I get: >> beta(x1)=0.40, beta(x2)=0.37, beta(x1*x2)=0.14. >> So the main effects are in agreement, but there is quite a difference in the >> coefficient for the interaction. >> >> >> X1 and X2 are correlated about .75 (yes, yes, I know - this model wasn't my >> idea, but it got published), so there is quite possibly something going on with >> collinearity. So I thought I'd try lm.ridge() to see if I can get an idea of where >> the problems are occurring. >> >> >> The starting point is to run lm.ridge() with lambda=0 (i.e., no ridge penalty) and >> check we get the same results as with lm(): >> lm.ridge(y ~ x1 * x2, lambda=0, data=ds)$coef >> x1=0.40, x2=0.37, x1*x2=0.14 >> So lm.ridge() agrees with SPSS, but not with lm(). (Of course, lambda=0 is the >> default, so it can be omitted; I can alternate between including or deleting >> ".ridge" in the function call, and watch the coefficient for the interaction >> change.) >> >> >> >> What seems slightly strange to me here is that I assumed that lm.ridge() just >> piggybacks on lm() anyway, so in the specific case where lambda=0 and there >> is no "ridging" to do, I'd expect exactly the same results. >> >> >> Unfortunately there are 34,000 cases in the dataset, so a "minimal" reprex will >> not be easy to make, but I can share the data via Dropbox or something if that >> would help. >> >> >> >> I appreciate that when there is strong collinearity then all bets are off in terms >> of what the betas mean, but I would really expect lm() and lm.ridge() to give >> the same results. (I would be happy to ignore SPSS, but for the moment it's >> part of the majority!) >> >> >> >> Thanks for reading, >> Nick >> >> >> [[alternative HTML version deleted]] >> >> ______________________________________________ >> R-devel at r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-devel > > > [[alternative HTML version deleted]] > > ______________________________________________ > R-devel at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel-- Peter Dalgaard, Professor, Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Office: A 4.23 Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com -------------- next part -------------- A non-text attachment was scrubbed... Name: SPSS.png Type: image/png Size: 38704 bytes Desc: not available URL: <https://stat.ethz.ch/pipermail/r-devel/attachments/20170505/24adf071/attachment.png>
Possibly Parallel Threads
- lm() gives different results to lm.ridge() and SPSS
- lm() gives different results to lm.ridge() and SPSS
- lm() gives different results to lm.ridge() and SPSS
- lm() gives different results to lm.ridge() and SPSS
- lm() gives different results to lm.ridge() and SPSS