Hi, Here is (I hope) all the relevant output from R.> mean(s1$ZDEPRESSION, na.rm=T) [1] -1.041546e-16 > mean(s1$ZDIVERSITY_PA, na.rm=T) [1] -9.660583e-16 > mean(s1$ZMEAN_PA, na.rm=T) [1] -5.430282e-15 > lm.ridge(ZDEPRESSION ~ ZMEAN_PA * ZDIVERSITY_PA, data=s1)$coef ZMEAN_PA ZDIVERSITY_PA ZMEAN_PA:ZDIVERSITY_PA-0.3962254 -0.3636026 -0.1425772 ## This is what I thought was the problem originally. :-)> coefficients(lm(ZDEPRESSION ~ ZMEAN_PA * ZDIVERSITY_PA, data=s1)) (Intercept) ZMEAN_PA ZDIVERSITY_PA ZMEAN_PA:ZDIVERSITY_PA0.07342198 -0.39650356 -0.36569488 -0.09435788 > coefficients(lm.ridge(ZDEPRESSION ~ ZMEAN_PA * ZDIVERSITY_PA, data=s1)) ZMEAN_PA ZDIVERSITY_PA ZMEAN_PA:ZDIVERSITY_PA 0.07342198 -0.39650356 -0.36569488 -0.09435788 The equivalent from SPSS is attached. The unstandardized coefficients in SPSS look nothing like those in R. The standardized coefficients in SPSS match the lm.ridge()$coef numbers very closely indeed, suggesting that the same algorithm may be in use. I have put the dataset file, which is the untouched original I received from the authors, in this Dropbox folder: https://www.dropbox.com/sh/xsebjy55ius1ysb/AADwYUyV1bl6-iAw7ACuF1_La?dl=0. You can read it into R with this code (one variable needs to be standardized and centered; everything else is already in the file): s1 <- read.csv("Emodiversity_Study1.csv", stringsAsFactors=FALSE) s1$ZDEPRESSION <- scale(s1$DEPRESSION) Hey, maybe R is fine and I've stumbled on a bug in SPSS? If so, I'm sure IBM will want to fix it quickly (ha ha ha). Nick ----- Original Message ----- From: "peter dalgaard" <pdalgd at gmail.com> To: "Nick Brown" <nick.brown at free.fr> Cc: "Simon Bonner" <sbonner6 at uwo.ca>, r-devel at r-project.org Sent: Friday, 5 May, 2017 10:02:10 AM Subject: Re: [Rd] lm() gives different results to lm.ridge() and SPSS I asked you before, but in case you missed it: Are you looking at the right place in SPSS output? The UNstandardized coefficients should be comparable to R, i.e. the "B" column, not "Beta". -pd> On 5 May 2017, at 01:58 , Nick Brown <nick.brown at free.fr> wrote: > > Hi Simon, > > Yes, if I uses coefficients() I get the same results for lm() and lm.ridge(). So that's consistent, at least. > > Interestingly, the "wrong" number I get from lm.ridge()$coef agrees with the value from SPSS to 5dp, which is an interesting coincidence if these numbers have no particular external meaning in lm.ridge(). > > Kind regards, > Nick > > ----- Original Message ----- > > From: "Simon Bonner" <sbonner6 at uwo.ca> > To: "Nick Brown" <nick.brown at free.fr>, r-devel at r-project.org > Sent: Thursday, 4 May, 2017 7:07:33 PM > Subject: RE: [Rd] lm() gives different results to lm.ridge() and SPSS > > Hi Nick, > > I think that the problem here is your use of $coef to extract the coefficients of the ridge regression. The help for lm.ridge states that coef is a "matrix of coefficients, one row for each value of lambda. Note that these are not on the original scale and are for use by the coef method." > > I ran a small test with simulated data, code is copied below, and indeed the output from lm.ridge differs depending on whether the coefficients are accessed via $coef or via the coefficients() function. The latter does produce results that match the output from lm. > > I hope that helps. > > Cheers, > > Simon > > ## Load packages > library(MASS) > > ## Set seed > set.seed(8888) > > ## Set parameters > n <- 100 > beta <- c(1,0,1) > sigma <- .5 > rho <- .75 > > ## Simulate correlated covariates > Sigma <- matrix(c(1,rho,rho,1),ncol=2) > X <- mvrnorm(n,c(0,0),Sigma=Sigma) > > ## Simulate data > mu <- beta[1] + X %*% beta[-1] > y <- rnorm(n,mu,sigma) > > ## Fit model with lm() > fit1 <- lm(y ~ X) > > ## Fit model with lm.ridge() > fit2 <- lm.ridge(y ~ X) > > ## Compare coefficients > cbind(fit1$coefficients,c(NA,fit2$coef),coefficients(fit2)) > > [,1] [,2] [,3] > (Intercept) 0.99276001 NA 0.99276001 > X1 -0.03980772 -0.04282391 -0.03980772 > X2 1.11167179 1.06200476 1.11167179 > > -- > > Simon Bonner > Assistant Professor of Environmetrics/ Director MMASc > Department of Statistical and Actuarial Sciences/Department of Biology > University of Western Ontario > > Office: Western Science Centre rm 276 > > Email: sbonner6 at uwo.ca | Telephone: 519-661-2111 x88205 | Fax: 519-661-3813 > Twitter: @bonnerstatslab | Website: http://simon.bonners.ca/bonner-lab/wpblog/ > >> -----Original Message----- >> From: R-devel [mailto:r-devel-bounces at r-project.org] On Behalf Of Nick >> Brown >> Sent: May 4, 2017 10:29 AM >> To: r-devel at r-project.org >> Subject: [Rd] lm() gives different results to lm.ridge() and SPSS >> >> Hallo, >> >> I hope I am posting to the right place. I was advised to try this list by Ben Bolker >> (https://twitter.com/bolkerb/status/859909918446497795). I also posted this >> question to StackOverflow >> (http://stackoverflow.com/questions/43771269/lm-gives-different-results- >> from-lm-ridgelambda-0). I am a relative newcomer to R, but I wrote my first >> program in 1975 and have been paid to program in about 15 different >> languages, so I have some general background knowledge. >> >> >> I have a regression from which I extract the coefficients like this: >> lm(y ~ x1 * x2, data=ds)$coef >> That gives: x1=0.40, x2=0.37, x1*x2=0.09 >> >> >> >> When I do the same regression in SPSS, I get: >> beta(x1)=0.40, beta(x2)=0.37, beta(x1*x2)=0.14. >> So the main effects are in agreement, but there is quite a difference in the >> coefficient for the interaction. >> >> >> X1 and X2 are correlated about .75 (yes, yes, I know - this model wasn't my >> idea, but it got published), so there is quite possibly something going on with >> collinearity. So I thought I'd try lm.ridge() to see if I can get an idea of where >> the problems are occurring. >> >> >> The starting point is to run lm.ridge() with lambda=0 (i.e., no ridge penalty) and >> check we get the same results as with lm(): >> lm.ridge(y ~ x1 * x2, lambda=0, data=ds)$coef >> x1=0.40, x2=0.37, x1*x2=0.14 >> So lm.ridge() agrees with SPSS, but not with lm(). (Of course, lambda=0 is the >> default, so it can be omitted; I can alternate between including or deleting >> ".ridge" in the function call, and watch the coefficient for the interaction >> change.) >> >> >> >> What seems slightly strange to me here is that I assumed that lm.ridge() just >> piggybacks on lm() anyway, so in the specific case where lambda=0 and there >> is no "ridging" to do, I'd expect exactly the same results. >> >> >> Unfortunately there are 34,000 cases in the dataset, so a "minimal" reprex will >> not be easy to make, but I can share the data via Dropbox or something if that >> would help. >> >> >> >> I appreciate that when there is strong collinearity then all bets are off in terms >> of what the betas mean, but I would really expect lm() and lm.ridge() to give >> the same results. (I would be happy to ignore SPSS, but for the moment it's >> part of the majority!) >> >> >> >> Thanks for reading, >> Nick >> >> >> [[alternative HTML version deleted]] >> >> ______________________________________________ >> R-devel at r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-devel > > > [[alternative HTML version deleted]] > > ______________________________________________ > R-devel at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel-- Peter Dalgaard, Professor, Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Office: A 4.23 Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com -------------- next part -------------- A non-text attachment was scrubbed... Name: SPSS.png Type: image/png Size: 38704 bytes Desc: not available URL: <https://stat.ethz.ch/pipermail/r-devel/attachments/20170505/24adf071/attachment.png>
Viechtbauer Wolfgang (SP)
2017-May-05 12:43 UTC
[Rd] lm() gives different results to lm.ridge() and SPSS
I had no problems running regression models in SPSS and R that yielded the same results for these data. The difference you are observing is from fitting different models. In R, you fitted: res <- lm(DEPRESSION ~ ZMEAN_PA * ZDIVERSITY_PA, data=dat) summary(res) The interaction term is the product of ZMEAN_PA and ZDIVERSITY_PA. This is not a standardized variable itself and not the same as "ZINTER_PA_C" in the png you showed, which is not a variable in the dataset, but can be created with: dat$ZINTER_PA_C <- with(dat, scale(ZMEAN_PA * ZDIVERSITY_PA)) If you want the same results as in SPSS, then you need to fit: res <- lm(DEPRESSION ~ ZMEAN_PA + ZDIVERSITY_PA + ZINTER_PA_C, data=dat) summary(res) This yields: Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 6.41041 0.01722 372.21 <2e-16 *** ZMEAN_PA -1.62726 0.04200 -38.74 <2e-16 *** ZDIVERSITY_PA -1.50082 0.07447 -20.15 <2e-16 *** ZINTER_PA_C -0.58955 0.05288 -11.15 <2e-16 *** --- Signif. codes: 0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1 Exactly the same as in the png. Peter already mentioned this as a possible reason for the discrepancy: https://stat.ethz.ch/pipermail/r-devel/2017-May/074191.html ("Is it perhaps the case that x1 and x2 have already been scaled to have standard deviation 1? In that case, x1*x2 won't be.") Best, Wolfgang -----Original Message----- From: R-devel [mailto:r-devel-bounces at r-project.org] On Behalf Of Nick Brown Sent: Friday, May 05, 2017 10:40 To: peter dalgaard Cc: r-devel at r-project.org Subject: Re: [Rd] lm() gives different results to lm.ridge() and SPSS Hi, Here is (I hope) all the relevant output from R.> mean(s1$ZDEPRESSION, na.rm=T) [1] -1.041546e-16 > mean(s1$ZDIVERSITY_PA, na.rm=T) [1] -9.660583e-16 > mean(s1$ZMEAN_PA, na.rm=T) [1] -5.430282e-15 > lm.ridge(ZDEPRESSION ~ ZMEAN_PA * ZDIVERSITY_PA, data=s1)$coef ZMEAN_PA ZDIVERSITY_PA ZMEAN_PA:ZDIVERSITY_PA-0.3962254 -0.3636026 -0.1425772 ## This is what I thought was the problem originally. :-)> coefficients(lm(ZDEPRESSION ~ ZMEAN_PA * ZDIVERSITY_PA, data=s1)) (Intercept) ZMEAN_PA ZDIVERSITY_PA ZMEAN_PA:ZDIVERSITY_PA0.07342198 -0.39650356 -0.36569488 -0.09435788 > coefficients(lm.ridge(ZDEPRESSION ~ ZMEAN_PA * ZDIVERSITY_PA, data=s1)) ZMEAN_PA ZDIVERSITY_PA ZMEAN_PA:ZDIVERSITY_PA 0.07342198 -0.39650356 -0.36569488 -0.09435788 The equivalent from SPSS is attached. The unstandardized coefficients in SPSS look nothing like those in R. The standardized coefficients in SPSS match the lm.ridge()$coef numbers very closely indeed, suggesting that the same algorithm may be in use. I have put the dataset file, which is the untouched original I received from the authors, in this Dropbox folder: https://www.dropbox.com/sh/xsebjy55ius1ysb/AADwYUyV1bl6-iAw7ACuF1_La?dl=0. You can read it into R with this code (one variable needs to be standardized and centered; everything else is already in the file): s1 <- read.csv("Emodiversity_Study1.csv", stringsAsFactors=FALSE) s1$ZDEPRESSION <- scale(s1$DEPRESSION) Hey, maybe R is fine and I've stumbled on a bug in SPSS? If so, I'm sure IBM will want to fix it quickly (ha ha ha). Nick ----- Original Message ----- From: "peter dalgaard" <pdalgd at gmail.com> To: "Nick Brown" <nick.brown at free.fr> Cc: "Simon Bonner" <sbonner6 at uwo.ca>, r-devel at r-project.org Sent: Friday, 5 May, 2017 10:02:10 AM Subject: Re: [Rd] lm() gives different results to lm.ridge() and SPSS I asked you before, but in case you missed it: Are you looking at the right place in SPSS output? The UNstandardized coefficients should be comparable to R, i.e. the "B" column, not "Beta". -pd> On 5 May 2017, at 01:58 , Nick Brown <nick.brown at free.fr> wrote: > > Hi Simon, > > Yes, if I uses coefficients() I get the same results for lm() and lm.ridge(). So that's consistent, at least. > > Interestingly, the "wrong" number I get from lm.ridge()$coef agrees with the value from SPSS to 5dp, which is an interesting coincidence if these numbers have no particular external meaning in lm.ridge(). > > Kind regards, > Nick > > ----- Original Message ----- > > From: "Simon Bonner" <sbonner6 at uwo.ca> > To: "Nick Brown" <nick.brown at free.fr>, r-devel at r-project.org > Sent: Thursday, 4 May, 2017 7:07:33 PM > Subject: RE: [Rd] lm() gives different results to lm.ridge() and SPSS > > Hi Nick, > > I think that the problem here is your use of $coef to extract the coefficients of the ridge regression. The help for lm.ridge states that coef is a "matrix of coefficients, one row for each value of lambda. Note that these are not on the original scale and are for use by the coef method." > > I ran a small test with simulated data, code is copied below, and indeed the output from lm.ridge differs depending on whether the coefficients are accessed via $coef or via the coefficients() function. The latter does produce results that match the output from lm. > > I hope that helps. > > Cheers, > > Simon > > ## Load packages > library(MASS) > > ## Set seed > set.seed(8888) > > ## Set parameters > n <- 100 > beta <- c(1,0,1) > sigma <- .5 > rho <- .75 > > ## Simulate correlated covariates > Sigma <- matrix(c(1,rho,rho,1),ncol=2) > X <- mvrnorm(n,c(0,0),Sigma=Sigma) > > ## Simulate data > mu <- beta[1] + X %*% beta[-1] > y <- rnorm(n,mu,sigma) > > ## Fit model with lm() > fit1 <- lm(y ~ X) > > ## Fit model with lm.ridge() > fit2 <- lm.ridge(y ~ X) > > ## Compare coefficients > cbind(fit1$coefficients,c(NA,fit2$coef),coefficients(fit2)) > > [,1] [,2] [,3] > (Intercept) 0.99276001 NA 0.99276001 > X1 -0.03980772 -0.04282391 -0.03980772 > X2 1.11167179 1.06200476 1.11167179 > > -- > > Simon Bonner > Assistant Professor of Environmetrics/ Director MMASc > Department of Statistical and Actuarial Sciences/Department of Biology > University of Western Ontario > > Office: Western Science Centre rm 276 > > Email: sbonner6 at uwo.ca | Telephone: 519-661-2111 x88205 | Fax: 519-661-3813 > Twitter: @bonnerstatslab | Website: http://simon.bonners.ca/bonner-lab/wpblog/ > >> -----Original Message----- >> From: R-devel [mailto:r-devel-bounces at r-project.org] On Behalf Of Nick >> Brown >> Sent: May 4, 2017 10:29 AM >> To: r-devel at r-project.org >> Subject: [Rd] lm() gives different results to lm.ridge() and SPSS >> >> Hallo, >> >> I hope I am posting to the right place. I was advised to try this list by Ben Bolker >> (https://twitter.com/bolkerb/status/859909918446497795). I also posted this >> question to StackOverflow >> (http://stackoverflow.com/questions/43771269/lm-gives-different-results- >> from-lm-ridgelambda-0). I am a relative newcomer to R, but I wrote my first >> program in 1975 and have been paid to program in about 15 different >> languages, so I have some general background knowledge. >> >> I have a regression from which I extract the coefficients like this: >> lm(y ~ x1 * x2, data=ds)$coef >> That gives: x1=0.40, x2=0.37, x1*x2=0.09 >> >> When I do the same regression in SPSS, I get: >> beta(x1)=0.40, beta(x2)=0.37, beta(x1*x2)=0.14. >> So the main effects are in agreement, but there is quite a difference in the >> coefficient for the interaction. >> >> X1 and X2 are correlated about .75 (yes, yes, I know - this model wasn't my >> idea, but it got published), so there is quite possibly something going on with >> collinearity. So I thought I'd try lm.ridge() to see if I can get an idea of where >> the problems are occurring. >> >> The starting point is to run lm.ridge() with lambda=0 (i.e., no ridge penalty) and >> check we get the same results as with lm(): >> lm.ridge(y ~ x1 * x2, lambda=0, data=ds)$coef >> x1=0.40, x2=0.37, x1*x2=0.14 >> So lm.ridge() agrees with SPSS, but not with lm(). (Of course, lambda=0 is the >> default, so it can be omitted; I can alternate between including or deleting >> ".ridge" in the function call, and watch the coefficient for the interaction >> change.) >> >> What seems slightly strange to me here is that I assumed that lm.ridge() just >> piggybacks on lm() anyway, so in the specific case where lambda=0 and there >> is no "ridging" to do, I'd expect exactly the same results. >> >> Unfortunately there are 34,000 cases in the dataset, so a "minimal" reprex will >> not be easy to make, but I can share the data via Dropbox or something if that >> would help. >> >> I appreciate that when there is strong collinearity then all bets are off in terms >> of what the betas mean, but I would really expect lm() and lm.ridge() to give >> the same results. (I would be happy to ignore SPSS, but for the moment it's >> part of the majority!) >> >> Thanks for reading, >> Nick
peter dalgaard
2017-May-05 13:33 UTC
[Rd] lm() gives different results to lm.ridge() and SPSS
Thanks, I was getting to try this, but got side tracked by actual work... Your analysis reproduces the SPSS unscaled estimates. It still remains to figure out how Nick got>coefficients(lm(ZDEPRESSION ~ ZMEAN_PA * ZDIVERSITY_PA, data=s1)) (Intercept) ZMEAN_PA ZDIVERSITY_PA ZMEAN_PA:ZDIVERSITY_PA 0.07342198 -0.39650356 -0.36569488 -0.09435788 which does not match your output. I suspect that ZMEAN_PA and ZDIVERSITY_PA were scaled for this analysis (but the interaction term still obviously is not). I conjecture that something in the vicinity of res <- lm(DEPRESSION ~ scale(ZMEAN_PA) + scale(ZDIVERSITY_PA) + scale(ZMEAN_PA * ZDIVERSITY_PA), data=dat) summary(res) would reproduce the SPSS Beta values.> On 5 May 2017, at 14:43 , Viechtbauer Wolfgang (SP) <wolfgang.viechtbauer at maastrichtuniversity.nl> wrote: > > I had no problems running regression models in SPSS and R that yielded the same results for these data. > > The difference you are observing is from fitting different models. In R, you fitted: > > res <- lm(DEPRESSION ~ ZMEAN_PA * ZDIVERSITY_PA, data=dat) > summary(res) > > The interaction term is the product of ZMEAN_PA and ZDIVERSITY_PA. This is not a standardized variable itself and not the same as "ZINTER_PA_C" in the png you showed, which is not a variable in the dataset, but can be created with: > > dat$ZINTER_PA_C <- with(dat, scale(ZMEAN_PA * ZDIVERSITY_PA)) > > If you want the same results as in SPSS, then you need to fit: > > res <- lm(DEPRESSION ~ ZMEAN_PA + ZDIVERSITY_PA + ZINTER_PA_C, data=dat) > summary(res) > > This yields: > > Coefficients: > Estimate Std. Error t value Pr(>|t|) > (Intercept) 6.41041 0.01722 372.21 <2e-16 *** > ZMEAN_PA -1.62726 0.04200 -38.74 <2e-16 *** > ZDIVERSITY_PA -1.50082 0.07447 -20.15 <2e-16 *** > ZINTER_PA_C -0.58955 0.05288 -11.15 <2e-16 *** > --- > Signif. codes: 0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1 > > Exactly the same as in the png. > > Peter already mentioned this as a possible reason for the discrepancy: https://stat.ethz.ch/pipermail/r-devel/2017-May/074191.html ("Is it perhaps the case that x1 and x2 have already been scaled to have standard deviation 1? In that case, x1*x2 won't be.") > > Best, > Wolfgang > > -----Original Message----- > From: R-devel [mailto:r-devel-bounces at r-project.org] On Behalf Of Nick Brown > Sent: Friday, May 05, 2017 10:40 > To: peter dalgaard > Cc: r-devel at r-project.org > Subject: Re: [Rd] lm() gives different results to lm.ridge() and SPSS > > Hi, > > Here is (I hope) all the relevant output from R. > >> mean(s1$ZDEPRESSION, na.rm=T) [1] -1.041546e-16 > mean(s1$ZDIVERSITY_PA, na.rm=T) [1] -9.660583e-16 > mean(s1$ZMEAN_PA, na.rm=T) [1] -5.430282e-15 > lm.ridge(ZDEPRESSION ~ ZMEAN_PA * ZDIVERSITY_PA, data=s1)$coef ZMEAN_PA ZDIVERSITY_PA ZMEAN_PA:ZDIVERSITY_PA > -0.3962254 -0.3636026 -0.1425772 ## This is what I thought was the problem originally. :-) > > >> coefficients(lm(ZDEPRESSION ~ ZMEAN_PA * ZDIVERSITY_PA, data=s1)) (Intercept) ZMEAN_PA ZDIVERSITY_PA ZMEAN_PA:ZDIVERSITY_PA > 0.07342198 -0.39650356 -0.36569488 -0.09435788 > coefficients(lm.ridge(ZDEPRESSION ~ ZMEAN_PA * ZDIVERSITY_PA, data=s1)) ZMEAN_PA ZDIVERSITY_PA ZMEAN_PA:ZDIVERSITY_PA > 0.07342198 -0.39650356 -0.36569488 -0.09435788 The equivalent from SPSS is attached. The unstandardized coefficients in SPSS look nothing like those in R. The standardized coefficients in SPSS match the lm.ridge()$coef numbers very closely indeed, suggesting that the same algorithm may be in use. > > I have put the dataset file, which is the untouched original I received from the authors, in this Dropbox folder: https://www.dropbox.com/sh/xsebjy55ius1ysb/AADwYUyV1bl6-iAw7ACuF1_La?dl=0. You can read it into R with this code (one variable needs to be standardized and centered; everything else is already in the file): > > s1 <- read.csv("Emodiversity_Study1.csv", stringsAsFactors=FALSE) s1$ZDEPRESSION <- scale(s1$DEPRESSION) > Hey, maybe R is fine and I've stumbled on a bug in SPSS? If so, I'm sure IBM will want to fix it quickly (ha ha ha). > > Nick > > ----- Original Message ----- > > From: "peter dalgaard" <pdalgd at gmail.com> > To: "Nick Brown" <nick.brown at free.fr> > Cc: "Simon Bonner" <sbonner6 at uwo.ca>, r-devel at r-project.org > Sent: Friday, 5 May, 2017 10:02:10 AM > Subject: Re: [Rd] lm() gives different results to lm.ridge() and SPSS > > I asked you before, but in case you missed it: Are you looking at the right place in SPSS output? > > The UNstandardized coefficients should be comparable to R, i.e. the "B" column, not "Beta". > > -pd > >> On 5 May 2017, at 01:58 , Nick Brown <nick.brown at free.fr> wrote: >> >> Hi Simon, >> >> Yes, if I uses coefficients() I get the same results for lm() and lm.ridge(). So that's consistent, at least. >> >> Interestingly, the "wrong" number I get from lm.ridge()$coef agrees with the value from SPSS to 5dp, which is an interesting coincidence if these numbers have no particular external meaning in lm.ridge(). >> >> Kind regards, >> Nick >> >> ----- Original Message ----- >> >> From: "Simon Bonner" <sbonner6 at uwo.ca> >> To: "Nick Brown" <nick.brown at free.fr>, r-devel at r-project.org >> Sent: Thursday, 4 May, 2017 7:07:33 PM >> Subject: RE: [Rd] lm() gives different results to lm.ridge() and SPSS >> >> Hi Nick, >> >> I think that the problem here is your use of $coef to extract the coefficients of the ridge regression. The help for lm.ridge states that coef is a "matrix of coefficients, one row for each value of lambda. Note that these are not on the original scale and are for use by the coef method." >> >> I ran a small test with simulated data, code is copied below, and indeed the output from lm.ridge differs depending on whether the coefficients are accessed via $coef or via the coefficients() function. The latter does produce results that match the output from lm. >> >> I hope that helps. >> >> Cheers, >> >> Simon >> >> ## Load packages >> library(MASS) >> >> ## Set seed >> set.seed(8888) >> >> ## Set parameters >> n <- 100 >> beta <- c(1,0,1) >> sigma <- .5 >> rho <- .75 >> >> ## Simulate correlated covariates >> Sigma <- matrix(c(1,rho,rho,1),ncol=2) >> X <- mvrnorm(n,c(0,0),Sigma=Sigma) >> >> ## Simulate data >> mu <- beta[1] + X %*% beta[-1] >> y <- rnorm(n,mu,sigma) >> >> ## Fit model with lm() >> fit1 <- lm(y ~ X) >> >> ## Fit model with lm.ridge() >> fit2 <- lm.ridge(y ~ X) >> >> ## Compare coefficients >> cbind(fit1$coefficients,c(NA,fit2$coef),coefficients(fit2)) >> >> [,1] [,2] [,3] >> (Intercept) 0.99276001 NA 0.99276001 >> X1 -0.03980772 -0.04282391 -0.03980772 >> X2 1.11167179 1.06200476 1.11167179 >> >> -- >> >> Simon Bonner >> Assistant Professor of Environmetrics/ Director MMASc >> Department of Statistical and Actuarial Sciences/Department of Biology >> University of Western Ontario >> >> Office: Western Science Centre rm 276 >> >> Email: sbonner6 at uwo.ca | Telephone: 519-661-2111 x88205 | Fax: 519-661-3813 >> Twitter: @bonnerstatslab | Website: http://simon.bonners.ca/bonner-lab/wpblog/ >> >>> -----Original Message----- >>> From: R-devel [mailto:r-devel-bounces at r-project.org] On Behalf Of Nick >>> Brown >>> Sent: May 4, 2017 10:29 AM >>> To: r-devel at r-project.org >>> Subject: [Rd] lm() gives different results to lm.ridge() and SPSS >>> >>> Hallo, >>> >>> I hope I am posting to the right place. I was advised to try this list by Ben Bolker >>> (https://twitter.com/bolkerb/status/859909918446497795). I also posted this >>> question to StackOverflow >>> (http://stackoverflow.com/questions/43771269/lm-gives-different-results- >>> from-lm-ridgelambda-0). I am a relative newcomer to R, but I wrote my first >>> program in 1975 and have been paid to program in about 15 different >>> languages, so I have some general background knowledge. >>> >>> I have a regression from which I extract the coefficients like this: >>> lm(y ~ x1 * x2, data=ds)$coef >>> That gives: x1=0.40, x2=0.37, x1*x2=0.09 >>> >>> When I do the same regression in SPSS, I get: >>> beta(x1)=0.40, beta(x2)=0.37, beta(x1*x2)=0.14. >>> So the main effects are in agreement, but there is quite a difference in the >>> coefficient for the interaction. >>> >>> X1 and X2 are correlated about .75 (yes, yes, I know - this model wasn't my >>> idea, but it got published), so there is quite possibly something going on with >>> collinearity. So I thought I'd try lm.ridge() to see if I can get an idea of where >>> the problems are occurring. >>> >>> The starting point is to run lm.ridge() with lambda=0 (i.e., no ridge penalty) and >>> check we get the same results as with lm(): >>> lm.ridge(y ~ x1 * x2, lambda=0, data=ds)$coef >>> x1=0.40, x2=0.37, x1*x2=0.14 >>> So lm.ridge() agrees with SPSS, but not with lm(). (Of course, lambda=0 is the >>> default, so it can be omitted; I can alternate between including or deleting >>> ".ridge" in the function call, and watch the coefficient for the interaction >>> change.) >>> >>> What seems slightly strange to me here is that I assumed that lm.ridge() just >>> piggybacks on lm() anyway, so in the specific case where lambda=0 and there >>> is no "ridging" to do, I'd expect exactly the same results. >>> >>> Unfortunately there are 34,000 cases in the dataset, so a "minimal" reprex will >>> not be easy to make, but I can share the data via Dropbox or something if that >>> would help. >>> >>> I appreciate that when there is strong collinearity then all bets are off in terms >>> of what the betas mean, but I would really expect lm() and lm.ridge() to give >>> the same results. (I would be happy to ignore SPSS, but for the moment it's >>> part of the majority!) >>> >>> Thanks for reading, >>> Nick > ______________________________________________ > R-devel at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel-- Peter Dalgaard, Professor, Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Office: A 4.23 Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com
Dear Nick, On 2017-05-05, 9:40 AM, "R-devel on behalf of Nick Brown" <r-devel-bounces at r-project.org on behalf of nick.brown at free.fr> wrote:>>I conjecture that something in the vicinity of >> res <- lm(DEPRESSION ~ scale(ZMEAN_PA) + scale(ZDIVERSITY_PA) + >>scale(ZMEAN_PA * ZDIVERSITY_PA), data=dat) >>summary(res) >> would reproduce the SPSS Beta values. > >Yes, that works. Thanks!That you have to work hard in R to match the SPSS results isn?t such a bad thing when you factor in the observation that standardizing the interaction regressor, ZMEAN_PA * ZDIVERSITY_PA, separately from each of its components, ZMEAN_PA and ZDIVERSITY_PA, is nonsense. Best, John ------------------------------------- John Fox, Professor McMaster University Hamilton, Ontario, Canada Web: http://socserv.mcmaster.ca/jfox/> > >----- Original Message ----- > >From: "peter dalgaard" <pdalgd at gmail.com> >To: "Viechtbauer Wolfgang (SP)" ><wolfgang.viechtbauer at maastrichtuniversity.nl>, "Nick Brown" ><nick.brown at free.fr> >Cc: r-devel at r-project.org >Sent: Friday, 5 May, 2017 3:33:29 PM >Subject: Re: [Rd] lm() gives different results to lm.ridge() and SPSS > >Thanks, I was getting to try this, but got side tracked by actual work... > >Your analysis reproduces the SPSS unscaled estimates. It still remains to >figure out how Nick got > >> >coefficients(lm(ZDEPRESSION ~ ZMEAN_PA * ZDIVERSITY_PA, data=s1)) > >(Intercept) ZMEAN_PA ZDIVERSITY_PA ZMEAN_PA:ZDIVERSITY_PA >0.07342198 -0.39650356 -0.36569488 -0.09435788 > > >which does not match your output. I suspect that ZMEAN_PA and >ZDIVERSITY_PA were scaled for this analysis (but the interaction term >still obviously is not). I conjecture that something in the vicinity of > >res <- lm(DEPRESSION ~ scale(ZMEAN_PA) + scale(ZDIVERSITY_PA) + >scale(ZMEAN_PA * ZDIVERSITY_PA), data=dat) >summary(res) > >would reproduce the SPSS Beta values. > > >> On 5 May 2017, at 14:43 , Viechtbauer Wolfgang (SP) >><wolfgang.viechtbauer at maastrichtuniversity.nl> wrote: >> >> I had no problems running regression models in SPSS and R that yielded >>the same results for these data. >> >> The difference you are observing is from fitting different models. In >>R, you fitted: >> >> res <- lm(DEPRESSION ~ ZMEAN_PA * ZDIVERSITY_PA, data=dat) >> summary(res) >> >> The interaction term is the product of ZMEAN_PA and ZDIVERSITY_PA. This >>is not a standardized variable itself and not the same as "ZINTER_PA_C" >>in the png you showed, which is not a variable in the dataset, but can >>be created with: >> >> dat$ZINTER_PA_C <- with(dat, scale(ZMEAN_PA * ZDIVERSITY_PA)) >> >> If you want the same results as in SPSS, then you need to fit: >> >> res <- lm(DEPRESSION ~ ZMEAN_PA + ZDIVERSITY_PA + ZINTER_PA_C, >>data=dat) >> summary(res) >> >> This yields: >> >> Coefficients: >> Estimate Std. Error t value Pr(>|t|) >> (Intercept) 6.41041 0.01722 372.21 <2e-16 *** >> ZMEAN_PA -1.62726 0.04200 -38.74 <2e-16 *** >> ZDIVERSITY_PA -1.50082 0.07447 -20.15 <2e-16 *** >> ZINTER_PA_C -0.58955 0.05288 -11.15 <2e-16 *** >> --- >> Signif. codes: 0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1 >> >> Exactly the same as in the png. >> >> Peter already mentioned this as a possible reason for the discrepancy: >>https://stat.ethz.ch/pipermail/r-devel/2017-May/074191.html ("Is it >>perhaps the case that x1 and x2 have already been scaled to have >>standard deviation 1? In that case, x1*x2 won't be.") >> >> Best, >> Wolfgang >> >> -----Original Message----- >> From: R-devel [mailto:r-devel-bounces at r-project.org] On Behalf Of Nick >>Brown >> Sent: Friday, May 05, 2017 10:40 >> To: peter dalgaard >> Cc: r-devel at r-project.org >> Subject: Re: [Rd] lm() gives different results to lm.ridge() and SPSS >> >> Hi, >> >> Here is (I hope) all the relevant output from R. >> >>> mean(s1$ZDEPRESSION, na.rm=T) [1] -1.041546e-16 > >>>mean(s1$ZDIVERSITY_PA, na.rm=T) [1] -9.660583e-16 > mean(s1$ZMEAN_PA, >>>na.rm=T) [1] -5.430282e-15 > lm.ridge(ZDEPRESSION ~ ZMEAN_PA * >>>ZDIVERSITY_PA, data=s1)$coef ZMEAN_PA ZDIVERSITY_PA >>>ZMEAN_PA:ZDIVERSITY_PA >> -0.3962254 -0.3636026 -0.1425772 ## This is what I thought was the >>problem originally. :-) >> >> >>> coefficients(lm(ZDEPRESSION ~ ZMEAN_PA * ZDIVERSITY_PA, data=s1)) >>>(Intercept) ZMEAN_PA ZDIVERSITY_PA ZMEAN_PA:ZDIVERSITY_PA >> 0.07342198 -0.39650356 -0.36569488 -0.09435788 > >>coefficients(lm.ridge(ZDEPRESSION ~ ZMEAN_PA * ZDIVERSITY_PA, data=s1)) >>ZMEAN_PA ZDIVERSITY_PA ZMEAN_PA:ZDIVERSITY_PA >> 0.07342198 -0.39650356 -0.36569488 -0.09435788 The equivalent from SPSS >>is attached. The unstandardized coefficients in SPSS look nothing like >>those in R. The standardized coefficients in SPSS match the >>lm.ridge()$coef numbers very closely indeed, suggesting that the same >>algorithm may be in use. >> >> I have put the dataset file, which is the untouched original I received >>from the authors, in this Dropbox folder: >>https://www.dropbox.com/sh/xsebjy55ius1ysb/AADwYUyV1bl6-iAw7ACuF1_La?dl=0 >>. You can read it into R with this code (one variable needs to be >>standardized and centered; everything else is already in the file): >> >> s1 <- read.csv("Emodiversity_Study1.csv", stringsAsFactors=FALSE) >>s1$ZDEPRESSION <- scale(s1$DEPRESSION) >> Hey, maybe R is fine and I've stumbled on a bug in SPSS? If so, I'm >>sure IBM will want to fix it quickly (ha ha ha). >> >> Nick >> >> ----- Original Message ----- >> >> From: "peter dalgaard" <pdalgd at gmail.com> >> To: "Nick Brown" <nick.brown at free.fr> >> Cc: "Simon Bonner" <sbonner6 at uwo.ca>, r-devel at r-project.org >> Sent: Friday, 5 May, 2017 10:02:10 AM >> Subject: Re: [Rd] lm() gives different results to lm.ridge() and SPSS >> >> I asked you before, but in case you missed it: Are you looking at the >>right place in SPSS output? >> >> The UNstandardized coefficients should be comparable to R, i.e. the "B" >>column, not "Beta". >> >> -pd >> >>> On 5 May 2017, at 01:58 , Nick Brown <nick.brown at free.fr> wrote: >>> >>> Hi Simon, >>> >>> Yes, if I uses coefficients() I get the same results for lm() and >>>lm.ridge(). So that's consistent, at least. >>> >>> Interestingly, the "wrong" number I get from lm.ridge()$coef agrees >>>with the value from SPSS to 5dp, which is an interesting coincidence if >>>these numbers have no particular external meaning in lm.ridge(). >>> >>> Kind regards, >>> Nick >>> >>> ----- Original Message ----- >>> >>> From: "Simon Bonner" <sbonner6 at uwo.ca> >>> To: "Nick Brown" <nick.brown at free.fr>, r-devel at r-project.org >>> Sent: Thursday, 4 May, 2017 7:07:33 PM >>> Subject: RE: [Rd] lm() gives different results to lm.ridge() and SPSS >>> >>> Hi Nick, >>> >>> I think that the problem here is your use of $coef to extract the >>>coefficients of the ridge regression. The help for lm.ridge states that >>>coef is a "matrix of coefficients, one row for each value of lambda. >>>Note that these are not on the original scale and are for use by the >>>coef method." >>> >>> I ran a small test with simulated data, code is copied below, and >>>indeed the output from lm.ridge differs depending on whether the >>>coefficients are accessed via $coef or via the coefficients() function. >>>The latter does produce results that match the output from lm. >>> >>> I hope that helps. >>> >>> Cheers, >>> >>> Simon >>> >>> ## Load packages >>> library(MASS) >>> >>> ## Set seed >>> set.seed(8888) >>> >>> ## Set parameters >>> n <- 100 >>> beta <- c(1,0,1) >>> sigma <- .5 >>> rho <- .75 >>> >>> ## Simulate correlated covariates >>> Sigma <- matrix(c(1,rho,rho,1),ncol=2) >>> X <- mvrnorm(n,c(0,0),Sigma=Sigma) >>> >>> ## Simulate data >>> mu <- beta[1] + X %*% beta[-1] >>> y <- rnorm(n,mu,sigma) >>> >>> ## Fit model with lm() >>> fit1 <- lm(y ~ X) >>> >>> ## Fit model with lm.ridge() >>> fit2 <- lm.ridge(y ~ X) >>> >>> ## Compare coefficients >>> cbind(fit1$coefficients,c(NA,fit2$coef),coefficients(fit2)) >>> >>> [,1] [,2] [,3] >>> (Intercept) 0.99276001 NA 0.99276001 >>> X1 -0.03980772 -0.04282391 -0.03980772 >>> X2 1.11167179 1.06200476 1.11167179 >>> >>> -- >>> >>> Simon Bonner >>> Assistant Professor of Environmetrics/ Director MMASc >>> Department of Statistical and Actuarial Sciences/Department of Biology >>> University of Western Ontario >>> >>> Office: Western Science Centre rm 276 >>> >>> Email: sbonner6 at uwo.ca | Telephone: 519-661-2111 x88205 | Fax: >>>519-661-3813 >>> Twitter: @bonnerstatslab | Website: >>>http://simon.bonners.ca/bonner-lab/wpblog/ >>> >>>> -----Original Message----- >>>> From: R-devel [mailto:r-devel-bounces at r-project.org] On Behalf Of >>>>Nick >>>> Brown >>>> Sent: May 4, 2017 10:29 AM >>>> To: r-devel at r-project.org >>>> Subject: [Rd] lm() gives different results to lm.ridge() and SPSS >>>> >>>> Hallo, >>>> >>>> I hope I am posting to the right place. I was advised to try this >>>>list by Ben Bolker >>>> (https://twitter.com/bolkerb/status/859909918446497795). I also >>>>posted this >>>> question to StackOverflow >>>> >>>>(http://stackoverflow.com/questions/43771269/lm-gives-different-results >>>>- >>>> from-lm-ridgelambda-0). I am a relative newcomer to R, but I wrote my >>>>first >>>> program in 1975 and have been paid to program in about 15 different >>>> languages, so I have some general background knowledge. >>>> >>>> I have a regression from which I extract the coefficients like this: >>>> lm(y ~ x1 * x2, data=ds)$coef >>>> That gives: x1=0.40, x2=0.37, x1*x2=0.09 >>>> >>>> When I do the same regression in SPSS, I get: >>>> beta(x1)=0.40, beta(x2)=0.37, beta(x1*x2)=0.14. >>>> So the main effects are in agreement, but there is quite a difference >>>>in the >>>> coefficient for the interaction. >>>> >>>> X1 and X2 are correlated about .75 (yes, yes, I know - this model >>>>wasn't my >>>> idea, but it got published), so there is quite possibly something >>>>going on with >>>> collinearity. So I thought I'd try lm.ridge() to see if I can get an >>>>idea of where >>>> the problems are occurring. >>>> >>>> The starting point is to run lm.ridge() with lambda=0 (i.e., no ridge >>>>penalty) and >>>> check we get the same results as with lm(): >>>> lm.ridge(y ~ x1 * x2, lambda=0, data=ds)$coef >>>> x1=0.40, x2=0.37, x1*x2=0.14 >>>> So lm.ridge() agrees with SPSS, but not with lm(). (Of course, >>>>lambda=0 is the >>>> default, so it can be omitted; I can alternate between including or >>>>deleting >>>> ".ridge" in the function call, and watch the coefficient for the >>>>interaction >>>> change.) >>>> >>>> What seems slightly strange to me here is that I assumed that >>>>lm.ridge() just >>>> piggybacks on lm() anyway, so in the specific case where lambda=0 and >>>>there >>>> is no "ridging" to do, I'd expect exactly the same results. >>>> >>>> Unfortunately there are 34,000 cases in the dataset, so a "minimal" >>>>reprex will >>>> not be easy to make, but I can share the data via Dropbox or >>>>something if that >>>> would help. >>>> >>>> I appreciate that when there is strong collinearity then all bets are >>>>off in terms >>>> of what the betas mean, but I would really expect lm() and lm.ridge() >>>>to give >>>> the same results. (I would be happy to ignore SPSS, but for the >>>>moment it's >>>> part of the majority!) >>>> >>>> Thanks for reading, >>>> Nick >> ______________________________________________ >> R-devel at r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-devel > >-- >Peter Dalgaard, Professor, >Center for Statistics, Copenhagen Business School >Solbjerg Plads 3, 2000 Frederiksberg, Denmark >Phone: (+45)38153501 >Office: A 4.23 >Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com > > > > > > > > > > > > [[alternative HTML version deleted]] > >______________________________________________ >R-devel at r-project.org mailing list >https://stat.ethz.ch/mailman/listinfo/r-devel
Viechtbauer Wolfgang (SP)
2017-May-05 20:34 UTC
[Rd] lm() gives different results to lm.ridge() and SPSS
Totally agree that standardizing the interaction term is nonsense. But in all fairness, SPSS doesn't do that. In fact, the 'REGRESSION' command in SPSS doesn't compute any interaction terms -- one has to first compute them 'by hand' and then add them to the model like any other variable. So somebody worked extra hard to standardize that interaction term in SPSS as well :/ Best, Wolfgang -----Original Message----- From: R-devel [mailto:r-devel-bounces at r-project.org] On Behalf Of Fox, John Sent: Friday, May 05, 2017 20:23 To: Nick Brown; peter dalgaard Cc: r-devel at r-project.org Subject: Re: [Rd] lm() gives different results to lm.ridge() and SPSS Dear Nick, On 2017-05-05, 9:40 AM, "R-devel on behalf of Nick Brown" <r-devel-bounces at r-project.org on behalf of nick.brown at free.fr> wrote:>>I conjecture that something in the vicinity of >> res <- lm(DEPRESSION ~ scale(ZMEAN_PA) + scale(ZDIVERSITY_PA) + >>scale(ZMEAN_PA * ZDIVERSITY_PA), data=dat) >>summary(res) >> would reproduce the SPSS Beta values. > >Yes, that works. Thanks!That you have to work hard in R to match the SPSS results isn?t such a bad thing when you factor in the observation that standardizing the interaction regressor, ZMEAN_PA * ZDIVERSITY_PA, separately from each of its components, ZMEAN_PA and ZDIVERSITY_PA, is nonsense. Best, John ------------------------------------- John Fox, Professor McMaster University Hamilton, Ontario, Canada Web: http://socserv.mcmaster.ca/jfox/>----- Original Message ----- > >From: "peter dalgaard" <pdalgd at gmail.com> >To: "Viechtbauer Wolfgang (SP)" ><wolfgang.viechtbauer at maastrichtuniversity.nl>, "Nick Brown" ><nick.brown at free.fr> >Cc: r-devel at r-project.org >Sent: Friday, 5 May, 2017 3:33:29 PM >Subject: Re: [Rd] lm() gives different results to lm.ridge() and SPSS > >Thanks, I was getting to try this, but got side tracked by actual work... > >Your analysis reproduces the SPSS unscaled estimates. It still remains to >figure out how Nick got > >> >coefficients(lm(ZDEPRESSION ~ ZMEAN_PA * ZDIVERSITY_PA, data=s1)) > >(Intercept) ZMEAN_PA ZDIVERSITY_PA ZMEAN_PA:ZDIVERSITY_PA >0.07342198 -0.39650356 -0.36569488 -0.09435788 > > >which does not match your output. I suspect that ZMEAN_PA and >ZDIVERSITY_PA were scaled for this analysis (but the interaction term >still obviously is not). I conjecture that something in the vicinity of > >res <- lm(DEPRESSION ~ scale(ZMEAN_PA) + scale(ZDIVERSITY_PA) + >scale(ZMEAN_PA * ZDIVERSITY_PA), data=dat) >summary(res) > >would reproduce the SPSS Beta values. > >> On 5 May 2017, at 14:43 , Viechtbauer Wolfgang (SP) >><wolfgang.viechtbauer at maastrichtuniversity.nl> wrote: >> >> I had no problems running regression models in SPSS and R that yielded >>the same results for these data. >> >> The difference you are observing is from fitting different models. In >>R, you fitted: >> >> res <- lm(DEPRESSION ~ ZMEAN_PA * ZDIVERSITY_PA, data=dat) >> summary(res) >> >> The interaction term is the product of ZMEAN_PA and ZDIVERSITY_PA. This >>is not a standardized variable itself and not the same as "ZINTER_PA_C" >>in the png you showed, which is not a variable in the dataset, but can >>be created with: >> >> dat$ZINTER_PA_C <- with(dat, scale(ZMEAN_PA * ZDIVERSITY_PA)) >> >> If you want the same results as in SPSS, then you need to fit: >> >> res <- lm(DEPRESSION ~ ZMEAN_PA + ZDIVERSITY_PA + ZINTER_PA_C, >>data=dat) >> summary(res) >> >> This yields: >> >> Coefficients: >> Estimate Std. Error t value Pr(>|t|) >> (Intercept) 6.41041 0.01722 372.21 <2e-16 *** >> ZMEAN_PA -1.62726 0.04200 -38.74 <2e-16 *** >> ZDIVERSITY_PA -1.50082 0.07447 -20.15 <2e-16 *** >> ZINTER_PA_C -0.58955 0.05288 -11.15 <2e-16 *** >> --- >> Signif. codes: 0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1 >> >> Exactly the same as in the png. >> >> Peter already mentioned this as a possible reason for the discrepancy: >>https://stat.ethz.ch/pipermail/r-devel/2017-May/074191.html ("Is it >>perhaps the case that x1 and x2 have already been scaled to have >>standard deviation 1? In that case, x1*x2 won't be.") >> >> Best, >> Wolfgang
Viechtbauer Wolfgang (SP)
2017-May-05 20:41 UTC
[Rd] lm() gives different results to lm.ridge() and SPSS
Well, one correction -- the 'standardized coefficients' that SPSS shows are based on standardizing all variables separately (so x1, x2, and x1*x2 are all standardized). So with respect to that, the criticism certainly stands. -----Original Message----- From: Viechtbauer Wolfgang (SP) Sent: Friday, May 05, 2017 22:35 To: r-devel at r-project.org Subject: RE: [Rd] lm() gives different results to lm.ridge() and SPSS Totally agree that standardizing the interaction term is nonsense. But in all fairness, SPSS doesn't do that. In fact, the 'REGRESSION' command in SPSS doesn't compute any interaction terms -- one has to first compute them 'by hand' and then add them to the model like any other variable. So somebody worked extra hard to standardize that interaction term in SPSS as well :/ Best, Wolfgang -----Original Message----- From: R-devel [mailto:r-devel-bounces at r-project.org] On Behalf Of Fox, John Sent: Friday, May 05, 2017 20:23 To: Nick Brown; peter dalgaard Cc: r-devel at r-project.org Subject: Re: [Rd] lm() gives different results to lm.ridge() and SPSS Dear Nick, On 2017-05-05, 9:40 AM, "R-devel on behalf of Nick Brown" <r-devel-bounces at r-project.org on behalf of nick.brown at free.fr> wrote:>>I conjecture that something in the vicinity of >> res <- lm(DEPRESSION ~ scale(ZMEAN_PA) + scale(ZDIVERSITY_PA) + >>scale(ZMEAN_PA * ZDIVERSITY_PA), data=dat) >>summary(res) >> would reproduce the SPSS Beta values. > >Yes, that works. Thanks!That you have to work hard in R to match the SPSS results isn?t such a bad thing when you factor in the observation that standardizing the interaction regressor, ZMEAN_PA * ZDIVERSITY_PA, separately from each of its components, ZMEAN_PA and ZDIVERSITY_PA, is nonsense. Best, John ------------------------------------- John Fox, Professor McMaster University Hamilton, Ontario, Canada Web: http://socserv.mcmaster.ca/jfox/>----- Original Message ----- > >From: "peter dalgaard" <pdalgd at gmail.com> >To: "Viechtbauer Wolfgang (SP)" ><wolfgang.viechtbauer at maastrichtuniversity.nl>, "Nick Brown" ><nick.brown at free.fr> >Cc: r-devel at r-project.org >Sent: Friday, 5 May, 2017 3:33:29 PM >Subject: Re: [Rd] lm() gives different results to lm.ridge() and SPSS > >Thanks, I was getting to try this, but got side tracked by actual work... > >Your analysis reproduces the SPSS unscaled estimates. It still remains to >figure out how Nick got > >> >coefficients(lm(ZDEPRESSION ~ ZMEAN_PA * ZDIVERSITY_PA, data=s1)) > >(Intercept) ZMEAN_PA ZDIVERSITY_PA ZMEAN_PA:ZDIVERSITY_PA >0.07342198 -0.39650356 -0.36569488 -0.09435788 > > >which does not match your output. I suspect that ZMEAN_PA and >ZDIVERSITY_PA were scaled for this analysis (but the interaction term >still obviously is not). I conjecture that something in the vicinity of > >res <- lm(DEPRESSION ~ scale(ZMEAN_PA) + scale(ZDIVERSITY_PA) + >scale(ZMEAN_PA * ZDIVERSITY_PA), data=dat) >summary(res) > >would reproduce the SPSS Beta values. > >> On 5 May 2017, at 14:43 , Viechtbauer Wolfgang (SP) >><wolfgang.viechtbauer at maastrichtuniversity.nl> wrote: >> >> I had no problems running regression models in SPSS and R that yielded >>the same results for these data. >> >> The difference you are observing is from fitting different models. In >>R, you fitted: >> >> res <- lm(DEPRESSION ~ ZMEAN_PA * ZDIVERSITY_PA, data=dat) >> summary(res) >> >> The interaction term is the product of ZMEAN_PA and ZDIVERSITY_PA. This >>is not a standardized variable itself and not the same as "ZINTER_PA_C" >>in the png you showed, which is not a variable in the dataset, but can >>be created with: >> >> dat$ZINTER_PA_C <- with(dat, scale(ZMEAN_PA * ZDIVERSITY_PA)) >> >> If you want the same results as in SPSS, then you need to fit: >> >> res <- lm(DEPRESSION ~ ZMEAN_PA + ZDIVERSITY_PA + ZINTER_PA_C, >>data=dat) >> summary(res) >> >> This yields: >> >> Coefficients: >> Estimate Std. Error t value Pr(>|t|) >> (Intercept) 6.41041 0.01722 372.21 <2e-16 *** >> ZMEAN_PA -1.62726 0.04200 -38.74 <2e-16 *** >> ZDIVERSITY_PA -1.50082 0.07447 -20.15 <2e-16 *** >> ZINTER_PA_C -0.58955 0.05288 -11.15 <2e-16 *** >> --- >> Signif. codes: 0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1 >> >> Exactly the same as in the png. >> >> Peter already mentioned this as a possible reason for the discrepancy: >>https://stat.ethz.ch/pipermail/r-devel/2017-May/074191.html ("Is it >>perhaps the case that x1 and x2 have already been scaled to have >>standard deviation 1? In that case, x1*x2 won't be.") >> >> Best, >> Wolfgang
Hi John, Thanks for the comment... but that appears to mean that SPSS has a big problem. I have always been told that to include an interaction term in a regression, the only way is to do the multiplication by hand. But then it seems to be impossible to stop SPSS from re-standardizing the variable that corresponds to the interaction term. Am I missing something? Is there a way to perform the regression with the interaction in SPSS without computing the interaction as a separate variable? Best, Nick ----- Original Message ----- From: "John Fox" <jfox at mcmaster.ca> To: "Nick Brown" <nick.brown at free.fr>, "peter dalgaard" <pdalgd at gmail.com> Cc: r-devel at r-project.org Sent: Friday, 5 May, 2017 8:22:53 PM Subject: Re: [Rd] lm() gives different results to lm.ridge() and SPSS Dear Nick, On 2017-05-05, 9:40 AM, "R-devel on behalf of Nick Brown" <r-devel-bounces at r-project.org on behalf of nick.brown at free.fr> wrote:>>I conjecture that something in the vicinity of >> res <- lm(DEPRESSION ~ scale(ZMEAN_PA) + scale(ZDIVERSITY_PA) + >>scale(ZMEAN_PA * ZDIVERSITY_PA), data=dat) >>summary(res) >> would reproduce the SPSS Beta values. > >Yes, that works. Thanks!That you have to work hard in R to match the SPSS results isn?t such a bad thing when you factor in the observation that standardizing the interaction regressor, ZMEAN_PA * ZDIVERSITY_PA, separately from each of its components, ZMEAN_PA and ZDIVERSITY_PA, is nonsense. Best, John ------------------------------------- John Fox, Professor McMaster University Hamilton, Ontario, Canada Web: http://socserv.mcmaster.ca/jfox/> > >----- Original Message ----- > >From: "peter dalgaard" <pdalgd at gmail.com> >To: "Viechtbauer Wolfgang (SP)" ><wolfgang.viechtbauer at maastrichtuniversity.nl>, "Nick Brown" ><nick.brown at free.fr> >Cc: r-devel at r-project.org >Sent: Friday, 5 May, 2017 3:33:29 PM >Subject: Re: [Rd] lm() gives different results to lm.ridge() and SPSS > >Thanks, I was getting to try this, but got side tracked by actual work... > >Your analysis reproduces the SPSS unscaled estimates. It still remains to >figure out how Nick got > >> >coefficients(lm(ZDEPRESSION ~ ZMEAN_PA * ZDIVERSITY_PA, data=s1)) > >(Intercept) ZMEAN_PA ZDIVERSITY_PA ZMEAN_PA:ZDIVERSITY_PA >0.07342198 -0.39650356 -0.36569488 -0.09435788 > > >which does not match your output. I suspect that ZMEAN_PA and >ZDIVERSITY_PA were scaled for this analysis (but the interaction term >still obviously is not). I conjecture that something in the vicinity of > >res <- lm(DEPRESSION ~ scale(ZMEAN_PA) + scale(ZDIVERSITY_PA) + >scale(ZMEAN_PA * ZDIVERSITY_PA), data=dat) >summary(res) > >would reproduce the SPSS Beta values. > > >> On 5 May 2017, at 14:43 , Viechtbauer Wolfgang (SP) >><wolfgang.viechtbauer at maastrichtuniversity.nl> wrote: >> >> I had no problems running regression models in SPSS and R that yielded >>the same results for these data. >> >> The difference you are observing is from fitting different models. In >>R, you fitted: >> >> res <- lm(DEPRESSION ~ ZMEAN_PA * ZDIVERSITY_PA, data=dat) >> summary(res) >> >> The interaction term is the product of ZMEAN_PA and ZDIVERSITY_PA. This >>is not a standardized variable itself and not the same as "ZINTER_PA_C" >>in the png you showed, which is not a variable in the dataset, but can >>be created with: >> >> dat$ZINTER_PA_C <- with(dat, scale(ZMEAN_PA * ZDIVERSITY_PA)) >> >> If you want the same results as in SPSS, then you need to fit: >> >> res <- lm(DEPRESSION ~ ZMEAN_PA + ZDIVERSITY_PA + ZINTER_PA_C, >>data=dat) >> summary(res) >> >> This yields: >> >> Coefficients: >> Estimate Std. Error t value Pr(>|t|) >> (Intercept) 6.41041 0.01722 372.21 <2e-16 *** >> ZMEAN_PA -1.62726 0.04200 -38.74 <2e-16 *** >> ZDIVERSITY_PA -1.50082 0.07447 -20.15 <2e-16 *** >> ZINTER_PA_C -0.58955 0.05288 -11.15 <2e-16 *** >> --- >> Signif. codes: 0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1 >> >> Exactly the same as in the png. >> >> Peter already mentioned this as a possible reason for the discrepancy: >>https://stat.ethz.ch/pipermail/r-devel/2017-May/074191.html ("Is it >>perhaps the case that x1 and x2 have already been scaled to have >>standard deviation 1? In that case, x1*x2 won't be.") >> >> Best, >> Wolfgang >> >> -----Original Message----- >> From: R-devel [mailto:r-devel-bounces at r-project.org] On Behalf Of Nick >>Brown >> Sent: Friday, May 05, 2017 10:40 >> To: peter dalgaard >> Cc: r-devel at r-project.org >> Subject: Re: [Rd] lm() gives different results to lm.ridge() and SPSS >> >> Hi, >> >> Here is (I hope) all the relevant output from R. >> >>> mean(s1$ZDEPRESSION, na.rm=T) [1] -1.041546e-16 > >>>mean(s1$ZDIVERSITY_PA, na.rm=T) [1] -9.660583e-16 > mean(s1$ZMEAN_PA, >>>na.rm=T) [1] -5.430282e-15 > lm.ridge(ZDEPRESSION ~ ZMEAN_PA * >>>ZDIVERSITY_PA, data=s1)$coef ZMEAN_PA ZDIVERSITY_PA >>>ZMEAN_PA:ZDIVERSITY_PA >> -0.3962254 -0.3636026 -0.1425772 ## This is what I thought was the >>problem originally. :-) >> >> >>> coefficients(lm(ZDEPRESSION ~ ZMEAN_PA * ZDIVERSITY_PA, data=s1)) >>>(Intercept) ZMEAN_PA ZDIVERSITY_PA ZMEAN_PA:ZDIVERSITY_PA >> 0.07342198 -0.39650356 -0.36569488 -0.09435788 > >>coefficients(lm.ridge(ZDEPRESSION ~ ZMEAN_PA * ZDIVERSITY_PA, data=s1)) >>ZMEAN_PA ZDIVERSITY_PA ZMEAN_PA:ZDIVERSITY_PA >> 0.07342198 -0.39650356 -0.36569488 -0.09435788 The equivalent from SPSS >>is attached. The unstandardized coefficients in SPSS look nothing like >>those in R. The standardized coefficients in SPSS match the >>lm.ridge()$coef numbers very closely indeed, suggesting that the same >>algorithm may be in use. >> >> I have put the dataset file, which is the untouched original I received >>from the authors, in this Dropbox folder: >>https://www.dropbox.com/sh/xsebjy55ius1ysb/AADwYUyV1bl6-iAw7ACuF1_La?dl=0 >>. You can read it into R with this code (one variable needs to be >>standardized and centered; everything else is already in the file): >> >> s1 <- read.csv("Emodiversity_Study1.csv", stringsAsFactors=FALSE) >>s1$ZDEPRESSION <- scale(s1$DEPRESSION) >> Hey, maybe R is fine and I've stumbled on a bug in SPSS? If so, I'm >>sure IBM will want to fix it quickly (ha ha ha). >> >> Nick >> >> ----- Original Message ----- >> >> From: "peter dalgaard" <pdalgd at gmail.com> >> To: "Nick Brown" <nick.brown at free.fr> >> Cc: "Simon Bonner" <sbonner6 at uwo.ca>, r-devel at r-project.org >> Sent: Friday, 5 May, 2017 10:02:10 AM >> Subject: Re: [Rd] lm() gives different results to lm.ridge() and SPSS >> >> I asked you before, but in case you missed it: Are you looking at the >>right place in SPSS output? >> >> The UNstandardized coefficients should be comparable to R, i.e. the "B" >>column, not "Beta". >> >> -pd >> >>> On 5 May 2017, at 01:58 , Nick Brown <nick.brown at free.fr> wrote: >>> >>> Hi Simon, >>> >>> Yes, if I uses coefficients() I get the same results for lm() and >>>lm.ridge(). So that's consistent, at least. >>> >>> Interestingly, the "wrong" number I get from lm.ridge()$coef agrees >>>with the value from SPSS to 5dp, which is an interesting coincidence if >>>these numbers have no particular external meaning in lm.ridge(). >>> >>> Kind regards, >>> Nick >>> >>> ----- Original Message ----- >>> >>> From: "Simon Bonner" <sbonner6 at uwo.ca> >>> To: "Nick Brown" <nick.brown at free.fr>, r-devel at r-project.org >>> Sent: Thursday, 4 May, 2017 7:07:33 PM >>> Subject: RE: [Rd] lm() gives different results to lm.ridge() and SPSS >>> >>> Hi Nick, >>> >>> I think that the problem here is your use of $coef to extract the >>>coefficients of the ridge regression. The help for lm.ridge states that >>>coef is a "matrix of coefficients, one row for each value of lambda. >>>Note that these are not on the original scale and are for use by the >>>coef method." >>> >>> I ran a small test with simulated data, code is copied below, and >>>indeed the output from lm.ridge differs depending on whether the >>>coefficients are accessed via $coef or via the coefficients() function. >>>The latter does produce results that match the output from lm. >>> >>> I hope that helps. >>> >>> Cheers, >>> >>> Simon >>> >>> ## Load packages >>> library(MASS) >>> >>> ## Set seed >>> set.seed(8888) >>> >>> ## Set parameters >>> n <- 100 >>> beta <- c(1,0,1) >>> sigma <- .5 >>> rho <- .75 >>> >>> ## Simulate correlated covariates >>> Sigma <- matrix(c(1,rho,rho,1),ncol=2) >>> X <- mvrnorm(n,c(0,0),Sigma=Sigma) >>> >>> ## Simulate data >>> mu <- beta[1] + X %*% beta[-1] >>> y <- rnorm(n,mu,sigma) >>> >>> ## Fit model with lm() >>> fit1 <- lm(y ~ X) >>> >>> ## Fit model with lm.ridge() >>> fit2 <- lm.ridge(y ~ X) >>> >>> ## Compare coefficients >>> cbind(fit1$coefficients,c(NA,fit2$coef),coefficients(fit2)) >>> >>> [,1] [,2] [,3] >>> (Intercept) 0.99276001 NA 0.99276001 >>> X1 -0.03980772 -0.04282391 -0.03980772 >>> X2 1.11167179 1.06200476 1.11167179 >>> >>> -- >>> >>> Simon Bonner >>> Assistant Professor of Environmetrics/ Director MMASc >>> Department of Statistical and Actuarial Sciences/Department of Biology >>> University of Western Ontario >>> >>> Office: Western Science Centre rm 276 >>> >>> Email: sbonner6 at uwo.ca | Telephone: 519-661-2111 x88205 | Fax: >>>519-661-3813 >>> Twitter: @bonnerstatslab | Website: >>>http://simon.bonners.ca/bonner-lab/wpblog/ >>> >>>> -----Original Message----- >>>> From: R-devel [mailto:r-devel-bounces at r-project.org] On Behalf Of >>>>Nick >>>> Brown >>>> Sent: May 4, 2017 10:29 AM >>>> To: r-devel at r-project.org >>>> Subject: [Rd] lm() gives different results to lm.ridge() and SPSS >>>> >>>> Hallo, >>>> >>>> I hope I am posting to the right place. I was advised to try this >>>>list by Ben Bolker >>>> (https://twitter.com/bolkerb/status/859909918446497795). I also >>>>posted this >>>> question to StackOverflow >>>> >>>>(http://stackoverflow.com/questions/43771269/lm-gives-different-results >>>>- >>>> from-lm-ridgelambda-0). I am a relative newcomer to R, but I wrote my >>>>first >>>> program in 1975 and have been paid to program in about 15 different >>>> languages, so I have some general background knowledge. >>>> >>>> I have a regression from which I extract the coefficients like this: >>>> lm(y ~ x1 * x2, data=ds)$coef >>>> That gives: x1=0.40, x2=0.37, x1*x2=0.09 >>>> >>>> When I do the same regression in SPSS, I get: >>>> beta(x1)=0.40, beta(x2)=0.37, beta(x1*x2)=0.14. >>>> So the main effects are in agreement, but there is quite a difference >>>>in the >>>> coefficient for the interaction. >>>> >>>> X1 and X2 are correlated about .75 (yes, yes, I know - this model >>>>wasn't my >>>> idea, but it got published), so there is quite possibly something >>>>going on with >>>> collinearity. So I thought I'd try lm.ridge() to see if I can get an >>>>idea of where >>>> the problems are occurring. >>>> >>>> The starting point is to run lm.ridge() with lambda=0 (i.e., no ridge >>>>penalty) and >>>> check we get the same results as with lm(): >>>> lm.ridge(y ~ x1 * x2, lambda=0, data=ds)$coef >>>> x1=0.40, x2=0.37, x1*x2=0.14 >>>> So lm.ridge() agrees with SPSS, but not with lm(). (Of course, >>>>lambda=0 is the >>>> default, so it can be omitted; I can alternate between including or >>>>deleting >>>> ".ridge" in the function call, and watch the coefficient for the >>>>interaction >>>> change.) >>>> >>>> What seems slightly strange to me here is that I assumed that >>>>lm.ridge() just >>>> piggybacks on lm() anyway, so in the specific case where lambda=0 and >>>>there >>>> is no "ridging" to do, I'd expect exactly the same results. >>>> >>>> Unfortunately there are 34,000 cases in the dataset, so a "minimal" >>>>reprex will >>>> not be easy to make, but I can share the data via Dropbox or >>>>something if that >>>> would help. >>>> >>>> I appreciate that when there is strong collinearity then all bets are >>>>off in terms >>>> of what the betas mean, but I would really expect lm() and lm.ridge() >>>>to give >>>> the same results. (I would be happy to ignore SPSS, but for the >>>>moment it's >>>> part of the majority!) >>>> >>>> Thanks for reading, >>>> Nick >> ______________________________________________ >> R-devel at r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-devel > >-- >Peter Dalgaard, Professor, >Center for Statistics, Copenhagen Business School >Solbjerg Plads 3, 2000 Frederiksberg, Denmark >Phone: (+45)38153501 >Office: A 4.23 >Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com > > > > > > > > > > > > [[alternative HTML version deleted]] > >______________________________________________ >R-devel at r-project.org mailing list >https://stat.ethz.ch/mailman/listinfo/r-devel[[alternative HTML version deleted]]