Hi Simon,

Yes, if I use coefficients() I get the same results for lm() and
lm.ridge(). So that's consistent, at least.

Interestingly, the "wrong" number I get from lm.ridge()$coef agrees with
the value from SPSS to 5 dp, which is an interesting coincidence if these
numbers have no particular external meaning in lm.ridge().

Kind regards,
Nick

----- Original Message -----
From: "Simon Bonner" <sbonner6 at uwo.ca>
To: "Nick Brown" <nick.brown at free.fr>, r-devel at r-project.org
Sent: Thursday, 4 May, 2017 7:07:33 PM
Subject: RE: [Rd] lm() gives different results to lm.ridge() and SPSS

Hi Nick,

I think that the problem here is your use of $coef to extract the
coefficients of the ridge regression. The help for lm.ridge states that
coef is a "matrix of coefficients, one row for each value of lambda. Note
that these are not on the original scale and are for use by the coef
method."

I ran a small test with simulated data (code copied below), and indeed
the output from lm.ridge differs depending on whether the coefficients
are accessed via $coef or via the coefficients() function. The latter
does produce results that match the output from lm.

I hope that helps.

Cheers,

Simon

## Load packages
library(MASS)

## Set seed
set.seed(8888)

## Set parameters
n <- 100
beta <- c(1, 0, 1)
sigma <- .5
rho <- .75

## Simulate correlated covariates
Sigma <- matrix(c(1, rho, rho, 1), ncol = 2)
X <- mvrnorm(n, c(0, 0), Sigma = Sigma)

## Simulate data
mu <- beta[1] + X %*% beta[-1]
y <- rnorm(n, mu, sigma)

## Fit model with lm()
fit1 <- lm(y ~ X)

## Fit model with lm.ridge()
fit2 <- lm.ridge(y ~ X)

## Compare coefficients
cbind(fit1$coefficients, c(NA, fit2$coef), coefficients(fit2))

                    [,1]        [,2]        [,3]
(Intercept)  0.99276001          NA  0.99276001
X1          -0.03980772 -0.04282391 -0.03980772
X2           1.11167179  1.06200476  1.11167179

--
Simon Bonner
Assistant Professor of Environmetrics / Director MMASc
Department of Statistical and Actuarial Sciences / Department of Biology
University of Western Ontario

Office: Western Science Centre rm 276
Email: sbonner6 at uwo.ca | Telephone: 519-661-2111 x88205 | Fax: 519-661-3813
Twitter: @bonnerstatslab | Website: http://simon.bonners.ca/bonner-lab/wpblog/

> -----Original Message-----
> From: R-devel [mailto:r-devel-bounces at r-project.org] On Behalf Of Nick Brown
> Sent: May 4, 2017 10:29 AM
> To: r-devel at r-project.org
> Subject: [Rd] lm() gives different results to lm.ridge() and SPSS
>
> Hallo,
>
> I hope I am posting to the right place. I was advised to try this list
> by Ben Bolker (https://twitter.com/bolkerb/status/859909918446497795).
> I also posted this question to StackOverflow
> (http://stackoverflow.com/questions/43771269/lm-gives-different-results-from-lm-ridgelambda-0).
> I am a relative newcomer to R, but I wrote my first program in 1975 and
> have been paid to program in about 15 different languages, so I have
> some general background knowledge.
>
> I have a regression from which I extract the coefficients like this:
>   lm(y ~ x1 * x2, data=ds)$coef
> That gives: x1=0.40, x2=0.37, x1*x2=0.09
>
> When I do the same regression in SPSS, I get:
>   beta(x1)=0.40, beta(x2)=0.37, beta(x1*x2)=0.14
> So the main effects are in agreement, but there is quite a difference
> in the coefficient for the interaction.
>
> X1 and X2 are correlated about .75 (yes, yes, I know - this model
> wasn't my idea, but it got published), so there is quite possibly
> something going on with collinearity. So I thought I'd try lm.ridge()
> to see if I can get an idea of where the problems are occurring.
>
> The starting point is to run lm.ridge() with lambda=0 (i.e., no ridge
> penalty) and check that we get the same results as with lm():
>   lm.ridge(y ~ x1 * x2, lambda=0, data=ds)$coef
>   x1=0.40, x2=0.37, x1*x2=0.14
> So lm.ridge() agrees with SPSS, but not with lm(). (Of course, lambda=0
> is the default, so it can be omitted; I can alternate between including
> and deleting ".ridge" in the function call, and watch the coefficient
> for the interaction change.)
>
> What seems slightly strange to me here is that I assumed that
> lm.ridge() just piggybacks on lm() anyway, so in the specific case
> where lambda=0 and there is no "ridging" to do, I'd expect exactly the
> same results.
>
> Unfortunately there are 34,000 cases in the dataset, so a "minimal"
> reprex will not be easy to make, but I can share the data via Dropbox
> or something if that would help.
>
> I appreciate that when there is strong collinearity then all bets are
> off in terms of what the betas mean, but I would really expect lm() and
> lm.ridge() to give the same results. (I would be happy to ignore SPSS,
> but for the moment it's part of the majority!)
>
> Thanks for reading,
> Nick
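Simon's explanation can be made concrete: the values in $coef are the
coefficients of the centred and rescaled design matrix that lm.ridge()
works with internally, and coef()/coefficients() undoes that
transformation. A minimal sketch of the back-transformation, reusing fit2
from Simon's code above and assuming (as in the MASS versions I have
seen) that the ridgelm object stores the column scalings in $scales, the
column means in $xm, and the response mean in $ym:

## Undo lm.ridge()'s internal scaling by hand (a sketch; assumes the
## ridgelm object exposes $scales, $xm and $ym, as in current MASS).
orig <- fit2$coef / fit2$scales         # slopes back on the original scale
int  <- fit2$ym - sum(orig * fit2$xm)   # recover the intercept
c("(Intercept)" = int, orig)            # should match coefficients(fit2)

Because the scalings are standard deviations of the centred predictor
columns while the response is left unscaled, $coef behaves like a vector
of standardized slopes whenever the response itself has standard
deviation 1.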
peter dalgaard
2017-May-05 08:02 UTC
[Rd] lm() gives different results to lm.ridge() and SPSS
I asked you before, but in case you missed it: Are you looking at the
right place in the SPSS output? The UNstandardized coefficients should be
comparable to R, i.e. the "B" column, not "Beta".

-pd

> On 5 May 2017, at 01:58 , Nick Brown <nick.brown at free.fr> wrote:
>
> Hi Simon,
>
> Yes, if I use coefficients() I get the same results for lm() and
> lm.ridge(). So that's consistent, at least.
>
> Interestingly, the "wrong" number I get from lm.ridge()$coef agrees
> with the value from SPSS to 5 dp, which is an interesting coincidence
> if these numbers have no particular external meaning in lm.ridge().
>
> Kind regards,
> Nick

--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd.mes at cbs.dk   Priv: PDalgd at gmail.com
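Peter's B-versus-Beta distinction is easy to check numerically: SPSS's
standardized Beta for a predictor is the unstandardized slope multiplied
by sd(x)/sd(y). A small sketch (mine, not from the thread), reusing fit1,
X and y from Simon's simulation above:

## Sketch: SPSS-style standardized betas from an ordinary lm() fit,
## using Beta_j = b_j * sd(x_j) / sd(y).
b <- coef(fit1)[-1]                       # drop the intercept
beta_std <- b * apply(X, 2, sd) / sd(y)
beta_std                                  # comparable to SPSS's "Beta" column

If the numbers being compared come from SPSS's Beta column rather than B,
a discrepancy with lm()'s raw coefficients is expected whenever
sd(x_j)/sd(y) differs from 1 for some term.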
Hi,

Here is (I hope) all the relevant output from R.

> mean(s1$ZDEPRESSION, na.rm=T)
[1] -1.041546e-16
> mean(s1$ZDIVERSITY_PA, na.rm=T)
[1] -9.660583e-16
> mean(s1$ZMEAN_PA, na.rm=T)
[1] -5.430282e-15

> lm.ridge(ZDEPRESSION ~ ZMEAN_PA * ZDIVERSITY_PA, data=s1)$coef
      ZMEAN_PA  ZDIVERSITY_PA  ZMEAN_PA:ZDIVERSITY_PA
    -0.3962254     -0.3636026              -0.1425772
## This is what I thought was the problem originally. :-)

> coefficients(lm(ZDEPRESSION ~ ZMEAN_PA * ZDIVERSITY_PA, data=s1))
   (Intercept)       ZMEAN_PA  ZDIVERSITY_PA  ZMEAN_PA:ZDIVERSITY_PA
    0.07342198    -0.39650356    -0.36569488             -0.09435788

> coefficients(lm.ridge(ZDEPRESSION ~ ZMEAN_PA * ZDIVERSITY_PA, data=s1))
   (Intercept)       ZMEAN_PA  ZDIVERSITY_PA  ZMEAN_PA:ZDIVERSITY_PA
    0.07342198    -0.39650356    -0.36569488             -0.09435788

The equivalent from SPSS is attached. The unstandardized coefficients in
SPSS look nothing like those in R. The standardized coefficients in SPSS
match the lm.ridge()$coef numbers very closely indeed, suggesting that
the same algorithm may be in use.

I have put the dataset file, which is the untouched original I received
from the authors, in this Dropbox folder:
https://www.dropbox.com/sh/xsebjy55ius1ysb/AADwYUyV1bl6-iAw7ACuF1_La?dl=0

You can read it into R with this code (one variable needs to be
standardized and centered; everything else is already in the file):

s1 <- read.csv("Emodiversity_Study1.csv", stringsAsFactors=FALSE)
s1$ZDEPRESSION <- scale(s1$DEPRESSION)

Hey, maybe R is fine and I've stumbled on a bug in SPSS? If so, I'm sure
IBM will want to fix it quickly (ha ha ha).

Nick

----- Original Message -----
From: "peter dalgaard" <pdalgd at gmail.com>
To: "Nick Brown" <nick.brown at free.fr>
Cc: "Simon Bonner" <sbonner6 at uwo.ca>, r-devel at r-project.org
Sent: Friday, 5 May, 2017 10:02:10 AM
Subject: Re: [Rd] lm() gives different results to lm.ridge() and SPSS

> I asked you before, but in case you missed it: Are you looking at the
> right place in the SPSS output? The UNstandardized coefficients should
> be comparable to R, i.e. the "B" column, not "Beta".
>
> -pd

-------------- next part --------------
A non-text attachment was scrubbed...
Name: SPSS.png
Type: image/png
Size: 38704 bytes
URL: <https://stat.ethz.ch/pipermail/r-devel/attachments/20170505/24adf071/attachment.png>
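Nick's output suggests one reading (a conjecture on my part; nothing in
the thread so far confirms it): lm.ridge() rescales every column of the
design matrix, including the ZMEAN_PA:ZDIVERSITY_PA product column, and
the product of two z-scores is not itself a z-score. Since ZDEPRESSION
has standard deviation 1, lm.ridge()$coef would then coincide with SPSS's
standardized Beta, while lm() reports the raw slope for the
unstandardized product term; the two main effects agree because those
columns already have sd close to 1. A sketch against the dataset as read
in above:

## Conjecture check (sketch): rescaling lm()'s raw interaction slope by
## sd(interaction column) / sd(y) should reproduce the value that both
## lm.ridge()$coef and SPSS's Beta column report.
fit  <- lm(ZDEPRESSION ~ ZMEAN_PA * ZDIVERSITY_PA, data = s1)
xint <- with(s1, ZMEAN_PA * ZDIVERSITY_PA)        # the product column
sd(xint, na.rm = TRUE)                            # not 1, unlike the z-scored terms
sdy  <- sd(drop(s1$ZDEPRESSION), na.rm = TRUE)    # 1 by construction (scale())
coef(fit)["ZMEAN_PA:ZDIVERSITY_PA"] * sd(xint, na.rm = TRUE) / sdy

If this reading is right, the last line should land near the -0.1426 that
both lm.ridge()$coef and the SPSS output show, and the discrepancy is a
question of scale conventions rather than a bug in either package.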