Hallo,

I hope I am posting to the right place. I was advised to try this list by Ben Bolker (https://twitter.com/bolkerb/status/859909918446497795). I also posted this question to StackOverflow (http://stackoverflow.com/questions/43771269/lm-gives-different-results-from-lm-ridgelambda-0). I am a relative newcomer to R, but I wrote my first program in 1975 and have been paid to program in about 15 different languages, so I have some general background knowledge.

I have a regression from which I extract the coefficients like this:
lm(y ~ x1 * x2, data=ds)$coef
That gives: x1=0.40, x2=0.37, x1*x2=0.09

When I do the same regression in SPSS, I get:
beta(x1)=0.40, beta(x2)=0.37, beta(x1*x2)=0.14.
So the main effects are in agreement, but there is quite a difference in the coefficient for the interaction.

X1 and X2 are correlated about .75 (yes, yes, I know - this model wasn't my idea, but it got published), so there is quite possibly something going on with collinearity. So I thought I'd try lm.ridge() to see if I can get an idea of where the problems are occurring.

The starting point is to run lm.ridge() with lambda=0 (i.e., no ridge penalty) and check we get the same results as with lm():
lm.ridge(y ~ x1 * x2, lambda=0, data=ds)$coef
x1=0.40, x2=0.37, x1*x2=0.14
So lm.ridge() agrees with SPSS, but not with lm(). (Of course, lambda=0 is the default, so it can be omitted; I can alternate between including or deleting ".ridge" in the function call, and watch the coefficient for the interaction change.)

What seems slightly strange to me here is that I assumed that lm.ridge() just piggybacks on lm() anyway, so in the specific case where lambda=0 and there is no "ridging" to do, I'd expect exactly the same results.

Unfortunately there are 34,000 cases in the dataset, so a "minimal" reprex will not be easy to make, but I can share the data via Dropbox or something if that would help.

I appreciate that when there is strong collinearity then all bets are off in terms of what the betas mean, but I would really expect lm() and lm.ridge() to give the same results. (I would be happy to ignore SPSS, but for the moment it's part of the majority!)

Thanks for reading,
Nick
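For concreteness, a sketch of the comparison being made; ds is the 34,000-case dataset mentioned above and is not attached, so this will not run as-is:

## Sketch only: 'ds' (with columns y, x1, x2) is the poster's unavailable data.
library(MASS)   ## for lm.ridge()

fit_lm    <- lm(y ~ x1 * x2, data = ds)                     ## ordinary least squares
fit_ridge <- lm.ridge(y ~ x1 * x2, lambda = 0, data = ds)   ## ridge with no penalty

fit_lm$coef      ## reported interaction: 0.09
fit_ridge$coef   ## reported interaction: 0.14, matching the SPSS output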
Simon Bonner
2017-May-04 17:07 UTC
[Rd] lm() gives different results to lm.ridge() and SPSS
Hi Nick,

I think that the problem here is your use of $coef to extract the coefficients of the ridge regression. The help for lm.ridge states that coef is a "matrix of coefficients, one row for each value of lambda. Note that these are not on the original scale and are for use by the coef method."

I ran a small test with simulated data, code is copied below, and indeed the output from lm.ridge differs depending on whether the coefficients are accessed via $coef or via the coefficients() function. The latter does produce results that match the output from lm.

I hope that helps.

Cheers,

Simon

## Load packages
library(MASS)

## Set seed
set.seed(8888)

## Set parameters
n <- 100
beta <- c(1, 0, 1)
sigma <- .5
rho <- .75

## Simulate correlated covariates
Sigma <- matrix(c(1, rho, rho, 1), ncol = 2)
X <- mvrnorm(n, c(0, 0), Sigma = Sigma)

## Simulate data
mu <- beta[1] + X %*% beta[-1]
y <- rnorm(n, mu, sigma)

## Fit model with lm()
fit1 <- lm(y ~ X)

## Fit model with lm.ridge()
fit2 <- lm.ridge(y ~ X)

## Compare coefficients
cbind(fit1$coefficients, c(NA, fit2$coef), coefficients(fit2))

                   [,1]        [,2]        [,3]
(Intercept)  0.99276001          NA  0.99276001
X1          -0.03980772 -0.04282391 -0.03980772
X2           1.11167179  1.06200476  1.11167179

--
Simon Bonner
Assistant Professor of Environmetrics / Director MMASc
Department of Statistical and Actuarial Sciences / Department of Biology
University of Western Ontario

Office: Western Science Centre rm 276
Email: sbonner6 at uwo.ca | Telephone: 519-661-2111 x88205 | Fax: 519-661-3813
Twitter: @bonnerstatslab | Website: http://simon.bonners.ca/bonner-lab/wpblog/
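A small addendum to Simon's example: the numbers in $coef are the slopes for the internally centred and rescaled predictor columns, and the rescaling factors are returned in the $scales component, so the two sets of coefficients can be reconciled directly. A sketch, continuing from fit2 above and assuming a single value of lambda:

## fit2$coef are coefficients for the scaled predictors; dividing by the
## column scalings recovers the original-scale slopes that coef(fit2)
## (and lm()) report. The intercept is reconstructed separately by coef().
fit2$coef / fit2$scales
coef(fit2)[-1]          ## should match the line above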
Duncan Murdoch
2017-May-04 17:15 UTC
[Rd] lm() gives different results to lm.ridge() and SPSS
On 04/05/2017 10:28 AM, Nick Brown wrote:

> I have a regression from which I extract the coefficients like this:
> lm(y ~ x1 * x2, data=ds)$coef
> That gives: x1=0.40, x2=0.37, x1*x2=0.09
>
> When I do the same regression in SPSS, I get:
> beta(x1)=0.40, beta(x2)=0.37, beta(x1*x2)=0.14.
> So the main effects are in agreement, but there is quite a difference in the coefficient for the interaction.

I don't know about this instance, but a common cause of this sort of difference is a different parametrization. If that's the case, then predictions in the two systems would match, even if coefficients don't.

Duncan Murdoch
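A small simulated illustration of Duncan's point (made-up data, not the data from the question): re-expressing the same model with centred, unit-SD predictors changes the reported coefficients, including the interaction, yet the fitted values are identical.

## Simulated data (not the poster's): one model, two parametrizations.
set.seed(1)
x1 <- rnorm(200, mean = 10, sd = 2)
x2 <- 5 + 0.6 * x1 + rnorm(200)
y  <- 1 + x1 + x2 + 0.5 * x1 * x2 + rnorm(200)

fit_raw    <- lm(y ~ x1 * x2)
fit_scaled <- lm(y ~ scale(x1) * scale(x2))   ## centred, unit-SD predictors

coef(fit_raw)
coef(fit_scaled)                              ## different numbers throughout
all.equal(unname(fitted(fit_raw)),
          unname(fitted(fit_scaled)))         ## TRUE: identical predictions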
peter dalgaard
2017-May-04 17:21 UTC
[Rd] lm() gives different results to lm.ridge() and SPSS
Um, the link to StackOverflow does not seem to contain the same question. It does contain a stern warning not to use the $coef component of lm.ridge...

Is it perhaps the case that x1 and x2 have already been scaled to have standard deviation 1? In that case, x1*x2 won't be.

Also notice that SPSS tends to use "Beta" for standardized regression coefficients, and (AFAIR) "b" for the regular ones.

-pd

--
Peter Dalgaard, Professor
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com
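To illustrate Peter's point with simulated data (not the poster's; the correlation is set to .75 as described in the question): the product of two unit-SD variables does not itself have SD 1, so standardizing the interaction column rescales its coefficient even when the main-effect columns are already standardized.

## Simulated illustration: two correlated predictors scaled to SD 1,
## whose product has an SD noticeably different from 1.
library(MASS)
set.seed(2)
XX <- mvrnorm(1000, mu = c(0, 0), Sigma = matrix(c(1, .75, .75, 1), 2))
x1 <- drop(scale(XX[, 1]))   ## SD exactly 1
x2 <- drop(scale(XX[, 2]))   ## SD exactly 1
sd(x1 * x2)                  ## roughly 1.25 here, not 1

## A standardized ("Beta") coefficient is the raw slope times sd(x)/sd(y):
## when x1 and x2 already have SD 1, all terms are divided by sd(y), but
## only the interaction is additionally multiplied by sd(x1*x2) != 1 --
## which matches the pattern in the original question (main effects agree,
## interaction differs).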
Hi Simon,

Yes, if I use coefficients() I get the same results for lm() and lm.ridge(). So that's consistent, at least.

Interestingly, the "wrong" number I get from lm.ridge()$coef agrees with the value from SPSS to 5 dp, which is an interesting coincidence if these numbers have no particular external meaning in lm.ridge().

Kind regards,
Nick
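It may well not be a coincidence: SPSS's Beta is the slope after standardizing each design column (including the product column), while lm.ridge()'s $coef is the slope after rescaling each predictor column by its standard deviation, so the two should agree whenever y itself has SD close to 1. A sketch of how one might check this on the actual data (ds is not available here, so this is untested on the real dataset):

## Hypothetical check on the poster's data (ds is not attached):
## SPSS-style standardized betas computed from the lm() fit ...
fit <- lm(y ~ x1 * x2, data = ds)
X   <- model.matrix(fit)[, -1]                        ## x1, x2 and x1:x2 columns
beta_spss <- coef(fit)[-1] * apply(X, 2, sd) / sd(ds$y)

## ... next to the scaled coefficients that lm.ridge() stores in $coef.
fitr <- MASS::lm.ridge(y ~ x1 * x2, lambda = 0, data = ds)
cbind(spss_style = beta_spss, ridge_coef = fitr$coef)
## If y has SD close to 1 (e.g. because it, too, was standardized), the two
## columns should agree to several decimal places.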