thr3ads.net - R help - [R] inconsistent lm results with fixed response variable [Jan 2009]

If this information is useful, please help other people find it:
Share via:

tyler

2009-Jan-20 20:21 UTC

[R] inconsistent lm results with fixed response variable

Hi,

I'm analyzing a large number of simulations using lm(), a sample of the
resulting data is pasted below. In some simulations, the response
variable doesn't vary, ie:
> tmp[[2]]$richness [1] 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40

When I analyze this using R version 2.8.0 (2008-10-20) on a linux
cluster, I get an appropriate result:


## begin R ##

summary(lm(richness ~ het, data = tmp[[2]]))

Call:
lm(formula = richness ~ het, data = tmp[[2]])

Residuals:
   Min     1Q Median     3Q    Max
     0      0      0      0      0

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)       40          0     Inf   <2e-16 ***
het                0          0      NA       NA
---
Signif. codes:  0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1

Residual standard error: 0 on 23 degrees of freedom
Multiple R-squared:      ,      Adjusted R-squared:
F-statistic:       on 1 and 23 DF,  p-value: NA

## end R ##

This is good, as when I extract the Adjusted R-squared and slope I get
NaN and 0, which are easily identified in my aggregate analysis, so I
can deal with them appropriately. 

However, this isn't always the case:

## begin R ##

 tmp[[1]]$richness
 [1] 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40
[26] 40 40 40 40 40 40 40 40 40 40 40

 summary(lm(richness ~ het, data = tmp[[1]]))

Call:
lm(formula = richness ~ het, data = tmp[[1]])

Residuals:
       Min         1Q     Median         3Q        Max
-8.265e-14  1.689e-15  2.384e-15  2.946e-15  4.022e-15

Coefficients:
             Estimate Std. Error   t value Pr(>|t|)
(Intercept) 4.000e+01  8.418e-15 4.752e+15   <2e-16 ***
het         1.495e-14  4.723e-14 3.160e-01    0.754
---
Signif. codes:  0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1

Residual standard error: 1.44e-14 on 34 degrees of freedom
Multiple R-squared: 0.5112,     Adjusted R-squared: 0.4968
F-statistic: 35.56 on 1 and 34 DF,  p-value: 9.609e-07

## end R ##

This is a problem, as when I plot the adj. R sq as part of an aggregate
analysis of a large number of simulations, it appears to be a very
strong regression. I wouldn't have caught this except it was
exceptionally high for the simulation parameters. It also differs by
more than rounding error from the results with R 2.8.1 running on my
laptop (Debian GNU/Linux), i.e., adj. R sq 0.5042 vs 0.4968.
Furthermore, on my laptop, none of the analyses produce a NaN adj. R sq,
even for data that do produce that result on the cluster.

Both my laptop and the linux cluster have na.action set to na.omit. Is
there something else I can do to ensure that lm() returns slope == 0
and adj.R.sq == NaN when the response variable is fixed? 

Thanks for any suggestions,

Tyler

Data follows:

`tmp` <-
list(structure(list(richness = c(40, 40, 
40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 
40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 
40, 40), range = c(0.655084651733024, 0.579667533660137, 0.433092220907644, 
0.62937198839679, 0.787891987978164, 0.623511540624239, 0.542744487102066, 
0.905937570175433, 0.806802881350753, 0.680413208666325, 0.873426339019084, 
0.699982832956593, 0.697716600618959, 0.952729864926405, 0.782938474636578, 
1.03899695305995, 0.715075858219333, 0.579749205792549, 1.20648999819246, 
0.648677938600964, 0.651883559714785, 0.997318331273967, 0.926368116052012, 
0.91001274146868, 1.20737951037620, 1.12006560586723, 1.09806272133903, 
0.9750792390176, 0.356496202035743, 0.612018080768747, 0.701905693862144, 
0.735857916053381, 0.991787489781244, 1.07247435214078, 0.60061903319766, 
0.699733090379818), het = c(0.154538307084452, 0.143186508136608, 
0.0690948358402777, 0.132337152911839, 0.169037344105692, 0.117783183361602, 
0.117524251767612, 0.221161206774407, 0.204574928003633, 0.170571000779693, 
0.204489357007294, 0.131749663515638, 0.154127894997213, 0.232672587431942, 
0.198610891796736, 0.260497696582693, 0.129028191256682, 0.128717975847452, 
0.254300896783617, 0.113546727236817, 0.142220347446853, 0.24828642688332, 
0.194340945175726, 0.190782985783610, 0.214676796387244, 0.252940213066992, 
0.22362832797347, 0.182423482989676, 0.0602332226418674, 0.145400861749859, 
0.141297315445974, 0.139798699247632, 0.222815139716421, 0.211971297234962, 
0.120813579628747, 0.150590744533818), n.rich = c(40, 40, 40, 
40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 
40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 
40)), .Names = c("richness", "range", "het",
"n.rich")),
 structure(list(richness = c(40, 40, 
40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 
40, 40, 40, 40, 40, 40, 40), range = c(0.753203162648624, 0.599708526308711, 
0.714477274087683, 0.892359682406808, 0.868440625159371, 0.753239521511417, 
1.20164969658467, 1.20462111558583, 1.13142122690491, 0.95241921975703, 
1.13214481653550, 0.827528954009827, 1.14827745443481, 0.936048043180592, 
0.874649332193952, 1.38844778296649, 0.985016220913809, 1.18166853164661, 
0.784679773255876, 0.94894149080785, 0.770312904574722, 1.10203660758219, 
1.15624067277321, 0.692776967548628, 0.79343712876973), 
het = c(0.170481207967181, 
0.108265674755723, 0.123316519598517, 0.220631611141464, 0.160460967122565, 
0.145032358811883, 0.293678286125082, 0.284769842125969, 0.258637372765782, 
0.18303781265474, 0.265304220319150, 0.194784967445680, 0.248055723803990, 
0.204658616507612, 0.167203828355069, 0.287030735881294, 0.247639113771915, 
0.269348295820692, 0.111409735752589, 0.209076579513581, 0.176890183224181, 
0.249378876987384, 0.260323833307383, 0.177061093736427, 0.172263958005774
), n.rich = c(40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 
40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40)), .Names =
c("richness",
"range", "het", "n.rich")))

Rolf Turner

2009-Jan-20 21:01 UTC

head link

[R] inconsistent lm results with fixed response variable

Oh for Pete's sake!

Computers use floating point arithmetic.  Your residual standard  
error in
case 2 (i.e. 1.44e-14) *is* 0, but floating point arithmetic can't  
quite see
that this is so. Put in a check for the RSE being 0, and ``over- 
ride'' the
adjusted R squared to be NA (or NaN, or whatever floats your boat) in  
such instances.
The all.equal() function might be useful to you:

 > x <- 1.44e-14
 > all.equal(x,0)
[1] TRUE

(Caution:  Trap for Young Players:  If x and y are ``really'' different,
then all.equal(x,y) doesn't return FALSE as you might expect, but rather
a description of the difference between x and y --- which may be  
complicated
if x and y are complicated objects.  The function isTRUE() is useful  
here.)

	cheers,

		Rolf Turner


On 21/01/2009, at 9:21 AM, tyler wrote:
> Hi,
>
> I'm analyzing a large number of simulations using lm(), a sample of  
> the
> resulting data is pasted below. In some simulations, the response
> variable doesn't vary, ie:
>
>> tmp[[2]]$richness
>  [1] 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40  
> 40 40 40 40
>
> When I analyze this using R version 2.8.0 (2008-10-20) on a linux
> cluster, I get an appropriate result:
>
>
> ## begin R ##
>
> summary(lm(richness ~ het, data = tmp[[2]]))
>
> Call:
> lm(formula = richness ~ het, data = tmp[[2]])
>
> Residuals:
>    Min     1Q Median     3Q    Max
>      0      0      0      0      0
>
> Coefficients:
>             Estimate Std. Error t value Pr(>|t|)
> (Intercept)       40          0     Inf   <2e-16 ***
> het                0          0      NA       NA
> ---
> Signif. codes:  0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1
>
> Residual standard error: 0 on 23 degrees of freedom
> Multiple R-squared:      ,      Adjusted R-squared:
> F-statistic:       on 1 and 23 DF,  p-value: NA
>
> ## end R ##
>
> This is good, as when I extract the Adjusted R-squared and slope I get
> NaN and 0, which are easily identified in my aggregate analysis, so I
> can deal with them appropriately.
>
> However, this isn't always the case:
>
> ## begin R ##
>
>  tmp[[1]]$richness
>  [1] 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40  
> 40 40 40 40
> [26] 40 40 40 40 40 40 40 40 40 40 40
>
>  summary(lm(richness ~ het, data = tmp[[1]]))
>
> Call:
> lm(formula = richness ~ het, data = tmp[[1]])
>
> Residuals:
>        Min         1Q     Median         3Q        Max
> -8.265e-14  1.689e-15  2.384e-15  2.946e-15  4.022e-15
>
> Coefficients:
>              Estimate Std. Error   t value Pr(>|t|)
> (Intercept) 4.000e+01  8.418e-15 4.752e+15   <2e-16 ***
> het         1.495e-14  4.723e-14 3.160e-01    0.754
> ---
> Signif. codes:  0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1
>
> Residual standard error: 1.44e-14 on 34 degrees of freedom
> Multiple R-squared: 0.5112,     Adjusted R-squared: 0.4968
> F-statistic: 35.56 on 1 and 34 DF,  p-value: 9.609e-07
>
> ## end R ##
>
> This is a problem, as when I plot the adj. R sq as part of an  
> aggregate
> analysis of a large number of simulations, it appears to be a very
> strong regression. I wouldn't have caught this except it was
> exceptionally high for the simulation parameters. It also differs by
> more than rounding error from the results with R 2.8.1 running on my
> laptop (Debian GNU/Linux), i.e., adj. R sq 0.5042 vs 0.4968.
> Furthermore, on my laptop, none of the analyses produce a NaN adj.  
> R sq,
> even for data that do produce that result on the cluster.
>
> Both my laptop and the linux cluster have na.action set to na.omit. Is
> there something else I can do to ensure that lm() returns slope == 0
> and adj.R.sq == NaN when the response variable is fixed?
>
> Thanks for any suggestions,
>
> Tyler
>
> Data follows:
>
> `tmp` <-
> list(structure(list(richness = c(40, 40,
> 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40,
> 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40,
> 40, 40), range = c(0.655084651733024, 0.579667533660137,  
> 0.433092220907644,
> 0.62937198839679, 0.787891987978164, 0.623511540624239,  
> 0.542744487102066,
> 0.905937570175433, 0.806802881350753, 0.680413208666325,  
> 0.873426339019084,
> 0.699982832956593, 0.697716600618959, 0.952729864926405,  
> 0.782938474636578,
> 1.03899695305995, 0.715075858219333, 0.579749205792549,  
> 1.20648999819246,
> 0.648677938600964, 0.651883559714785, 0.997318331273967,  
> 0.926368116052012,
> 0.91001274146868, 1.20737951037620, 1.12006560586723,  
> 1.09806272133903,
> 0.9750792390176, 0.356496202035743, 0.612018080768747,  
> 0.701905693862144,
> 0.735857916053381, 0.991787489781244, 1.07247435214078,  
> 0.60061903319766,
> 0.699733090379818), het = c(0.154538307084452, 0.143186508136608,
> 0.0690948358402777, 0.132337152911839, 0.169037344105692,  
> 0.117783183361602,
> 0.117524251767612, 0.221161206774407, 0.204574928003633,  
> 0.170571000779693,
> 0.204489357007294, 0.131749663515638, 0.154127894997213,  
> 0.232672587431942,
> 0.198610891796736, 0.260497696582693, 0.129028191256682,  
> 0.128717975847452,
> 0.254300896783617, 0.113546727236817, 0.142220347446853,  
> 0.24828642688332,
> 0.194340945175726, 0.190782985783610, 0.214676796387244,  
> 0.252940213066992,
> 0.22362832797347, 0.182423482989676, 0.0602332226418674,  
> 0.145400861749859,
> 0.141297315445974, 0.139798699247632, 0.222815139716421,  
> 0.211971297234962,
> 0.120813579628747, 0.150590744533818), n.rich = c(40, 40, 40,
> 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40,
> 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40,
> 40)), .Names = c("richness", "range", "het",
"n.rich")),
>  structure(list(richness = c(40, 40,
> 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40,
> 40, 40, 40, 40, 40, 40, 40), range = c(0.753203162648624,  
> 0.599708526308711,
> 0.714477274087683, 0.892359682406808, 0.868440625159371,  
> 0.753239521511417,
> 1.20164969658467, 1.20462111558583, 1.13142122690491,  
> 0.95241921975703,
> 1.13214481653550, 0.827528954009827, 1.14827745443481,  
> 0.936048043180592,
> 0.874649332193952, 1.38844778296649, 0.985016220913809,  
> 1.18166853164661,
> 0.784679773255876, 0.94894149080785, 0.770312904574722,  
> 1.10203660758219,
> 1.15624067277321, 0.692776967548628, 0.79343712876973),
> het = c(0.170481207967181,
> 0.108265674755723, 0.123316519598517, 0.220631611141464,  
> 0.160460967122565,
> 0.145032358811883, 0.293678286125082, 0.284769842125969,  
> 0.258637372765782,
> 0.18303781265474, 0.265304220319150, 0.194784967445680,  
> 0.248055723803990,
> 0.204658616507612, 0.167203828355069, 0.287030735881294,  
> 0.247639113771915,
> 0.269348295820692, 0.111409735752589, 0.209076579513581,  
> 0.176890183224181,
> 0.249378876987384, 0.260323833307383, 0.177061093736427,  
> 0.172263958005774
> ), n.rich = c(40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40,
> 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40)), .Names = c 
> ("richness",
> "range", "het", "n.rich")))
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting- 
> guide.html
> and provide commented, minimal, self-contained, reproducible code.
######################################################################
Attention: 
This e-mail message is privileged and confidential. If you are not the 
intended recipient please delete the message and notify the sender. 
Any views or opinions presented are solely those of the author.

This e-mail has been scanned and cleared by MailMarshal 
www.marshalsoftware.com
######################################################################

Reasonably Related Threads

Search for more apparently analagous threads

R help - Jan 2009 - inconsistent lm results with fixed response variable

[R] inconsistent lm results with fixed response variable

[R] inconsistent lm results with fixed response variable

Reasonably Related Threads