Hi, I'm analyzing a large number of simulations using lm(), a sample of the resulting data is pasted below. In some simulations, the response variable doesn't vary, ie:> tmp[[2]]$richness[1] 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 When I analyze this using R version 2.8.0 (2008-10-20) on a linux cluster, I get an appropriate result: ## begin R ## summary(lm(richness ~ het, data = tmp[[2]])) Call: lm(formula = richness ~ het, data = tmp[[2]]) Residuals: Min 1Q Median 3Q Max 0 0 0 0 0 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 40 0 Inf <2e-16 *** het 0 0 NA NA --- Signif. codes: 0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1 Residual standard error: 0 on 23 degrees of freedom Multiple R-squared: , Adjusted R-squared: F-statistic: on 1 and 23 DF, p-value: NA ## end R ## This is good, as when I extract the Adjusted R-squared and slope I get NaN and 0, which are easily identified in my aggregate analysis, so I can deal with them appropriately. However, this isn't always the case: ## begin R ## tmp[[1]]$richness [1] 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 [26] 40 40 40 40 40 40 40 40 40 40 40 summary(lm(richness ~ het, data = tmp[[1]])) Call: lm(formula = richness ~ het, data = tmp[[1]]) Residuals: Min 1Q Median 3Q Max -8.265e-14 1.689e-15 2.384e-15 2.946e-15 4.022e-15 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 4.000e+01 8.418e-15 4.752e+15 <2e-16 *** het 1.495e-14 4.723e-14 3.160e-01 0.754 --- Signif. codes: 0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1 Residual standard error: 1.44e-14 on 34 degrees of freedom Multiple R-squared: 0.5112, Adjusted R-squared: 0.4968 F-statistic: 35.56 on 1 and 34 DF, p-value: 9.609e-07 ## end R ## This is a problem, as when I plot the adj. R sq as part of an aggregate analysis of a large number of simulations, it appears to be a very strong regression. I wouldn't have caught this except it was exceptionally high for the simulation parameters. It also differs by more than rounding error from the results with R 2.8.1 running on my laptop (Debian GNU/Linux), i.e., adj. R sq 0.5042 vs 0.4968. Furthermore, on my laptop, none of the analyses produce a NaN adj. R sq, even for data that do produce that result on the cluster. Both my laptop and the linux cluster have na.action set to na.omit. Is there something else I can do to ensure that lm() returns slope == 0 and adj.R.sq == NaN when the response variable is fixed? Thanks for any suggestions, Tyler Data follows: `tmp` <- list(structure(list(richness = c(40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40), range = c(0.655084651733024, 0.579667533660137, 0.433092220907644, 0.62937198839679, 0.787891987978164, 0.623511540624239, 0.542744487102066, 0.905937570175433, 0.806802881350753, 0.680413208666325, 0.873426339019084, 0.699982832956593, 0.697716600618959, 0.952729864926405, 0.782938474636578, 1.03899695305995, 0.715075858219333, 0.579749205792549, 1.20648999819246, 0.648677938600964, 0.651883559714785, 0.997318331273967, 0.926368116052012, 0.91001274146868, 1.20737951037620, 1.12006560586723, 1.09806272133903, 0.9750792390176, 0.356496202035743, 0.612018080768747, 0.701905693862144, 0.735857916053381, 0.991787489781244, 1.07247435214078, 0.60061903319766, 0.699733090379818), het = c(0.154538307084452, 0.143186508136608, 0.0690948358402777, 0.132337152911839, 0.169037344105692, 0.117783183361602, 0.117524251767612, 0.221161206774407, 0.204574928003633, 0.170571000779693, 0.204489357007294, 0.131749663515638, 0.154127894997213, 0.232672587431942, 0.198610891796736, 0.260497696582693, 0.129028191256682, 0.128717975847452, 0.254300896783617, 0.113546727236817, 0.142220347446853, 0.24828642688332, 0.194340945175726, 0.190782985783610, 0.214676796387244, 0.252940213066992, 0.22362832797347, 0.182423482989676, 0.0602332226418674, 0.145400861749859, 0.141297315445974, 0.139798699247632, 0.222815139716421, 0.211971297234962, 0.120813579628747, 0.150590744533818), n.rich = c(40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40)), .Names = c("richness", "range", "het", "n.rich")), structure(list(richness = c(40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40), range = c(0.753203162648624, 0.599708526308711, 0.714477274087683, 0.892359682406808, 0.868440625159371, 0.753239521511417, 1.20164969658467, 1.20462111558583, 1.13142122690491, 0.95241921975703, 1.13214481653550, 0.827528954009827, 1.14827745443481, 0.936048043180592, 0.874649332193952, 1.38844778296649, 0.985016220913809, 1.18166853164661, 0.784679773255876, 0.94894149080785, 0.770312904574722, 1.10203660758219, 1.15624067277321, 0.692776967548628, 0.79343712876973), het = c(0.170481207967181, 0.108265674755723, 0.123316519598517, 0.220631611141464, 0.160460967122565, 0.145032358811883, 0.293678286125082, 0.284769842125969, 0.258637372765782, 0.18303781265474, 0.265304220319150, 0.194784967445680, 0.248055723803990, 0.204658616507612, 0.167203828355069, 0.287030735881294, 0.247639113771915, 0.269348295820692, 0.111409735752589, 0.209076579513581, 0.176890183224181, 0.249378876987384, 0.260323833307383, 0.177061093736427, 0.172263958005774 ), n.rich = c(40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40)), .Names = c("richness", "range", "het", "n.rich")))
Rolf Turner
2009-Jan-20 21:01 UTC
[R] inconsistent lm results with fixed response variable
Oh for Pete's sake! Computers use floating point arithmetic. Your residual standard error in case 2 (i.e. 1.44e-14) *is* 0, but floating point arithmetic can't quite see that this is so. Put in a check for the RSE being 0, and ``over- ride'' the adjusted R squared to be NA (or NaN, or whatever floats your boat) in such instances. The all.equal() function might be useful to you: > x <- 1.44e-14 > all.equal(x,0) [1] TRUE (Caution: Trap for Young Players: If x and y are ``really'' different, then all.equal(x,y) doesn't return FALSE as you might expect, but rather a description of the difference between x and y --- which may be complicated if x and y are complicated objects. The function isTRUE() is useful here.) cheers, Rolf Turner On 21/01/2009, at 9:21 AM, tyler wrote:> Hi, > > I'm analyzing a large number of simulations using lm(), a sample of > the > resulting data is pasted below. In some simulations, the response > variable doesn't vary, ie: > >> tmp[[2]]$richness > [1] 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 > 40 40 40 40 > > When I analyze this using R version 2.8.0 (2008-10-20) on a linux > cluster, I get an appropriate result: > > > ## begin R ## > > summary(lm(richness ~ het, data = tmp[[2]])) > > Call: > lm(formula = richness ~ het, data = tmp[[2]]) > > Residuals: > Min 1Q Median 3Q Max > 0 0 0 0 0 > > Coefficients: > Estimate Std. Error t value Pr(>|t|) > (Intercept) 40 0 Inf <2e-16 *** > het 0 0 NA NA > --- > Signif. codes: 0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1 > > Residual standard error: 0 on 23 degrees of freedom > Multiple R-squared: , Adjusted R-squared: > F-statistic: on 1 and 23 DF, p-value: NA > > ## end R ## > > This is good, as when I extract the Adjusted R-squared and slope I get > NaN and 0, which are easily identified in my aggregate analysis, so I > can deal with them appropriately. > > However, this isn't always the case: > > ## begin R ## > > tmp[[1]]$richness > [1] 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 > 40 40 40 40 > [26] 40 40 40 40 40 40 40 40 40 40 40 > > summary(lm(richness ~ het, data = tmp[[1]])) > > Call: > lm(formula = richness ~ het, data = tmp[[1]]) > > Residuals: > Min 1Q Median 3Q Max > -8.265e-14 1.689e-15 2.384e-15 2.946e-15 4.022e-15 > > Coefficients: > Estimate Std. Error t value Pr(>|t|) > (Intercept) 4.000e+01 8.418e-15 4.752e+15 <2e-16 *** > het 1.495e-14 4.723e-14 3.160e-01 0.754 > --- > Signif. codes: 0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1 > > Residual standard error: 1.44e-14 on 34 degrees of freedom > Multiple R-squared: 0.5112, Adjusted R-squared: 0.4968 > F-statistic: 35.56 on 1 and 34 DF, p-value: 9.609e-07 > > ## end R ## > > This is a problem, as when I plot the adj. R sq as part of an > aggregate > analysis of a large number of simulations, it appears to be a very > strong regression. I wouldn't have caught this except it was > exceptionally high for the simulation parameters. It also differs by > more than rounding error from the results with R 2.8.1 running on my > laptop (Debian GNU/Linux), i.e., adj. R sq 0.5042 vs 0.4968. > Furthermore, on my laptop, none of the analyses produce a NaN adj. > R sq, > even for data that do produce that result on the cluster. > > Both my laptop and the linux cluster have na.action set to na.omit. Is > there something else I can do to ensure that lm() returns slope == 0 > and adj.R.sq == NaN when the response variable is fixed? > > Thanks for any suggestions, > > Tyler > > Data follows: > > `tmp` <- > list(structure(list(richness = c(40, 40, > 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, > 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, > 40, 40), range = c(0.655084651733024, 0.579667533660137, > 0.433092220907644, > 0.62937198839679, 0.787891987978164, 0.623511540624239, > 0.542744487102066, > 0.905937570175433, 0.806802881350753, 0.680413208666325, > 0.873426339019084, > 0.699982832956593, 0.697716600618959, 0.952729864926405, > 0.782938474636578, > 1.03899695305995, 0.715075858219333, 0.579749205792549, > 1.20648999819246, > 0.648677938600964, 0.651883559714785, 0.997318331273967, > 0.926368116052012, > 0.91001274146868, 1.20737951037620, 1.12006560586723, > 1.09806272133903, > 0.9750792390176, 0.356496202035743, 0.612018080768747, > 0.701905693862144, > 0.735857916053381, 0.991787489781244, 1.07247435214078, > 0.60061903319766, > 0.699733090379818), het = c(0.154538307084452, 0.143186508136608, > 0.0690948358402777, 0.132337152911839, 0.169037344105692, > 0.117783183361602, > 0.117524251767612, 0.221161206774407, 0.204574928003633, > 0.170571000779693, > 0.204489357007294, 0.131749663515638, 0.154127894997213, > 0.232672587431942, > 0.198610891796736, 0.260497696582693, 0.129028191256682, > 0.128717975847452, > 0.254300896783617, 0.113546727236817, 0.142220347446853, > 0.24828642688332, > 0.194340945175726, 0.190782985783610, 0.214676796387244, > 0.252940213066992, > 0.22362832797347, 0.182423482989676, 0.0602332226418674, > 0.145400861749859, > 0.141297315445974, 0.139798699247632, 0.222815139716421, > 0.211971297234962, > 0.120813579628747, 0.150590744533818), n.rich = c(40, 40, 40, > 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, > 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, > 40)), .Names = c("richness", "range", "het", "n.rich")), > structure(list(richness = c(40, 40, > 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, > 40, 40, 40, 40, 40, 40, 40), range = c(0.753203162648624, > 0.599708526308711, > 0.714477274087683, 0.892359682406808, 0.868440625159371, > 0.753239521511417, > 1.20164969658467, 1.20462111558583, 1.13142122690491, > 0.95241921975703, > 1.13214481653550, 0.827528954009827, 1.14827745443481, > 0.936048043180592, > 0.874649332193952, 1.38844778296649, 0.985016220913809, > 1.18166853164661, > 0.784679773255876, 0.94894149080785, 0.770312904574722, > 1.10203660758219, > 1.15624067277321, 0.692776967548628, 0.79343712876973), > het = c(0.170481207967181, > 0.108265674755723, 0.123316519598517, 0.220631611141464, > 0.160460967122565, > 0.145032358811883, 0.293678286125082, 0.284769842125969, > 0.258637372765782, > 0.18303781265474, 0.265304220319150, 0.194784967445680, > 0.248055723803990, > 0.204658616507612, 0.167203828355069, 0.287030735881294, > 0.247639113771915, > 0.269348295820692, 0.111409735752589, 0.209076579513581, > 0.176890183224181, > 0.249378876987384, 0.260323833307383, 0.177061093736427, > 0.172263958005774 > ), n.rich = c(40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, > 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40)), .Names = c > ("richness", > "range", "het", "n.rich"))) > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting- > guide.html > and provide commented, minimal, self-contained, reproducible code.###################################################################### Attention: This e-mail message is privileged and confidential. If you are not the intended recipient please delete the message and notify the sender. Any views or opinions presented are solely those of the author. This e-mail has been scanned and cleared by MailMarshal www.marshalsoftware.com ######################################################################
Reasonably Related Threads
- how to plot y-axis on the right of x-axis
- Nested ANOVA with covariate using Type III sums of squares
- Quasi-poisson glm and calculating a qAIC and qAICc...trying to modilfy Bolker et al. 2009 function to work for a glm model
- gamm: problems with corCAR1()
- create loops in the explanatory variables using lm