thr3ads.net - R help - [R] Surprising results from summary(lm()) on data with NO variation [Dec 2002]

If this information is useful, please help other people find it:
Share via:

Warnes, Gregory R

2002-Dec-13 21:34 UTC

[R] Surprising results from summary(lm()) on data with NO variation

I have some data (from Affymetrix experiments) where I fit an aov() model to
a large number of outcome variables.  A reasonable fraction of the outcome
variables have 0 variability because values below a cutoff have been
replaced with the cutoff value (often 20) .  

In this case, the overall p-value from summary(lm(..))  is misleadingly
small:

For example (equivalent results in R 1.6.1 on both Solaris and for Windows
2000):


	> x <- rep(20, 4+5+5)  # no variation
	> Treatment <- factor(rep(
c("Control","Low","High"), c(4,5,5) ))  #
3 treatment levels
	> summary( lm( x ~ Treatment)  )
	
	Call:
	lm(formula = x ~ Treatment)
	
	Residuals:
	       Min         1Q     Median         3Q        Max 
	-1.994e-14  7.889e-31  7.889e-31  7.889e-31  6.647e-15 
	
	Coefficients:
	               Estimate Std. Error   t value Pr(>|t|)    
	(Intercept)   2.000e+01  3.471e-15 5.762e+15   <2e-16 ***
	TreatmentHigh 6.647e-15  4.657e-15 1.427e+00    0.181    
	TreatmentLow  6.647e-15  4.657e-15 1.427e+00    0.181    
	---
	Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 `
' 1
	
	Residual standard error: 6.942e-15 on 11 degrees of freedom
	Multiple R-Squared: 0.5333,     Adjusted R-squared: 0.4485 
	F-statistic: 6.286 on 2 and 11 DF,  p-value: 0.01512
                                          ^^^^^^^^^^^^^^^^

Note F statistic and the overall p-value.     

For some values (eg 0 and 7),  fitting this model will generate NA's for the
estimates.  For most values, we get similar results.  I presume that this is
due to numerical problems in the fitting algorithm.  Still the result is
quite surprising.

It appears that summary.aov doesn't have the same problem:

	> x <- rep(20, 4+5+5)  # no variation
	> Treatment <- factor(rep(
c("Control","Low","High"), c(4,5,5) ))  #
3 treatment levels
	> summary( aov( x ~ Treatment)  )
	            Df     Sum Sq    Mean Sq F value Pr(>F)
	Treatment    2 1.2622e-28 6.3110e-29  1.3095 0.3089
	Residuals   11 5.3011e-28 4.8190e-29               

Perhaps the mechanism for computing the F statistic that summary.aov uses is
less susceptible to the 'no variability' problem than the computation in
summary.lm.

I'm posting this here  to raise a warning flag.  For my particular
application, I'll just check for the no variability case and handle it
separately.  I'll leave it to the R core folks to decide if this is a big
enough problem to do something about in the code.

-Greg


LEGAL NOTICE\ Unless expressly stated otherwise, this message is ... [[dropped]]

ripley@stats.ox.ac.uk

2002-Dec-13 21:50 UTC

head link

[R] Surprising results from summary(lm()) on data with NO variation

It's just rounding errors.  We can't tell if you are genuinely
interested
in studying rounding error or not.  In other words, whether something is
really constant or has a very small range of variation is not something
you can get from the numbers alone.

lda tries hard to warn here, but doing that well is expensive.

On Fri, 13 Dec 2002, Warnes, Gregory R wrote:
>
> I have some data (from Affymetrix experiments) where I fit an aov() model
to
> a large number of outcome variables.  A reasonable fraction of the outcome
> variables have 0 variability because values below a cutoff have been
> replaced with the cutoff value (often 20) .
>
> In this case, the overall p-value from summary(lm(..))  is misleadingly
> small:
>
> For example (equivalent results in R 1.6.1 on both Solaris and for Windows
> 2000):
>
>
> 	> x <- rep(20, 4+5+5)  # no variation
> 	> Treatment <- factor(rep(
c("Control","Low","High"), c(4,5,5) ))  #
> 3 treatment levels
> 	> summary( lm( x ~ Treatment)  )
>
> 	Call:
> 	lm(formula = x ~ Treatment)
>
> 	Residuals:
> 	       Min         1Q     Median         3Q        Max
> 	-1.994e-14  7.889e-31  7.889e-31  7.889e-31  6.647e-15
>
> 	Coefficients:
> 	               Estimate Std. Error   t value Pr(>|t|)
> 	(Intercept)   2.000e+01  3.471e-15 5.762e+15   <2e-16 ***
> 	TreatmentHigh 6.647e-15  4.657e-15 1.427e+00    0.181
> 	TreatmentLow  6.647e-15  4.657e-15 1.427e+00    0.181
> 	---
> 	Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 `
' 1
>
> 	Residual standard error: 6.942e-15 on 11 degrees of freedom
> 	Multiple R-Squared: 0.5333,     Adjusted R-squared: 0.4485
> 	F-statistic: 6.286 on 2 and 11 DF,  p-value: 0.01512
>                                           ^^^^^^^^^^^^^^^^
>
> Note F statistic and the overall p-value.
>
> For some values (eg 0 and 7),  fitting this model will generate NA's
for the
> estimates.  For most values, we get similar results.  I presume that this
is
> due to numerical problems in the fitting algorithm.  Still the result is
> quite surprising.
>
> It appears that summary.aov doesn't have the same problem:
>
> 	> x <- rep(20, 4+5+5)  # no variation
> 	> Treatment <- factor(rep(
c("Control","Low","High"), c(4,5,5) ))  #
> 3 treatment levels
> 	> summary( aov( x ~ Treatment)  )
> 	            Df     Sum Sq    Mean Sq F value Pr(>F)
> 	Treatment    2 1.2622e-28 6.3110e-29  1.3095 0.3089
> 	Residuals   11 5.3011e-28 4.8190e-29
>
> Perhaps the mechanism for computing the F statistic that summary.aov uses
is
> less susceptible to the 'no variability' problem than the
computation in
> summary.lm.
>
> I'm posting this here  to raise a warning flag.  For my particular
> application, I'll just check for the no variability case and handle it
> separately.  I'll leave it to the R core folks to decide if this is a
big
> enough problem to do something about in the code.
>
> -Greg
>
>
> LEGAL NOTICE\ Unless expressly stated otherwise, this message is ...
[[dropped]]
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> http://www.stat.math.ethz.ch/mailman/listinfo/r-help
>
-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

Maybe Matching Threads

Search for more possibly parallel threads

R help - Dec 2002 - Surprising results from summary(lm()) on data with NO variation

[R] Surprising results from summary(lm()) on data with NO variation

[R] Surprising results from summary(lm()) on data with NO variation

Maybe Matching Threads