Jan-Henrik Pötter
2010-Jan-08 16:50 UTC
[R] how to get perfect fit of lm if response is constant
Hello,

Suppose the response variable of a data.frame df is constant, so an analytically perfect fit of the linear model is expected. Fitting a regression line with lm() yields residuals, a slope, and standard errors that are not exactly zero, which is acceptable in a way, but still erroneous. Worse, summary.lm shows unacceptable error propagation into the t value and the corresponding p-value for the slope, as well as into R-squared: just look at the adjusted R-squared of 0.6788! The result is the same whatever mode the input vectors have. Is there any way to get the perfectly fitted regression line from lm and prevent this error propagation? Rounding all values of the lm object afterwards to some precision strikes me as a bad idea, and unfortunately lm has no option for calculation precision.

> df <- data.frame(x = 1:10, y = 1)
> myl <- lm(y ~ x, data = df)
> myl

Call:
lm(formula = y ~ x, data = df)

Coefficients:
(Intercept)            x
  1.000e+00    9.463e-18

> summary(myl)

Call:
lm(formula = y ~ x, data = df)

Residuals:
       Min         1Q     Median         3Q        Max
-1.136e-16 -1.341e-17  7.886e-18  2.918e-17  5.047e-17

Coefficients:
             Estimate Std. Error   t value Pr(>|t|)
(Intercept) 1.000e+00  3.390e-17 2.950e+16   <2e-16 ***
x           9.463e-18  5.463e-18 1.732e+00    0.122
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.962e-17 on 8 degrees of freedom
Multiple R-squared: 0.7145,     Adjusted R-squared: 0.6788
F-statistic: 20.02 on 1 and 8 DF,  p-value: 0.002071
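Since lm() has no precision option, one workaround is to test whether y is numerically constant before fitting and to skip the regression when it is. A minimal sketch in base R; the helper is_constant() and its tolerance are illustrative choices, not features of lm():

# Treat y as numerically constant when its standard deviation is tiny
# relative to its magnitude; the tolerance is an arbitrary choice.
is_constant <- function(y, tol = sqrt(.Machine$double.eps)) {
  sd(y) < tol * max(1, mean(abs(y)))
}

df <- data.frame(x = 1:10, y = 1)
if (is_constant(df$y)) {
  # Nothing to regress: the least-squares fit is exactly y = mean(y),
  # and the t, F, and R-squared statistics are 0/0 artifacts.
  fit <- list(intercept = mean(df$y), slope = 0)
} else {
  fit <- lm(y ~ x, data = df)
}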
Peter Ehlers
2010-Jan-08 18:43 UTC
[R] how to get perfect fit of lm if response is constant
You need to review the assumptions of linear models: y is assumed to be the realization of a random variable, not a constant (or, more precisely: there are assumed to be deviations that are N(0, sigma^2)).

If you 'know' that y is a constant, then you have two options:

1. don't do the regression, because it makes no sense;
2. if you want to test lm()'s handling of the data:

   fm <- lm(y ~ x, data = df, offset = rep(1, nrow(df)))

   (or use: offset = y)

-Peter Ehlers

--
Peter Ehlers
University of Calgary
403.202.3921
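Peter's offset variant in runnable form (a sketch; with the known constant part of y removed, the response lm() actually fits is exactly zero, so the coefficients come out as exact zeros and summary() reports NaN test statistics instead of spurious significance):

# The offset subtracts the known constant 1 from y before fitting,
# so lm() regresses exact zeros on x and cannot manufacture a
# significant slope out of rounding noise.
df <- data.frame(x = 1:10, y = 1)
fm <- lm(y ~ x, data = df, offset = rep(1, nrow(df)))
coef(fm)      # intercept and slope: exactly 0
summary(fm)   # t, F, and R-squared are NaN (0/0), not misleadingly significant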
Just to clarify this point: I don't think the problem is that y is "perfectly fittable", but that it is constant. Since the variance of a constant is zero, there is no variance to explain.

-Ista

On Fri, Jan 8, 2010 at 2:32 PM, Jan-Henrik Pötter <henrik.poetter at gmx.de> wrote:
> Thanks for the answer.
> The situation is that I don't know anything about y a priori. Of course I
> would not do a regression on constant y's then, but isn't it a problem of
> the stability of the algorithm if I get an adjusted R-squared of 0.6788
> for a least-squares fit on this kind of data? I think lm should give a
> correct result even when y is perfectly fittable, because I never know in
> advance whether my data might turn out that way. If I have to offset y in
> this case, the question becomes: how noisy do my y's have to be before I
> can rely on the lm result when I specify the formula y ~ x without an
> offset? What if my y's become nearly linear (or nearly perfectly fittable
> by some other linear model)? So my question is now "how do I rely on lm's
> result if the formula is specified as y ~ x without an offset?", or "how
> do I prevent the result from becoming numerically incorrect if I may get
> nearly perfectly fittable y's?".
>
> Greetings
>
> Henrik
--
Ista Zahn
Graduate student
University of Rochester
Department of Clinical and Social Psychology
http://yourpsyche.org
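Henrik's remaining question ("how noisy do my y's have to be before I can rely on the result?") can be answered mechanically: compare the residual standard error of the fit to the floating-point noise floor of y; when the two are of the same order, the reported R-squared, t, and F values are rounding artifacts. A sketch; the safety factor of 100 is an arbitrary choice, not anything lm() provides:

# Flag a fit whose residual standard error sits at the level of
# floating-point rounding noise in y; its test statistics are then
# artifacts of near-0/0 cancellation rather than evidence.
degenerate_fit <- function(fit, y) {
  noise_floor <- .Machine$double.eps * max(abs(y), 1)
  summary(fit)$sigma < 100 * noise_floor
}

df <- data.frame(x = 1:10, y = 1)
myl <- lm(y ~ x, data = df)
degenerate_fit(myl, df$y)   # TRUE: the adjusted R-squared of 0.6788 is noise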