StatWM
2010-Jul-20 09:41 UTC
[R] Correct statistical inference for linear regression models without intercept in R
Dear R community,

is there a way to get correct t- and p-values and R squared for linear regression models specified without an intercept?

example model:
summary(lm(y ~ 0 + x))

This gives too low p-values and too high R squared. Is there a way to correct it? Or should I specify with intercept to get the correct values?

Thank you in advance!

Wojtek Musial
--
View this message in context: http://r.789695.n4.nabble.com/Correct-statistical-inference-for-linear-regression-models-without-intercept-in-R-tp2295193p2295193.html
Sent from the R help mailing list archive at Nabble.com.
Arun.stat
2010-Jul-20 10:06 UTC
[R] Correct statistical inference for linear regression models without intercept in R
What do x and y represent? Are they non-stationary or trending? If so, you would get a very high R^2 (~97-99%) and a very low p-value. Perhaps you have landed in the world of spurious regression. In that case, forcing the intercept to zero would not help you. Work with the differenced series instead of the raw data.

Thanks and regards,
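The effect described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the two series are independent Gaussian random walks, and the seed, the series length `n`, the replication count `n_rep`, and the helper `r2_pair()` are all hypothetical choices, not anything from the original post.

```r
# Sketch: spurious regression between two INDEPENDENT random walks.
# Assumed for illustration only: seed, n, n_rep, and the helper r2_pair().
set.seed(42)
n     <- 200   # length of each simulated series
n_rep <- 200   # number of simulated pairs

r2_pair <- function() {
  x <- cumsum(rnorm(n))  # random walk, unrelated to y
  y <- cumsum(rnorm(n))  # random walk, unrelated to x
  c(levels = summary(lm(y ~ x))$r.squared,              # regression on raw series
    diffs  = summary(lm(diff(y) ~ diff(x)))$r.squared)  # regression on differences
}

r2 <- replicate(n_rep, r2_pair())  # 2 x n_rep matrix
mean_r2 <- rowMeans(r2)            # average R^2 for each approach
print(mean_r2)
```

Since x and y are unrelated by construction, any sizeable R^2 in the levels regression is spurious; the differenced regression should show an average R^2 close to zero.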
Dennis Murphy
2010-Jul-20 10:33 UTC
[R] Correct statistical inference for linear regression models without intercept in R
Hi:

On Tue, Jul 20, 2010 at 2:41 AM, StatWM <wmusial@gmx.de> wrote:

> Dear R community,
>
> is there a way to get correct t- and p-values and R squared for linear
> regression models specified without an intercept?
>
> example model:
> summary(lm(y ~ 0 + x))
>
> This gives too low p-values and too high R squared. Is there a way to
> correct it? Or should I specify with intercept to get the correct values?

How do you know that the p-value is too low and R^2 is too high? Too low or too high compared to what? You've constrained the fitted line to pass through the origin, which affects several features of a simple linear regression model. For example, sum the residuals from your no-intercept model - I'll bet they don't add to zero. Do you think that might affect a few things?

Here's an example:

# Generate some data; notice that the true y-intercept is 2 and the true slope is 2
dd <- data.frame(x = 1:10, y = 2 + 2 * 1:10 + rnorm(10))
plot(y ~ x, data = dd, xlim = c(0, 10), ylim = c(0, 25))
m1 <- lm(y ~ x, data = dd)
abline(coef(m1))
m2 <- lm(y ~ x + 0, data = dd)
abline(c(0, coef(m2)), lty = 'dotted')

# As you noted, the no-intercept model has a higher R^2,
# even though the 'usual' simple linear regression (SLR)
# model provided a better visual fit. Why?
summary(m1)$r.squared
[1] 0.982328
summary(m2)$r.squared
[1] 0.9946863

# The p-value for the F-test on the slope is lower in the
# no-intercept model than in the SLR model. Why?
anova(m1)
Analysis of Variance Table

Response: y
          Df Sum Sq Mean Sq F value    Pr(>F)
x          1 385.22  385.22  444.69 2.686e-08 ***
Residuals  8   6.93    0.87
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

anova(m2)
Analysis of Variance Table

Response: y
          Df  Sum Sq Mean Sq F value    Pr(>F)
x          1 2164.07 2164.07  1684.7 1.507e-11 ***
Residuals  9   11.56    1.28

Look at the differences in sums of squares between the two models, both in terms of model SS and error SS. What is responsible for those differences? Once you understand that, it becomes clear, by applying the definitions, why the apparent anomalies in R^2 and in the F-test occur.

Also try

sum(m1$resid)
sum(m2$resid)

Why is there a difference? Why does m2$resid not have to sum to zero?

(Hint: The output in each case is correct, so it's not an R problem. You need to derive the differences among the various quantities in regression modeling between the intercept and no-intercept models to understand the paradox.)

HTH,
Dennis
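The definitions behind the hint can be checked numerically. The sketch below refits the two models from the example above and recomputes R^2 by hand: with an intercept, R reports R^2 against the total sum of squares about the mean of y; without one, it reports R^2 against the sum of squares about zero. The seed is an assumption added for reproducibility (the original example had none), so the numbers will not match the output quoted above, and the variable names are illustrative.

```r
# Sketch: why R^2 and the residual sums differ between the two models.
# Assumed for illustration only: the seed and all variable names below.
set.seed(1)
dd <- data.frame(x = 1:10, y = 2 + 2 * 1:10 + rnorm(10))
m1 <- lm(y ~ x, data = dd)      # with intercept
m2 <- lm(y ~ x + 0, data = dd)  # without intercept

rss1 <- sum(resid(m1)^2)
rss2 <- sum(resid(m2)^2)

tss_centered   <- sum((dd$y - mean(dd$y))^2)  # baseline for the intercept model
tss_uncentered <- sum(dd$y^2)                 # baseline for the no-intercept model

r2_m1 <- 1 - rss1 / tss_centered
r2_m2 <- 1 - rss2 / tss_uncentered

# Both hand-computed values agree with what summary() reports
print(all.equal(r2_m1, summary(m1)$r.squared))
print(all.equal(r2_m2, summary(m2)$r.squared))

# The no-intercept model measured against the SAME (centered) baseline as m1
r2_m2_centered <- 1 - rss2 / tss_centered
print(c(m1 = r2_m1, m2_reported = r2_m2, m2_centered = r2_m2_centered))

# Residuals: least squares forces them to be orthogonal to every column of the
# design matrix. Only m1 has a column of ones, so only m1's residuals sum to zero.
print(sum(resid(m1)))         # zero up to rounding error
print(sum(resid(m2)))         # not constrained to be zero
print(sum(dd$x * resid(m2)))  # zero up to rounding error
```

Because the two reported R^2 values use different baselines, they are not comparable; on the common centered baseline the no-intercept model can never score higher than the intercept model, since its residual sum of squares is at least as large.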
peter dalgaard
2010-Jul-20 10:35 UTC
[R] Correct statistical inference for linear regression models without intercept in R
On Jul 20, 2010, at 11:41 AM, StatWM wrote:

> Dear R community,
>
> is there a way to get correct t- and p-values and R squared for linear
> regression models specified without an intercept?
>
> example model:
> summary(lm(y ~ 0 + x))
>
> This gives too low p-values and too high R squared. Is there a way to
> correct it? Or should I specify with intercept to get the correct values?

They are already correct. If you want incorrect ones, please specify their definition...

--
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com
StatWM
2010-Jul-20 17:47 UTC
[R] Correct statistical inference for linear regression models without intercept in R
Thank you very much for your effort! But is there a measure that can compare the goodness of fit of regression models with and without the intercept? Can I only compare them in terms of the residual sum of squares?
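One possible way to make such a comparison, offered only as a sketch and not as an answer given in this thread: the no-intercept model is nested in the intercept model, so the two can be compared with an F-test via anova(), and quantities that share the same baseline (residual sum of squares, residual standard error, AIC) can be put side by side. The data reuse the simulated example from earlier in the thread; the seed and the names `m_int`, `m_noint`, `cmp` and `fit_stats` are assumptions for illustration.

```r
# Sketch: comparing nested models with and without an intercept.
# Assumed for illustration only: the seed and all object names below.
set.seed(1)
dd <- data.frame(x = 1:10, y = 2 + 2 * 1:10 + rnorm(10))
m_int   <- lm(y ~ x, data = dd)      # with intercept
m_noint <- lm(y ~ x + 0, data = dd)  # without intercept (nested in m_int)

# F-test of the smaller model against the larger one
cmp <- anova(m_noint, m_int)
print(cmp)

# With a single extra parameter, this F statistic equals the squared
# t statistic of the intercept in the larger model
t_int <- summary(m_int)$coefficients["(Intercept)", "t value"]
print(c(F = cmp$F[2], t_squared = t_int^2))

# Measures that are on the same scale for both models
fit_stats <- data.frame(
  model = c("with intercept", "without intercept"),
  rss   = c(deviance(m_int), deviance(m_noint)),            # residual sum of squares
  sigma = c(summary(m_int)$sigma, summary(m_noint)$sigma),  # residual standard error
  aic   = c(AIC(m_int), AIC(m_noint))
)
print(fit_stats)
```

The reported R^2 values are deliberately left out of the table, because the two models compute them against different total sums of squares.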