Dear R People:

Here is a small data frame and two particular formulas:

> test.df
            y  x
1  -0.9261650  1
2   1.5702700  2
3   0.1673920  3
4   0.7893085  4
5   0.3576875  5
6  -1.4620915  6
7  -0.5506215  7
8  -0.3480292  8
9  -1.2344036  9
10  0.8502660 10
> summary(lm(exp(y)~x))

Call:
lm(formula = exp(y) ~ x)

Residuals:
    Min      1Q  Median      3Q     Max
-1.6360 -0.6435 -0.4722  0.4215  2.9127

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   2.1689     0.9782   2.217   0.0574 .
x            -0.1368     0.1577  -0.868   0.4108
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.432 on 8 degrees of freedom
Multiple R-squared: 0.08604,    Adjusted R-squared: -0.0282
F-statistic: 0.7532 on 1 and 8 DF,  p-value: 0.4108

> summary(lm(I(y^2)~x))

Call:
lm(formula = I(y^2) ~ x)

Residuals:
    Min      1Q  Median      3Q     Max
-0.9584 -0.6387 -0.2651  0.5754  1.4412

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  1.10084    0.62428   1.763    0.116
x           -0.03813    0.10061  -0.379    0.715

Residual standard error: 0.9138 on 8 degrees of freedom
Multiple R-squared: 0.01764,    Adjusted R-squared: -0.1052
F-statistic: 0.1436 on 1 and 8 DF,  p-value: 0.7146

>

These both work just fine.

My question is: when do you know to use I() and just the function of
the variable, please?

thanks in advance,
Sincerely,
Erin
PS Happy St Pat's Day!

--
Erin Hodgess
Associate Professor
Department of Computer and Mathematical Sciences
University of Houston - Downtown
mailto: erinm.hodgess at gmail.com

On 17-Mar-09 23:04:25, Erin Hodgess wrote:
> Dear R People:
> Here is a small data frame and two particular formulas:
> [...]
> These both work just fine.
>
> My question is: when do you know to use I() and just the function of
> the variable, please?
>
> thanks in advance,
> Erin
> PS Happy St Pat's Day!

In the case of your formula you will find it works just as well
without I():

  summary(lm(y^2 ~ x))

  Call:
  lm(formula = y^2 ~ x)

  Residuals:
      Min      1Q  Median      3Q     Max
  -0.9584 -0.6387 -0.2651  0.5754  1.4412

  Coefficients:
              Estimate Std. Error t value Pr(>|t|)
  (Intercept)  1.10084    0.62428   1.763    0.116
  x           -0.03813    0.10061  -0.379    0.715

The point of I() is that it forces numerical evaluation in an
expression which could be interpreted as a symbolic model formula.
Thus if X1 and X2 were numeric, and you want to regress Y on the
numerical values of X1*X2, then you should use I(X1*X2), since in

  Y ~ X1*X2

this would be interpreted as (essentially) fitting both linear terms
and their interaction (equivalent to product here), namely
corresponding to

  Y = a + b1*X1 + b2*X2 + b12*X1*X2

In order to force the fitted equation to be

  Y = a + b*X1*X2

you would use Y ~ I(X1*X2).

This issue does not arise when a product is on the left-hand side
of the model formula, so you could simply use X1*X2 ~ Y

Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 17-Mar-09                                       Time: 23:31:21
------------------------------ XFMail ------------------------------
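
A minimal sketch of the distinction described above, using simulated data. The data frame d and its values are made up for illustration; only the names Y, X1 and X2 come from the message. Comparing the coefficient names shows how each formula was interpreted.

```r
# Hypothetical simulated data; names Y, X1, X2 follow the notation above.
set.seed(1)
d <- data.frame(X1 = rnorm(20), X2 = rnorm(20))
d$Y <- 1 + 2 * d$X1 * d$X2 + rnorm(20, sd = 0.1)

# Without I(): '*' is a formula operator, so this fits X1 + X2 + X1:X2.
fit_formula <- lm(Y ~ X1 * X2, data = d)
names(coef(fit_formula))   # "(Intercept)" "X1" "X2" "X1:X2"

# With I(): '*' is ordinary multiplication, giving a single product regressor.
fit_product <- lm(Y ~ I(X1 * X2), data = d)
names(coef(fit_product))   # "(Intercept)" "I(X1 * X2)"

# On the left-hand side the product is evaluated arithmetically,
# so I() makes no difference to the fit.
fit_lhs   <- lm(X1 * X2 ~ Y, data = d)
fit_lhs_I <- lm(I(X1 * X2) ~ Y, data = d)
all.equal(unname(coef(fit_lhs)), unname(coef(fit_lhs_I)))
```

The first fit has four coefficients and the second has two, which is the difference between Y = a + b1*X1 + b2*X2 + b12*X1*X2 and Y = a + b*X1*X2.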

Erin Hodgess wrote:
> Dear R People:
>
> Here is a small data frame and two particular formulas:
> [...]
> These both work just fine.
>
> My question is: when do you know to use I() and just the function of
> the variable, please?

I don't think you need I() on the LHS, at least nowadays

> lm(y^2~x)

Call:
lm(formula = y^2 ~ x)

Coefficients:
(Intercept)            x
     0.8708      -0.7787

> lm(I(y^2)~x)

Call:
lm(formula = I(y^2) ~ x)

Coefficients:
(Intercept)            x
     0.8708      -0.7787

On the RHS you use I() to prevent special treatment of model formula
operators, e.g.

  (a+b+c)^2 == a+b+c+a:b+b:c+a:c

is the main effects plus all 2 factor interactions between a, b, c,
whereas I((a+b+c)^2) is a variable which is the squared sum of
a, b, and c.

--
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)              FAX: (+45) 35327907
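
A short sketch of both points, offered as an illustration rather than as part of the reply. The first part only inspects how R expands the formula, so the variables a, b, c and y there are placeholders that need no data. The second part uses the data frame from the original post to check that I() on the left-hand side leaves the fit unchanged.

```r
# How the right-hand side is expanded; terms() works on the bare formula.
labels_expanded  <- attr(terms(y ~ (a + b + c)^2), "term.labels")
labels_protected <- attr(terms(y ~ I((a + b + c)^2)), "term.labels")
labels_expanded    # "a" "b" "c" "a:b" "a:c" "b:c"
labels_protected   # "I((a + b + c)^2)"

# Data copied from the original post.
test.df <- data.frame(
  y = c(-0.9261650, 1.5702700, 0.1673920, 0.7893085, 0.3576875,
        -1.4620915, -0.5506215, -0.3480292, -1.2344036, 0.8502660),
  x = 1:10
)

# On the left-hand side, y^2 and I(y^2) give the same coefficients.
fit_plain <- lm(y^2 ~ x, data = test.df)
fit_I     <- lm(I(y^2) ~ x, data = test.df)
all.equal(unname(coef(fit_plain)), unname(coef(fit_I)))
```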