BSanders

2011-Feb-10 04:48 UTC

### [R] Newb Prediction Question using stepAIC and predict(), is R wrong?

I'm using stepAIC to fit a model. Then I'm trying to use that model to predict future happenings. My first few variables are labeled by their column position. (Is this a problem?) The data frame that I use to build the model is the same as the data I'm using to predict with. Here is a portion of what is happening.

This is the value it is predicting:

    [1] 9.482975

Summary of the model:

    Call:
    lm(formula = reservesub$paid ~ reservesub[, 3 + i] + reservesub$grads[, i] +
        reservesub$Sun + reservesub$Fri + reservesub$Sat)

    Residuals:
        Min      1Q  Median      3Q     Max
    -15.447  -4.993  -1.090   3.910  27.454

    Coefficients:
                          Estimate Std. Error t value Pr(>|t|)
    (Intercept)            5.71370    1.46449   3.902 0.000149 ***
    reservesub[, 3 + i]    1.00868    0.01643  61.391  < 2e-16 ***
    reservesub$grads[, i]  0.44649    0.12131   3.681 0.000333 ***
    reservesub$Sun         8.63606    1.95100   4.426 1.93e-05 ***
    reservesub$Fri         3.76928    2.00079   1.884 0.061682 .
    reservesub$Sat         4.03103    2.12754   1.895 0.060225 .
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    Residual standard error: 7.842 on 138 degrees of freedom
      (131 observations deleted due to missingness)
    Multiple R-squared: 0.9794, Adjusted R-squared: 0.9787
    F-statistic: 1312 on 5 and 138 DF, p-value: < 2.2e-16

Here is the data that is being fed into predicted[p]:

    predict(stepsaicguess[[p]], newdata = reservesubpred[p,])

               V1  V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 paid Mon Tue Wed Thu
    276 10/3/2010 155 84 76 68 64 63 53 42  42  42  42  38  38  38  35  31  31  NA   84   0   0   0   0

    Fri Sat Sun grads.1 grads.2 grads.3 grads.4 grads.5 grads.6 grads.7
      0   0   1       8       4       1      10      11       0       0

    grads.8 grads.9 grads.10 grads.11 grads.12 grads.13 grads.14
          0       4        0        0        3        4        0

In this case, i = 1, so I calculate that the predicted value should be

    5.7137 + 1.00868*84 + 0.44649*8 + 1*8.636 + 0*3.769 + 0*4.03 ≈ 102.65

But R is giving me 9.482975 for the predicted value (which, interestingly, is 5.7137 + 3.769*1, that is, Intercept + Fri).

Another question I have: if I were to include interactions in this model, would I have to create those variables in my prediction data frame, or would R 'know' what to do?

Thanks in advance for your expert assistance.

--
View this message in context: http://r.789695.n4.nabble.com/Newb-Prediction-Question-using-stepAIC-and-predict-is-R-wrong-tp3298569p3298569.html
Sent from the R help mailing list archive at Nabble.com.
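The hand calculation in the post can be checked in R. This is only a sketch of the arithmetic: the coefficients are copied from the summary() output above, and the vector names `b` and `row276` are made up for illustration.

```r
# Coefficients copied from the summary() output in the post
b <- c(intercept = 5.71370, x = 1.00868, grads = 0.44649,
       Sun = 8.63606, Fri = 3.76928, Sat = 4.03103)

# Row 276 as read by the poster: x = 84, grads.1 = 8, Sun = 1, Fri = 0, Sat = 0
row276 <- c(intercept = 1, x = 84, grads = 8, Sun = 1, Fri = 0, Sat = 0)

# Linear predictor: sum of coefficient * value
expected <- sum(b * row276)
expected                        # about 102.65

# The value R returned agrees with intercept + the reservesub$Fri coefficient
b[["intercept"]] + b[["Fri"]]   # about 9.48298
```

Note that 3.769 is the coefficient printed for reservesub$Fri in the summary, so the returned value looks like a prediction for a Friday row with zeros in the other predictors, not for row 276.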

Bill.Venables at csiro.au

2011-Feb-10 06:49 UTC

### [R] Newb Prediction Question using stepAIC and predict(), is R wrong?

Using complex names, like res[, 3+i] or res$var, in the formula for a model is a very bad idea, especially if you eventually want to predict to new data. (In fact it won't work, so that makes it very bad indeed.)

So do not use '$' or '[..]' terms in model formulae. This is going to cause problems when it comes to predict, because the terms in your formula cannot be matched to column names in the new data frame. When you think about it, this is obvious.

In your case you will have to identify the actual names and build the formula that way. So your model will be fitted with a call something like

    fm <- lm(paid ~ x3i + xi + Sun + Fri + Sat, data = reservesub)

(but you will have to use the real names for the first two, of course).

If you are doing this in some kind of loop, there are ways to handle it without using terms such as reservesub[, 3+i], but they are not all that simple. Still, if you want to predict from the model to new data, there is no way round it.

Interactions are generally included with the * or the / linear model operators.

Bill Venables.
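The advice above can be sketched in a small example. This is not the poster's data: it uses simulated values, and every column name (`V4`, `grads1`, `Sun`, `paid`) and object name is hypothetical. It uses `reformulate()` from the stats package as one possible way to build a formula from column names inside a loop; other approaches exist.

```r
set.seed(1)

# Simulated stand-in for the poster's data; all names here are hypothetical
n <- 100
dat <- data.frame(V4 = rnorm(n, 50, 10),
                  grads1 = rpois(n, 4),
                  Sun = rbinom(n, 1, 0.2))
dat$paid <- 5 + 1 * dat$V4 + 0.5 * dat$grads1 + 8 * dat$Sun + rnorm(n)

new_row <- data.frame(V4 = 84, grads1 = 8, Sun = 1)

# Problem: '$' terms in the formula are looked up in 'dat', not in 'newdata'
bad <- lm(dat$paid ~ dat$V4 + dat$grads1 + dat$Sun)
p_bad <- suppressWarnings(predict(bad, newdata = new_row))
length(p_bad)   # n values for the original rows, not one prediction

# Fix: plain column names plus data =; reformulate() builds the formula
# from character names, which helps when the column changes inside a loop
i <- 1
rhs <- c(names(dat)[i], "grads1", "Sun")   # "V4" when i = 1
good <- lm(reformulate(rhs, response = "paid"), data = dat)
p_good <- predict(good, newdata = new_row)
length(p_good)  # a single prediction for new_row

# Interactions: newdata only needs the raw columns;
# predict() constructs the V4:Sun term itself
inter <- lm(paid ~ V4 * Sun + grads1, data = dat)
p_inter <- predict(inter, newdata = new_row)
```

With the `$` formula, `predict()` returns the fitted values for the original rows and warns that `newdata` had a different number of rows. Storing that result in a single element such as predicted[p] would keep only the first value, which may be what produced the unexpected number in the question.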