Dear R-people ...

I'm a new user. I can't get predict.lm() to produce predictions for new independent data. There are some messages in archived help about this problem, but I still don't see my error after reviewing those. I understand that the new independent data must have the same name(s) as used when the model was made.

In the example below, predict.lm produces the predictions for the original (model input) data plus a warning message. What I want is predictions for alternative data (in data frame DX in the example).

Thanks,
Chip Barnaby

> D<-data.frame( X=seq(1:10))
> D$Y<-D$X+rnorm( 10)
> D
    X          Y
1   1  0.3811634
2   2  1.8770049
3   3  3.5253376
4   4  3.1851957
5   5  3.8088813
6   6  5.7333074
7   7  7.4896623
8   8  7.9394056
9   9  8.6683570
10 10 10.7480675
> lm<-lm( D$Y~D$X)
> summary( lm)

Call:
lm(formula = D$Y ~ D$X)

Residuals:
     Min       1Q   Median       3Q      Max
-0.98812 -0.36354 -0.09808  0.48154  0.88288

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.58935    0.41680  -1.414    0.195
D$X          1.07727    0.06717  16.037 2.29e-07 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.6101 on 8 degrees of freedom
Multiple R-Squared: 0.9698,     Adjusted R-squared: 0.9661
F-statistic: 257.2 on 1 and 8 DF,  p-value: 2.293e-07

> DX<-data.frame( X=seq( 5.5, 11.5))
> DX
     X
1  5.5
2  6.5
3  7.5
4  8.5
5  9.5
6 10.5
7 11.5
> predict.lm( lm, DX)
         1          2          3          4          5          6          7
 0.4879174  1.5651887  2.6424600  3.7197313  4.7970026  5.8742739  6.9515453
         8          9         10
 8.0288166  9.1060879 10.1833592
Warning message:
'newdata' had 7 rows but variable(s) found have 10 rows
>

---------------------------------------------------------
Chip Barnaby                   cbarnaby at wrightsoft.com
Vice President of Research
Wrightsoft Corp.               781-862-8719 x118 voice
131 Hartwell Ave               781-861-2058 fax
Lexington, MA 02421            www.wrightsoft.com
On 07/04/2008 5:57 PM, Chip Barnaby wrote:
> I'm a new user. I can't get predict.lm() to produce predictions for
> new independent data.
> [...]
> > lm<-lm( D$Y~D$X)
> [...]
> > DX<-data.frame( X=seq( 5.5, 11.5))
> [...]
> > predict.lm( lm, DX)
> [...]
> Warning message:
> 'newdata' had 7 rows but variable(s) found have 10 rows

Your formula refers to D explicitly, so predict.lm will never look at DX. You need to do the fit as

    fit <- lm( Y~X, data=D)

Duncan Murdoch
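As a minimal sketch of the fix suggested above: the names `train`, `new_x`, `fit` and `pred` are illustrative choices (not from the original post), and the seed is arbitrary. The point is only that when the formula names bare variables and the data frame is passed through `data`, predict() can find `X` in `newdata`.

```r
# Illustrative names throughout; none of them come from the original post.
set.seed(1)                          # arbitrary seed, for reproducibility only
train <- data.frame(X = 1:10)
train$Y <- train$X + rnorm(10)

# Bare variable names in the formula, data frame supplied via `data`.
fit <- lm(Y ~ X, data = train)

# New predictor values must live in a column with the same name, X.
new_x <- data.frame(X = seq(5.5, 11.5))

# The generic predict() dispatches to predict.lm for an "lm" object.
pred <- predict(fit, newdata = new_x)
length(pred)                         # one prediction per row of new_x
```

With the formula written as D$Y ~ D$X instead, the model variable is literally named "D$X", which no column of `newdata` can match, so the original ten values are reused and the row-count warning appears.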
You called lm() with a predictor named ``D$X'' and called predict.lm() with a predictor named ``X''.

Simplest remedy: use

    fit <- lm(Y ~ X, data = D)

Remark: It is not a good idea to use ``D'' as the name of your data frame; ``D'' is the name of a function (derivative). Likewise, don't use ``lm'' as the name of an object (the result of fitting a model); ``lm'' is the name of a function, as well you know! No immediate harm will come, but I believe there can be subtle consequences, and anyhow it's confusing.

cheers,

Rolf Turner

On 8/04/2008, at 9:57 AM, Chip Barnaby wrote:
> I'm a new user. I can't get predict.lm() to produce predictions for
> new independent data.
> [...]
>> lm<-lm( D$Y~D$X)
> [...]
>> predict.lm( lm, DX)
> [...]
> Warning message:
> 'newdata' had 7 rows but variable(s) found have 10 rows
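The naming remark above can be seen in a short sketch. The names `train`, `refit` and `masked` are illustrative assumptions, and this shows only one mild effect of reusing a function's name, not every possible consequence.

```r
# Illustrative names; the data are simulated with an arbitrary seed.
set.seed(1)
train <- data.frame(X = 1:10)
train$Y <- train$X + rnorm(10)

# Storing the fit under the name `lm` masks the function's name
# in the current environment.
lm <- lm(Y ~ X, data = train)
masked <- !is.function(lm)           # the name now refers to the fitted model

# A call still works, because R skips non-function objects when it
# looks up a name in call position.
refit <- lm(Y ~ X, data = train)

# Removing the object makes the name resolve to stats::lm again.
rm(lm)
is.function(lm)
```

Choosing a distinct name such as `fit` avoids the ambiguity altogether, and `stats::lm` can be used to refer to the function explicitly.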