LE TERTRE Alain
2007-Feb-13  16:58 UTC
[R] Missing variable in new dataframe for prediction
Hi,
I'm using a loop to evaluate several models, each using one of several
adjacent columns of my data frame.
When I try to get predictions for new values, I get an error message about a
missing variable in my new data frame.
Below is an example adapted from ?gam in the mgcv package:
library(mgcv)
set.seed(0) 
n<-400
sig<-2
x0 <- runif(n, 0, 1)
x1 <- runif(n, 0, 1)
x2 <- runif(n, 0, 1)
x3 <- runif(n, 0, 1)
f0 <- function(x) 2 * sin(pi * x)
f1 <- function(x) exp(2 * x)
f2 <- function(x) 0.2*x^11*(10*(1-x))^6+10*(10*x)^3*(1-x)^10
f3 <- function(x) 0*x
f <- f0(x0) + f1(x1) + f2(x2)
e <- rnorm(n, 0, sig)
y <- f + e
Mydata<-data.frame(y=y,x0=x0,x1=x1,x2=x2,x3=x3)
remove(list=c("y","x0","x1","x2","x3"))
# Note below the syntax of the 3rd variable required for my loop
for (i in 4:5){
  b <- gam(y~s(x0)+ s(x1)+ ns(Mydata[,i], 3), data=Mydata)
  newd <- data.frame(x0=(0:399)/30,x1=(0:399)/30,x2=(0:399)/30,x3=(0:399)/30)
  pred <- predict.gam(b,newd)
}
Error in model.frame(formula, rownames, variables, varnames, extras,
extranames,  :
        invalid type (list) for variable 'Mydata'
In addition: Warning message:
not all required variables have been supplied in  newdata!
 in: predict.gam(b, newd)
# Defining the name for the variable as in the gam function doesn't solve the problem
newd <- data.frame(x0=(0:399)/30,x1=(0:399)/30,x2=(0:399)/30,"Mydata[,i]"=(0:399)/30)
Error in model.frame(formula, rownames, variables, varnames, extras,
extranames,  :
        invalid type (list) for variable 'Mydata'
In addition: Warning message:
not all required variables have been supplied in  newdata!
 in: predict.gam(b, newd)
How should I define my new dataset to get my predictions?
Thanks in advance.
O__ ---- Alain Le Tertre
 c/ /'_ --- Institut de Veille Sanitaire (InVS) / Département Santé Environnement
(*) \(*) -- Responsable de l'unité Systèmes d'Information & Statistiques
~~~~~~~~~~ - 12 rue du val d'Osne 
94415 Saint Maurice cedex FRANCE
Voice: 33 1 41 79 68 76 Fax: 33 1 41 79 67 68
email: a.letertre at invs.sante.fr
Gabor Grothendieck
2007-Feb-13  17:31 UTC
[R] Missing variable in new dataframe for prediction
The call to library(splines), which provides ns(), is missing. Also, try
replacing the line b <- ... with
 fo <- as.formula(sprintf("y ~ s(x0) + s(x1) + ns(%s, 3)", names(Mydata)[i]))
 b <- do.call("gam", list(fo, data = Mydata))
to rebuild the formula on each iteration of the loop with the correct
variable name, x2 or x3, inserted.
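Put together, the suggestion above might look like the following sketch. It
assumes the mgcv and splines packages are installed. The simplified response,
the loop over column names instead of column indices, the preds list and the
0-1 grid in newd are illustrative choices and not part of the original posts.
Because each formula refers to x2 or x3 by name, predict() can look up the
matching column in newd.

```r
library(mgcv)     # gam(), s()
library(splines)  # ns()

set.seed(0)
n <- 400
Mydata <- data.frame(x0 = runif(n), x1 = runif(n),
                     x2 = runif(n), x3 = runif(n))
# Simplified response for illustration (assumption: not the poster's exact f)
Mydata$y <- 2 * sin(pi * Mydata$x0) + exp(2 * Mydata$x1) + rnorm(n, 0, 2)

# New data must use the same column names that appear in the formulas
grid <- seq(0, 1, length.out = 30)
newd <- data.frame(x0 = grid, x1 = grid, x2 = grid, x3 = grid)

preds <- list()  # hypothetical container: one prediction vector per model
for (v in c("x2", "x3")) {
  # Build the formula with the real variable name inserted
  fo <- as.formula(sprintf("y ~ s(x0) + s(x1) + ns(%s, 3)", v))
  b <- do.call("gam", list(fo, data = Mydata))
  preds[[v]] <- predict(b, newdata = newd)
}
```

Storing the predictions in a named list keeps the result of every iteration,
whereas the original loop overwrote pred each time.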