LE TERTRE Alain
2007-Feb-13 16:58 UTC
[R] Missing variable in new dataframe for prediction
Hi,

I'm using a loop to evaluate several models by taking adjacent variables from my dataframe. When I try to get predictions for new values, I get an error message about a missing variable in my new dataframe.

Below is an example adapted from ?gam in the mgcv package:

    library(mgcv)
    set.seed(0)
    n <- 400
    sig <- 2
    x0 <- runif(n, 0, 1)
    x1 <- runif(n, 0, 1)
    x2 <- runif(n, 0, 1)
    x3 <- runif(n, 0, 1)
    f0 <- function(x) 2 * sin(pi * x)
    f1 <- function(x) exp(2 * x)
    f2 <- function(x) 0.2*x^11*(10*(1-x))^6+10*(10*x)^3*(1-x)^10
    f3 <- function(x) 0*x
    f <- f0(x0) + f1(x1) + f2(x2)
    e <- rnorm(n, 0, sig)
    y <- f + e
    Mydata <- data.frame(y=y,x0=x0,x1=x1,x2=x2,x3=x3)
    remove(list=c("y","x0","x1","x2","x3"))

    # Note below the syntax of the 3rd variable required for my loop
    for (i in 4:5){
      b <- gam(y~s(x0)+ s(x1)+ ns(Mydata[,i], 3), data=Mydata)

      newd <- data.frame(x0=(0:399)/30,x1=(0:399)/30,x2=(0:399)/30,x3=(0:399)/30)
      pred <- predict.gam(b,newd)
    }

    Error in model.frame(formula, rownames, variables, varnames, extras, extranames, :
      invalid type (list) for variable 'Mydata'
    In addition: Warning message:
    not all required variables have been supplied in newdata!
     in: predict.gam(b, newd)

Defining the name of the variable as it appears in the gam call doesn't solve the problem:

    newd <- data.frame(x0=(0:399)/30,x1=(0:399)/30,x2=(0:399)/30,"Mydata[,i]"=(0:399)/30)

    Error in model.frame(formula, rownames, variables, varnames, extras, extranames, :
      invalid type (list) for variable 'Mydata'
    In addition: Warning message:
    not all required variables have been supplied in newdata!
     in: predict.gam(b, newd)

How should I define my new dataset to be able to get my predictions?

Thanks in advance

   O__  ---- Alain Le Tertre
  c/ /'_ --- Institut de Veille Sanitaire (InVS) / Département Santé Environnement
 (*) \(*) -- Responsable de l'unité Systèmes d'Information & Statistiques
~~~~~~~~~~ - 12 rue du val d'Osne
94415 Saint Maurice cedex FRANCE
Voice: 33 1 41 79 68 76 Fax: 33 1 41 79 67 68
email: a.letertre at invs.sante.fr
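One way to see why the prediction step complains is to check which variable names the formula actually refers to. The sketch below uses only base R. The object names `fo_bad` and `vars` are illustrative and not part of the original post. It suggests that the model asks for objects called `Mydata` and `i` rather than for a column named `x2` or `x3`, and that `data.frame()` by default rewrites a column name such as `"Mydata[,i]"` into a syntactically valid one, so the quoted name never reaches `newd` unchanged.

```r
# Formula as written inside the loop of the original post
# (building a formula does not evaluate s() or ns(), so no packages are needed)
fo_bad <- y ~ s(x0) + s(x1) + ns(Mydata[, i], 3)

# Variables the formula refers to: note "Mydata" and "i", not "x2" or "x3"
vars <- all.vars(fo_bad)
print(vars)

# data.frame() runs column names through make.names() by default,
# so the quoted name is altered before prediction ever sees it
print(names(data.frame("Mydata[,i]" = 1)))
```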
Gabor Grothendieck
2007-Feb-13 17:31 UTC
[R] Missing variable in new dataframe for prediction
The call to library(splines) is missing. Also, try replacing the line b <- ... with

    fo <- as.formula(sprintf("y ~ s(x0) + s(x1) + ns(%s, 3)", names(Mydata)[i]))
    b <- do.call("gam", list(fo, data = Mydata))

to dynamically recreate the formula on each iteration of the loop with the correct name, x2 or x3, inserted.
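Put together with the original example, the suggestion above might look like the following minimal sketch. Several details are assumptions made for illustration and are not from the thread: the data are generated directly inside `Mydata`, the list `preds` collects the predictions from each iteration, the prediction grid `newd` is restricted to the range [0, 1] of the simulated covariates (the original post used `(0:399)/30`), and the generic `predict()` is called instead of `predict.gam()` directly.

```r
library(mgcv)     # gam(), s()
library(splines)  # ns(), missing from the original example

set.seed(0)
n <- 400
sig <- 2

# Simulated data following the original post; y is placed first so that
# columns 4 and 5 are x2 and x3, as in the original loop
Mydata <- data.frame(y = NA_real_,
                     x0 = runif(n), x1 = runif(n),
                     x2 = runif(n), x3 = runif(n))
Mydata$y <- with(Mydata,
                 2 * sin(pi * x0) + exp(2 * x1) +
                 0.2 * x2^11 * (10 * (1 - x2))^6 +
                 10 * (10 * x2)^3 * (1 - x2)^10) + rnorm(n, 0, sig)

# Prediction grid (assumption: kept inside the observed covariate range)
g <- seq(0, 1, length.out = 50)
newd <- data.frame(x0 = g, x1 = g, x2 = g, x3 = g)

preds <- list()  # hypothetical container, one element per model
for (i in 4:5) {
  v <- names(Mydata)[i]  # "x2", then "x3"
  # Rebuild the formula with the real column name inserted
  fo <- as.formula(sprintf("y ~ s(x0) + s(x1) + ns(%s, 3)", v))
  b <- do.call("gam", list(fo, data = Mydata))
  # newd already has a column with the matching name
  preds[[v]] <- predict(b, newdata = newd)
}

str(preds)
```

Because the fitted formula now names `x2` or `x3` explicitly, the same `newd` can be reused for every model in the loop.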