I'm trying to use boot.stepAIC for feature selection; I need to be able to specify the name of the dependent variable programmatically, but this appears to fail:

In R-Studio with MS R Open 3.4:

library(bootStepAIC)

#Fake data
n<-200

x1 <- runif(n, -3, 3)
x2 <- runif(n, -3, 3)
x3 <- runif(n, -3, 3)
x4 <- runif(n, -3, 3)
x5 <- runif(n, -3, 3)
x6 <- runif(n, -3, 3)
x7 <- runif(n, -3, 3)
x8 <- runif(n, -3, 3)
y1 <- 42+x3 + 2*x6 + 3*x8 + runif(n, -0.5, 0.5)

dat <- data.frame(x1,x2,x3,x4,x5,x6,x7,x8,y1)
#the real data won't have these names...

cn <- names(dat)
trg <- "y1"
xvars <- cn[cn!=trg]

frm1<-as.formula(paste(trg,"~1"))
frm2<-as.formula(paste(trg,"~ 1 + ",paste(xvars,collapse = "+")))

strt=lm(y1~1,dat) # boot.stepAIC Works fine

#strt=do.call("lm",list(frm1,data=dat)) ## boot.stepAIC FAILS ##

#strt=lm(frm1,dat) ## boot.stepAIC FAILS ##

limit<-5

stp=stepAIC(strt,direction='forward',steps=limit,
  scope=list(lower=frm1,upper=frm2))

bst <- boot.stepAIC(strt,dat,B=50,alpha=0.05,direction='forward',steps=limit,
  scope=list(lower=frm1,upper=frm2))

b1 <- bst$Covariates
ball <- data.frame(b1)
names(ball)=unlist(trg)

Any ideas?

Cheers,
SOH
Failed? What was the error message?

Cheers,
Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )

On Tue, Aug 22, 2017 at 8:17 AM, Stephen O'hagan <SOhagan at manchester.ac.uk> wrote:
> I'm trying to use boot.stepAIC for feature selection; I need to be able to specify the name of the dependent variable programmatically, but this appears to fail:
> [...]
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
The error is

"the model fit failed in 50 bootstrap samples Error: non-character argument"

Cheers,
SOH.

On 22/08/2017 17:52, Bert Gunter wrote:
> Failed? What was the error message?
OK, here's the problem. Continuing with your example:

strt1 <- lm(y1 ~1, dat)
strt2 <- lm(frm1,dat)

> strt1

Call:
lm(formula = y1 ~ 1, data = dat)

Coefficients:
(Intercept)
      41.73

> strt2

Call:
lm(formula = frm1, data = dat)

Coefficients:
(Intercept)
      41.73

Note that the calls stored in the two lm objects are different: the call in strt2 holds the unevaluated name frm1 rather than the formula itself. So presumably boot.stepAIC does no evaluation and therefore gets confused, which gives the errors you saw.

So you need to get the evaluated formula into the lm object. This can be done, e.g. via:

> strt2 <- eval(substitute(lm(form,data = dat), list(form = frm1)))

## yielding
> strt2

Call:
lm(formula = y1 ~ 1, data = dat)

Coefficients:
(Intercept)
      41.73

So this looks like it should fix the problem, but alas no, the boot.stepAIC call still fails with the same error message. Here's why:

> identical(strt$call, strt2$call)
[1] FALSE

So one might rightfully ask, what the heck is going on here?! Further digging:

> str(strt$call)
 language lm(formula = y1 ~ 1, data = dat)
> str(strt2$call)
 language lm(formula = y1 ~ 1, data = dat)

These certainly look identical! -- but of course they're not:

> names(strt$call)
[1] ""        "formula" "data"
> names(strt2$call)
[1] ""        "formula" "data"

So the difference must lie in the formula component, right? ...

> strt$call$formula
y1 ~ 1
> strt2$call$formula
y1 ~ 1

So, thus far, huhh? But..

> class(strt2$call$formula)
[1] "formula"
> class(strt$call$formula)
[1] "call"

So I think therein lies the critical difference that is screwing things up: with a formula typed directly into lm(), the stored call holds an unevaluated call, whereas substituting in frm1 stores an already-evaluated formula object. NOTE: If I am wrong about this someone **PLEASE** correct me.

I see no clear workaround for this other than to avoid passing a formula object to lm() and to write the formula explicitly in the call, as y ~ 1 or y ~ . I think the real fix is to make the boot.stepAIC function smarter in how it handles its formula argument, and that is above my paygrade (and degree of interest). You should probably email the maintainer, who may not monitor this list. But give it a day or so to give someone else a chance to correct me if I'm wrong.

HTH.

Cheers,
Bert
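To make the diagnosis above concrete, here is a minimal sketch in base R. It does not call boot.stepAIC, and the reduced data frame and the names fit_literal, fit_symbol, fit_object, fit_lang and frm_lang are made up for illustration. It prints the class of the formula slot of the stored call for each way of building the model, and tries one possible route (an assumption, not a confirmed fix) to a "call"-class formula from a computed response name: parse the text and splice the result into the lm() call with bquote(). Whether boot.stepAIC then accepts the fitted object has not been checked here.

```r
# Base R only; boot.stepAIC is deliberately not called in this sketch.
set.seed(1)
n   <- 200
dat <- data.frame(x3 = runif(n, -3, 3), x6 = runif(n, -3, 3))
dat$y1 <- 42 + dat$x3 + 2 * dat$x6 + runif(n, -0.5, 0.5)

trg  <- "y1"                            # response name known only at run time
frm1 <- as.formula(paste(trg, "~ 1"))   # a formula *object*

fit_literal <- lm(y1 ~ 1, data = dat)   # formula typed directly in the call
fit_symbol  <- lm(frm1, data = dat)     # stored call keeps the name `frm1`
fit_object  <- eval(substitute(lm(form, data = dat), list(form = frm1)))

# Assumption: parsing the text gives an unevaluated call to `~`, which
# bquote() splices into the lm() call as if it had been typed by hand.
frm_lang <- parse(text = paste(trg, "~ 1"))[[1]]
fit_lang <- eval(bquote(lm(.(frm_lang), data = dat)))

classes <- c(
  literal = class(fit_literal$call$formula),
  symbol  = class(fit_symbol$call$formula),
  object  = class(fit_object$call$formula),
  lang    = class(fit_lang$call$formula)
)
print(classes)

# Compare the stored calls of the hand-typed and the spliced fit.
print(identical(fit_literal$call, fit_lang$call))
```

If the spliced fit's call turns out to match the hand-typed one, it may be worth retrying boot.stepAIC with fit_lang as the starting model, but that step is untested in this sketch.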
Until I get a fix that works, a work-around would be to rename the 'y1' column, use a fixed formula, and rename it back afterwards.

Thanks for your help.
SGO.

-----Original Message-----
From: Bert Gunter [mailto:bgunter.4567 at gmail.com]
Sent: 22 August 2017 20:38
To: Stephen O'hagan <SOhagan at manchester.ac.uk>
Cc: r-help at r-project.org
Subject: Re: [R] boot.stepAIC fails with computed formula
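The rename work-around described above could look roughly like the sketch below. It is only an illustration under assumptions: the column names a, b and resp, and the placeholder name y_fixed, are hypothetical, and the stepAIC / boot.stepAIC calls are left as a comment so that the block runs with base R alone.

```r
# Hypothetical data; in real use the response name arrives at run time.
set.seed(1)
n   <- 200
dat <- data.frame(a = runif(n, -3, 3), b = runif(n, -3, 3))
dat$resp <- 42 + dat$a + 2 * dat$b + runif(n, -0.5, 0.5)

trg <- "resp"      # response name supplied programmatically
tmp <- "y_fixed"   # placeholder name (an assumption); must not clash
stopifnot(trg %in% names(dat), !(tmp %in% names(dat)))

# 1. Rename the response column to the fixed placeholder name.
names(dat)[names(dat) == trg] <- tmp

# 2. Fit with the formula written literally, so the stored call holds
#    an unevaluated call rather than a formula object or a symbol.
strt <- lm(y_fixed ~ 1, data = dat)

# Scope formulas can be built from the remaining column names.
lower <- y_fixed ~ 1
upper <- reformulate(setdiff(names(dat), tmp), response = tmp)

# ... stepAIC(strt, ...) and boot.stepAIC(strt, dat, ...) would be called
#     here, while the column still carries the placeholder name ...

# 3. Rename the response column back afterwards.
names(dat)[names(dat) == tmp] <- trg
```

The model-selection calls have to happen between steps 1 and 3, because the fitted object refers to the placeholder name and the data passed to the bootstrap must still contain a column with that name.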