Hi, I have come across the following problem which seems to be a scoping issue but I'm at a loss to see why this is so or to find a good workaround. Suppose I have a function to get a prediction after model selection using the step function. step.pred <- function(dat, x0) { fit.model <- step(lm(y~., data=dat), trace=F) predict(fit.model, x0, se.fit=T) } This function works sometimes for example set.seed(1) X.1 <- data.frame(x1=rnorm(20), x2=rnorm(20), x3=rnorm(20)) y.1 <- 5+as.matrix(X.1[,1:2])%*%matrix(c(1,1))+rnorm(20) Xy.1 <- data.frame(X.1,y=y.1) x0.1 <- data.frame(x1=-1,x2=-1, x3=-1) step.pred(Xy.1, x0.1) $fit [1] 3.359540 $se.fit [1] 0.523629 $df [1] 16 $residual.scale [1] 1.093526 but most often it crashes as in set.seed(2) X.2 <- data.frame(x1=rnorm(20), x2=rnorm(20), x3=rnorm(20)) y.2 <- 5+as.matrix(X.2[,1:2])%*%matrix(c(1,1))+rnorm(20) Xy.2 <- data.frame(X.2,y=y.2) x0.2 <- data.frame(x1=-1,x2=-1, x3=-1) step.pred(Xy.2, x0.2) Error in model.frame.default(formula = y ~ x1 + x2, data = dat, drop.unused.levels = TRUE) : Object "dat" not found The difference seems to be that for the first dataset, step retains all three variables whereas for the second it drops one of them.> step(lm(y~.,data=Xy.1), trace=F)Call: lm(formula = y ~ x1 + x2 + x3, data = Xy.1) Coefficients: (Intercept) x1 x2 x3 4.8347 0.8937 1.0331 -0.4516> step(lm(y~.,data=Xy.2), trace=F)Call: lm(formula = y ~ x1 + x2, data = Xy.2) Coefficients: (Intercept) x1 x2 5.0802 0.9763 1.1369 One possible workaround is to explicitely assign the local variable dat in the .GlobalEnv as in step.pred1 <- function(dat, x0) { assign("dat",dat, envir=.GlobalEnv) fit.model <- step(lm(y~., data=dat), trace=F) predict(fit.model, x0, se.fit=T) } I don't like this method since it would overwrite anything else called dat in .GlobalEnv. I realize that I could give it an obscure name but the potential for damage still remains. Am I missing something obvious here? If not, is it possible to work around this problem in such a way that .GlobalEnv does not need to be touched? In S-Plus I would use assign("dat",dat, frame=1) which works but that is not available (AFAIK) in R. Is there something similar that I can use? I am using R 1.6.1 for Unix on a Sun Workstation. I know that I need to upgrade but our sysadmin doesn't regard it as priority! Thanks for any help you can give for this. Angelo ------------------------------------------------------------------ | Angelo J. Canty Email: cantya at mcmaster.ca | | Mathematics and Statistics Phone: (905) 525-9140 x 27079 | | McMaster University Fax : (905) 522-0935 | | 1280 Main St. W. | | Hamilton ON L8S 4K1 |
Angelo Canty wrote:> Hi, > > I have come across the following problem which seems to be a scoping > issue but I'm at a loss to see why this is so or to find a good > workaround. > > Suppose I have a function to get a prediction after model selection > using the step function. > > step.pred <- function(dat, x0) { > fit.model <- step(lm(y~., data=dat), trace=F) > predict(fit.model, x0, se.fit=T) > } > > This function works sometimes for example > > set.seed(1) > X.1 <- data.frame(x1=rnorm(20), x2=rnorm(20), x3=rnorm(20)) > y.1 <- 5+as.matrix(X.1[,1:2])%*%matrix(c(1,1))+rnorm(20) > Xy.1 <- data.frame(X.1,y=y.1) > x0.1 <- data.frame(x1=-1,x2=-1, x3=-1) > step.pred(Xy.1, x0.1) > > $fit > [1] 3.359540 > > $se.fit > [1] 0.523629 > > $df > [1] 16 > > $residual.scale > [1] 1.093526 > > but most often it crashes as inIt does not crash. Your outdated version reports an error. R-1.8.0 does not. So please upgrade! Uwe Ligges> set.seed(2) > X.2 <- data.frame(x1=rnorm(20), x2=rnorm(20), x3=rnorm(20)) > y.2 <- 5+as.matrix(X.2[,1:2])%*%matrix(c(1,1))+rnorm(20) > Xy.2 <- data.frame(X.2,y=y.2) > x0.2 <- data.frame(x1=-1,x2=-1, x3=-1) > step.pred(Xy.2, x0.2) > Error in model.frame.default(formula = y ~ x1 + x2, data = dat, > drop.unused.levels = TRUE) : > Object "dat" not found > > The difference seems to be that for the first dataset, step retains > all three variables whereas for the second it drops one of them. > > >>step(lm(y~.,data=Xy.1), trace=F) > > > Call: > lm(formula = y ~ x1 + x2 + x3, data = Xy.1) > > Coefficients: > (Intercept) x1 x2 x3 > 4.8347 0.8937 1.0331 -0.4516 > > >>step(lm(y~.,data=Xy.2), trace=F) > > > Call: > lm(formula = y ~ x1 + x2, data = Xy.2) > > Coefficients: > (Intercept) x1 x2 > 5.0802 0.9763 1.1369 > > > One possible workaround is to explicitely assign the local variable > dat in the .GlobalEnv as in > > step.pred1 <- function(dat, x0) { > assign("dat",dat, envir=.GlobalEnv) > fit.model <- step(lm(y~., data=dat), trace=F) > predict(fit.model, x0, se.fit=T) > } > > I don't like this method since it would overwrite anything else called > dat in .GlobalEnv. I realize that I could give it an obscure name but > the potential for damage still remains. Am I missing something obvious > here? If not, is it possible to work around this problem in such a way > that .GlobalEnv does not need to be touched? > > In S-Plus I would use > assign("dat",dat, frame=1) > which works but that is not available (AFAIK) in R. Is there > something similar that I can use? > > I am using R 1.6.1 for Unix on a Sun Workstation. I know that I need > to upgrade but our sysadmin doesn't regard it as priority!>> Thanks for any help you can give for this. > Angelo > > ------------------------------------------------------------------ > | Angelo J. Canty Email: cantya at mcmaster.ca | > | Mathematics and Statistics Phone: (905) 525-9140 x 27079 | > | McMaster University Fax : (905) 522-0935 | > | 1280 Main St. W. | > | Hamilton ON L8S 4K1 | > > ______________________________________________ > R-help at stat.math.ethz.ch mailing list > https://www.stat.math.ethz.ch/mailman/listinfo/r-help
I believe this was fixed in 1.7.0, although I'm not sure of all the details. The NEWS file says o step(), add1.default() and drop1.default() now work somewhat better if called from a function. -roger Angelo Canty wrote:>Hi, > >I have come across the following problem which seems to be a scoping >issue but I'm at a loss to see why this is so or to find a good >workaround. > >Suppose I have a function to get a prediction after model selection >using the step function. > >step.pred <- function(dat, x0) { > fit.model <- step(lm(y~., data=dat), trace=F) > predict(fit.model, x0, se.fit=T) >} > >This function works sometimes for example > >set.seed(1) >X.1 <- data.frame(x1=rnorm(20), x2=rnorm(20), x3=rnorm(20)) >y.1 <- 5+as.matrix(X.1[,1:2])%*%matrix(c(1,1))+rnorm(20) >Xy.1 <- data.frame(X.1,y=y.1) >x0.1 <- data.frame(x1=-1,x2=-1, x3=-1) >step.pred(Xy.1, x0.1) > >$fit >[1] 3.359540 > >$se.fit >[1] 0.523629 > >$df >[1] 16 > >$residual.scale >[1] 1.093526 > >but most often it crashes as in > >set.seed(2) >X.2 <- data.frame(x1=rnorm(20), x2=rnorm(20), x3=rnorm(20)) >y.2 <- 5+as.matrix(X.2[,1:2])%*%matrix(c(1,1))+rnorm(20) >Xy.2 <- data.frame(X.2,y=y.2) >x0.2 <- data.frame(x1=-1,x2=-1, x3=-1) >step.pred(Xy.2, x0.2) >Error in model.frame.default(formula = y ~ x1 + x2, data = dat, >drop.unused.levels = TRUE) : > Object "dat" not found > >The difference seems to be that for the first dataset, step retains >all three variables whereas for the second it drops one of them. > > > >>step(lm(y~.,data=Xy.1), trace=F) >> >> > >Call: >lm(formula = y ~ x1 + x2 + x3, data = Xy.1) > >Coefficients: >(Intercept) x1 x2 x3 > 4.8347 0.8937 1.0331 -0.4516 > > > >>step(lm(y~.,data=Xy.2), trace=F) >> >> > >Call: >lm(formula = y ~ x1 + x2, data = Xy.2) > >Coefficients: >(Intercept) x1 x2 > 5.0802 0.9763 1.1369 > > >One possible workaround is to explicitely assign the local variable >dat in the .GlobalEnv as in > >step.pred1 <- function(dat, x0) { > assign("dat",dat, envir=.GlobalEnv) > fit.model <- step(lm(y~., data=dat), trace=F) > predict(fit.model, x0, se.fit=T) >} > >I don't like this method since it would overwrite anything else called >dat in .GlobalEnv. I realize that I could give it an obscure name but >the potential for damage still remains. Am I missing something obvious >here? If not, is it possible to work around this problem in such a way >that .GlobalEnv does not need to be touched? > >In S-Plus I would use >assign("dat",dat, frame=1) >which works but that is not available (AFAIK) in R. Is there >something similar that I can use? > >I am using R 1.6.1 for Unix on a Sun Workstation. I know that I need >to upgrade but our sysadmin doesn't regard it as priority! > >Thanks for any help you can give for this. >Angelo > >------------------------------------------------------------------ >| Angelo J. Canty Email: cantya at mcmaster.ca | >| Mathematics and Statistics Phone: (905) 525-9140 x 27079 | >| McMaster University Fax : (905) 522-0935 | >| 1280 Main St. W. | >| Hamilton ON L8S 4K1 | > >______________________________________________ >R-help at stat.math.ethz.ch mailing list >https://www.stat.math.ethz.ch/mailman/listinfo/r-help > > >
Thanks to everyone who replied to my message last night and sorry for any confusion in my use of the term "crashes". I meant that the function returns an error not that the program crashes. Anyway it seems that this problem was fixed in release 1.7.0 so it's back to annoying the sysadmin until he updates to R 1.8.0 Thanks also to those who pointed out that I could use a local version of R. This is true but the problem actually arose in the context of a student's project and she is limited by disk quotas etc. I suspect that it would not be so feasible for her to run a her own local version of R but I will explore the option. Thanks, Angelo ------------------------------------------------------------------ | Angelo J. Canty Email: cantya at mcmaster.ca | | Mathematics and Statistics Phone: (905) 525-9140 x 27079 | | McMaster University Fax : (905) 522-0935 | | 1280 Main St. W. | | Hamilton ON L8S 4K1 | ------------------------------------------------------------------ On Wed, 15 Oct 2003, Angelo Canty wrote:> Hi, > > I have come across the following problem which seems to be a scoping > issue but I'm at a loss to see why this is so or to find a good > workaround. > > Suppose I have a function to get a prediction after model selection > using the step function. > > step.pred <- function(dat, x0) { > fit.model <- step(lm(y~., data=dat), trace=F) > predict(fit.model, x0, se.fit=T) > } > > This function works sometimes for example > > set.seed(1) > X.1 <- data.frame(x1=rnorm(20), x2=rnorm(20), x3=rnorm(20)) > y.1 <- 5+as.matrix(X.1[,1:2])%*%matrix(c(1,1))+rnorm(20) > Xy.1 <- data.frame(X.1,y=y.1) > x0.1 <- data.frame(x1=-1,x2=-1, x3=-1) > step.pred(Xy.1, x0.1) > > $fit > [1] 3.359540 > > $se.fit > [1] 0.523629 > > $df > [1] 16 > > $residual.scale > [1] 1.093526 > > but most often it crashes as in > > set.seed(2) > X.2 <- data.frame(x1=rnorm(20), x2=rnorm(20), x3=rnorm(20)) > y.2 <- 5+as.matrix(X.2[,1:2])%*%matrix(c(1,1))+rnorm(20) > Xy.2 <- data.frame(X.2,y=y.2) > x0.2 <- data.frame(x1=-1,x2=-1, x3=-1) > step.pred(Xy.2, x0.2) > Error in model.frame.default(formula = y ~ x1 + x2, data = dat, > drop.unused.levels = TRUE) : > Object "dat" not found > > The difference seems to be that for the first dataset, step retains > all three variables whereas for the second it drops one of them. > > > step(lm(y~.,data=Xy.1), trace=F) > > Call: > lm(formula = y ~ x1 + x2 + x3, data = Xy.1) > > Coefficients: > (Intercept) x1 x2 x3 > 4.8347 0.8937 1.0331 -0.4516 > > > step(lm(y~.,data=Xy.2), trace=F) > > Call: > lm(formula = y ~ x1 + x2, data = Xy.2) > > Coefficients: > (Intercept) x1 x2 > 5.0802 0.9763 1.1369 > > > One possible workaround is to explicitely assign the local variable > dat in the .GlobalEnv as in > > step.pred1 <- function(dat, x0) { > assign("dat",dat, envir=.GlobalEnv) > fit.model <- step(lm(y~., data=dat), trace=F) > predict(fit.model, x0, se.fit=T) > } > > I don't like this method since it would overwrite anything else called > dat in .GlobalEnv. I realize that I could give it an obscure name but > the potential for damage still remains. Am I missing something obvious > here? If not, is it possible to work around this problem in such a way > that .GlobalEnv does not need to be touched? > > In S-Plus I would use > assign("dat",dat, frame=1) > which works but that is not available (AFAIK) in R. Is there > something similar that I can use? > > I am using R 1.6.1 for Unix on a Sun Workstation. I know that I need > to upgrade but our sysadmin doesn't regard it as priority! > > Thanks for any help you can give for this. > Angelo > > ------------------------------------------------------------------ > | Angelo J. Canty Email: cantya at mcmaster.ca | > | Mathematics and Statistics Phone: (905) 525-9140 x 27079 | > | McMaster University Fax : (905) 522-0935 | > | 1280 Main St. W. | > | Hamilton ON L8S 4K1 | > > ______________________________________________ > R-help at stat.math.ethz.ch mailing list > https://www.stat.math.ethz.ch/mailman/listinfo/r-help >