Andreas Wittmann
2009-Nov-24 19:24 UTC
[R] predict: remove columns with new levels automatically
Dear R-users,

in the following thread

http://tolstoy.newcastle.edu.au/R/help/03b/3322.html

the problem of how to remove rows for predict() that contain levels which are not in the model was discussed.

Now I am trying to do this the other way round: I want to remove columns (variables) from the model that will later cause problems because of new levels at prediction time.

## example:
set.seed(0)
x <- rnorm(9)
y <- x + rnorm(9)

training <- data.frame(x=x, y=y, z=c(rep("A", 3), rep("B", 3), rep("C", 3)))
test <- data.frame(x=t<-rnorm(1), y=t+rnorm(1), z="D")

lm1 <- lm(x ~ ., data=training)
## prediction does not work because the variable z has the new level "D"
predict(lm1, test)

## solution: the variable z is removed from the model
## the prediction happens without using the information of variable z
lm2 <- lm(x ~ y, data=training)
predict(lm2, test)

How can I automatically recognize this and compute the prediction accordingly?

Thanks
Andreas
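A minimal sketch of one way to do what is asked above, assuming the new data are already at hand when the model is fitted. `vars_with_new_levels()` is a hypothetical helper, not a base R or package function: it flags categorical (factor or character) predictor columns whose non-missing test values include a level absent from the training data. The model is then refit without those columns via `reformulate()`. The variable `t` from the example is renamed `t0` here only to avoid masking `t()`.

```r
# Hypothetical helper (not part of base R or any package): names of
# categorical columns whose non-missing values in `newdata` include a
# level never seen in `train`.
vars_with_new_levels <- function(train, newdata) {
  common <- intersect(names(train), names(newdata))
  is_cat <- vapply(common, function(v) {
    is.factor(train[[v]]) || is.character(train[[v]])
  }, logical(1))
  cat_vars <- common[is_cat]
  has_new <- vapply(cat_vars, function(v) {
    vals <- as.character(newdata[[v]])
    vals <- vals[!is.na(vals)]          # NAs give NA predictions, not errors
    any(!vals %in% as.character(train[[v]]))
  }, logical(1))
  cat_vars[has_new]
}

# Data from the question
set.seed(0)
x <- rnorm(9)
y <- x + rnorm(9)
training <- data.frame(x = x, y = y, z = c(rep("A", 3), rep("B", 3), rep("C", 3)))
t0 <- rnorm(1)
test <- data.frame(x = t0, y = t0 + rnorm(1), z = "D")

response <- "x"
# Only check the predictors, never the response column
dropped <- vars_with_new_levels(training[setdiff(names(training), response)], test)
predictors <- setdiff(names(training), c(response, dropped))
# Fall back to an intercept-only model if every predictor had to be dropped
if (length(predictors) == 0) predictors <- "1"
fml <- reformulate(predictors, response = response)
lm2 <- lm(fml, data = training)
pred <- predict(lm2, test)
```

Here `dropped` comes out as `"z"` and `fml` as `x ~ y`, i.e. the `lm2` model from the question, built without naming `z` by hand.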
David Winsemius
2009-Nov-25 05:48 UTC
[R] predict: remove columns with new levels automatically
On Nov 24, 2009, at 2:24 PM, Andreas Wittmann wrote:

> Dear R-users,
>
> in the following thread
>
> http://tolstoy.newcastle.edu.au/R/help/03b/3322.html
>
> the problem of how to remove rows for predict() that contain levels
> which are not in the model was discussed.
>
> Now I am trying to do this the other way round: I want to remove
> columns (variables) from the model that will later cause problems
> because of new levels at prediction time.
>
> ## example:
> set.seed(0)
> x <- rnorm(9)
> y <- x + rnorm(9)
>
> training <- data.frame(x=x, y=y, z=c(rep("A", 3), rep("B", 3),
> rep("C", 3)))
> test <- data.frame(x=t<-rnorm(1), y=t+rnorm(1), z="D")
>
> lm1 <- lm(x ~ ., data=training)
> ## prediction does not work because the variable z has the new level
> "D"
> predict(lm1, test)
>
> ## solution: the variable z is removed from the model
> ## the prediction happens without using the information of variable z
> lm2 <- lm(x ~ y, data=training)
> predict(lm2, test)
>
> How can I automatically recognize this and compute the prediction accordingly?

Let me get this straight. You want us to predict in advance (or, more accurately, design an algorithm that can see into the future and work around) any sort of newdata you might later construct????

--
David Winsemius, MD
Heritage Laboratories
West Hartford, CT
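Since newdata cannot be known when the model is fitted, one option is to defer the check until newdata actually exists. A minimal sketch of that, assuming an `lm` fit and a `newdata` that contains every variable the model uses: the hypothetical helper `new_level_terms()` (not a base R function) compares `newdata` against the levels that `lm()` records in the fit's `xlevels` component, and `update()` then refits without the offending variables.

```r
# Hypothetical helper (not part of base R): variables of a fitted model
# for which `newdata` contains levels the model never saw. It relies on
# the `xlevels` component that lm() stores with the fit.
new_level_terms <- function(fit, newdata) {
  xlev <- fit$xlevels
  vars <- names(xlev)
  if (length(vars) == 0) return(character(0))
  bad <- vapply(vars, function(v) {
    vals <- as.character(newdata[[v]])
    vals <- vals[!is.na(vals)]          # ignore missing values
    any(!vals %in% xlev[[v]])
  }, logical(1))
  vars[bad]
}

# Data from the question (t renamed to t0 to avoid masking t())
set.seed(0)
x <- rnorm(9)
y <- x + rnorm(9)
training <- data.frame(x = x, y = y, z = c(rep("A", 3), rep("B", 3), rep("C", 3)))
t0 <- rnorm(1)
test <- data.frame(x = t0, y = t0 + rnorm(1), z = "D")

lm1 <- lm(x ~ ., data = training)
bad <- new_level_terms(lm1, test)
fit <- if (length(bad) > 0) {
  # Drop the offending terms (here ". ~ . - z") and refit on the same data
  update(lm1, as.formula(paste(". ~ . -", paste(bad, collapse = " - "))))
} else {
  lm1
}
pred <- predict(fit, test)
```

Working from `xlevels` means the check needs only the fitted model and the new data, so nothing about future data has to be anticipated at fitting time; the cost is a refit whenever a new level shows up.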