I'm trying to run rfe() from the caret package for variable selection and am
getting an error. My data frame includes a categorical variable (State) with
3 levels, which I convert to dummy variables.
library(caret)
x <- chlDescr
y <- chl
# create dummy variables
levels(x$State) <- c("AL","GA","FL")
dummy <- model.matrix(~State,x)
z <- cbind(dummy, x)
# remove the original State factor column (column 4 of z)
w <- z[,c(-4)]
subsets <- c(2:8)
ctrl <- rfeControl(functions = lmFuncs, method = "cv", verbose = FALSE,
                   returnResamp = "final")
lmProfile <- rfe(w, y, sizes = subsets, rfeControl = ctrl)
Returns:
Error in `[.data.frame`(x, , retained, drop = FALSE) :
undefined columns selected
In addition: Warning message:
In predict.lm(object, x) :
prediction from a rank-deficient fit may be misleading
When I drop the State variable entirely instead of using dummy variables, the
function runs fine.
#remove State variable
Desc <- chlDescr[,-c(1)]
lmProfile <- rfe(Desc, y, sizes = subsets, rfeControl = ctrl)
Returns:
Recursive feature selection
Outer resamping method was 10 iterations of cross-validation.
Resampling performance over subset size:
Variables   RMSE Rsquared  RMSESD RsquaredSD Selected
        1 0.2462   0.7454 0.09529    0.17362
        2 0.2408   0.7680 0.07860    0.12543
        3 0.2134   0.8285 0.06649    0.09043
        4 0.2011   0.8609 0.03463    0.05928        *
        5 0.2019   0.8622 0.03421    0.05675
        6 0.2019   0.8622 0.03421    0.05675
Can lmFuncs handle dummy variables? If not, how would it need to be modified
to do so?
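A possible cause, offered as a guess rather than a confirmed diagnosis:
model.matrix(~State, x) returns a constant "(Intercept)" column in addition to
the two State dummies, and removing column 4 of z drops only the original
State factor. Since lm fits its own intercept, that extra constant column
would make the fit rank-deficient, which matches the warning above. The sketch
below uses a small made-up data frame (the columns depth and temp are
hypothetical stand-ins for the real predictors) to show the extra column and
one way to drop it before calling rfe.

```r
# Toy stand-in for chlDescr; depth and temp are hypothetical column names.
x <- data.frame(
  State = factor(c("AL", "GA", "FL", "AL", "GA", "FL"),
                 levels = c("AL", "GA", "FL")),
  depth = c(1.2, 3.4, 2.2, 0.9, 4.1, 2.8),
  temp  = c(20.1, 18.3, 25.0, 21.4, 17.9, 24.2)
)

# model.matrix() adds a constant "(Intercept)" column alongside the dummies.
dummy <- model.matrix(~ State, data = x)
colnames(dummy)  # "(Intercept)" "StateGA" "StateFL"

# Drop the intercept column, then bind the dummies to the other predictors,
# leaving out the original State factor by name rather than by position.
dummy <- dummy[, -1, drop = FALSE]
w <- cbind(as.data.frame(dummy),
           x[, setdiff(names(x), "State"), drop = FALSE])
names(w)
```

With the real data, the same two steps applied to chlDescr would give the w
that is passed to rfe. Whether that clears the error has not been checked
here.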
I'm new at this so any help would be appreciated, thanks.
Reni
http://r.789695.n4.nabble.com/file/n3487861/chl.csv chl.csv
http://r.789695.n4.nabble.com/file/n3487861/chlDescr.csv chlDescr.csv