Soumyadeep Nandi
2008-Jul-03 06:41 UTC
[R] randomForest.error: length of response must be the same as predictors
My data looks like:

A,B,C,D,Class
1,2,0,2,cl1
1,5,1,9,cl1
3,2,1,2,cl2
7,2,1,2,cl2
2,2,1,2,cl2
1,2,1,5,cl2
0,2,1,2,cl2
4,2,1,2,cl2
3,5,1,2,cl2
3,2,12,3,cl2
3,2,4,2,cl2

The steps followed are:

trainfile <- read.csv("TrainFile",head=TRUE)
datatrain <- subset(trainfile,select=c(-Class))
classtrain <- (subset(trainfile,select=Class))
rf <- randomForest(datatrain, classtrain)

Error in randomForest.default(classtrain, datatrain) :
  length of response must be the same as predictors
In addition: Warning message:
In randomForest.default(classtrain, datatrain) :
  The response has five or fewer unique values. Are you sure you want to do regression?

Could someone suggest me where I am going wrong.

Thanks
Gavin Simpson
2008-Jul-03 08:50 UTC
[R] randomForest.error: length of response must be the same as predictors
On Thu, 2008-07-03 at 12:11 +0530, Soumyadeep Nandi wrote:
> My data looks like:
> A,B,C,D,Class
> 1,2,0,2,cl1
> 1,5,1,9,cl1
> 3,2,1,2,cl2
> 7,2,1,2,cl2
> 2,2,1,2,cl2
> 1,2,1,5,cl2
> 0,2,1,2,cl2
> 4,2,1,2,cl2
> 3,5,1,2,cl2
> 3,2,12,3,cl2
> 3,2,4,2,cl2
>
> **The steps followed are:
> trainfile <- read.csv("TrainFile",head=TRUE)
> datatrain <- subset(trainfile,select=c(-Class))
> classtrain <- (subset(trainfile,select=Class))
> rf <- randomForest(datatrain, classtrain)
>
> Error in randomForest.default(classtrain, datatrain) :
>   length of response must be the same as predictors
> In addition: Warning message:
> In randomForest.default(classtrain, datatrain) :
>   The response has five or fewer unique values. Are you sure you want to do
> regression?
>
> Could someone suggest me where I am going wrong.

Yep, look at class(classtrain):

> class(classtrain)
[1] "data.frame"

subset() returns a data.frame, which is a special case of a list. The lengths of a list (and therefore a data frame) are not what you expect:

> length(classtrain)
[1] 1

There is *1* component to the list, one '$' bit that you can get at. Hence, rf complains as, to it, the length of x and y are not the same, when evaluated using length().

Note that ?randomForest does state that y should be a response 'vector', so you are not supplying what is required.

Two ways to proceed:

rf <- randomForest(Class ~ ., data = trainfile)

or if you really don't want the formula parsing, force the empty dimension to be dropped, by subsetting:

rf <- randomForest(datatrain, classtrain[,1])

[Nb, as classtrain is of class "data.frame", drop() will not work on it as it doesn't have a dim attribute]

HTH

G

>
> Thanks
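To make the length mismatch concrete, here is a minimal base-R sketch that rebuilds the posted data inline and compares the one-column data frame with the extracted vector. Reading from text = rather than from "TrainFile", the explicit stringsAsFactors = TRUE, and the name y are assumptions made for illustration; the randomForest call is left commented out so the snippet runs without the package installed.

```r
# Rebuild the poster's training data inline instead of reading "TrainFile".
# stringsAsFactors = TRUE is set explicitly (an assumption for this sketch)
# so that Class is read as a factor regardless of the R version's default.
trainfile <- read.csv(text = "A,B,C,D,Class
1,2,0,2,cl1
1,5,1,9,cl1
3,2,1,2,cl2
7,2,1,2,cl2
2,2,1,2,cl2
1,2,1,5,cl2
0,2,1,2,cl2
4,2,1,2,cl2
3,5,1,2,cl2
3,2,12,3,cl2
3,2,4,2,cl2", header = TRUE, stringsAsFactors = TRUE)

datatrain  <- subset(trainfile, select = -Class)  # predictors A, B, C, D
classtrain <- subset(trainfile, select = Class)   # still a data frame

class(classtrain)   # a data frame, not a vector
length(classtrain)  # number of columns (1), not number of rows
nrow(datatrain)     # number of observations (11)

# Extracting the single column drops the data frame wrapper.
y <- classtrain[, 1]   # classtrain$Class would give the same vector
length(y)              # now matches nrow(datatrain)
is.factor(y)           # a factor response is treated as classification

# With the randomForest package installed, either call from the reply
# should then be usable:
# library(randomForest)
# rf <- randomForest(datatrain, y)
# rf <- randomForest(Class ~ ., data = trainfile)
```

If Class were read in as plain character strings rather than a factor, wrapping it in factor() before the call would be the likely next step, since ?randomForest says classification is assumed only for a factor response.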