Sir,

This query is related to randomForest regression using R.

I have a dataset called qsar.arff, which I use as my training set, and then I run the following function:

    rf=randomForest(x=train,y=trainy,xtest=train,ytest=trainy,ntree=500)

where train is a matrix of predictors without the column to be predicted (the target column) and trainy is the target column. I feed the same data for xtest and ytest too, as shown.

On checking, I found that rf$mse[500] and rf$test$mse[500] are different (the R-squared values are also different). The predicted values for the training target column and the testing target column are also different.

Should this happen, since I am using the training dataset as the testing dataset? I expected that the test and training predictions would be the same. It would be helpful if you could point out whether I am missing something.

Thanks for the help.

Regards,
Shameek Ghosh
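The setup in the question can be sketched as a minimal, self-contained R example. This is only a sketch under assumptions: qsar.arff is not available here, so the built-in mtcars data (predicting mpg from the other columns) stands in for it, and the randomForest package is assumed to be installed. According to the randomForest documentation, rf$mse and rf$rsq are computed from out-of-bag predictions, whereas rf$test$mse and rf$test$rsq come from predicting xtest with all trees, so the two sets of numbers are computed differently even when xtest is the training data.

```r
# Assumes the randomForest package is installed: install.packages("randomForest")
library(randomForest)

set.seed(1)  # fix the RNG so this run is repeatable

# Hypothetical stand-in for qsar.arff: built-in mtcars, target column mpg.
train  <- as.matrix(mtcars[, -1])  # predictors only, target column dropped
trainy <- mtcars$mpg               # target column

# Same call as in the question: the training data is also passed as the test set.
rf <- randomForest(x = train, y = trainy,
                   xtest = train, ytest = trainy, ntree = 500)

# rf$mse / rf$rsq: each row is predicted only by trees whose bootstrap
# sample left it out (out-of-bag).
# rf$test$mse / rf$test$rsq: every row of xtest is predicted by all trees.
cat("OOB MSE after 500 trees: ", rf$mse[500], "\n")
cat("Test MSE after 500 trees:", rf$test$mse[500], "\n")
cat("OOB R^2 after 500 trees: ", rf$rsq[500], "\n")
cat("Test R^2 after 500 trees:", rf$test$rsq[500], "\n")
```

Comparing rf$predicted with rf$test$predicted in the same way shows the per-row differences the question describes.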
On Mar 8, 2012, at 5:10 AM, shameek ghosh wrote:

> Should this happen, since I am using the training dataset as the
> testing dataset? I expected that the test and training predictions
> would be the same.

My inference from its name, _random_Forest, was that it was _not_ a "deterministic forest".

--
David Winsemius, MD
West Hartford, CT
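The point about randomness can be illustrated with a short sketch, again assuming the randomForest package is installed and using mtcars as a hypothetical stand-in for qsar.arff; oob_mse is a hypothetical helper name introduced only for this illustration. Fitting the forest twice with the same seed reproduces the result exactly, while a different seed grows a different forest with a different error.

```r
# Assumes the randomForest package is installed.
library(randomForest)

# Hypothetical stand-in data: built-in mtcars, target column mpg.
train  <- as.matrix(mtcars[, -1])
trainy <- mtcars$mpg

# Hypothetical helper: fit a 500-tree forest after setting the given seed
# and return the out-of-bag MSE after the last tree.
oob_mse <- function(seed) {
  set.seed(seed)
  rf <- randomForest(x = train, y = trainy, ntree = 500)
  rf$mse[500]
}

run1 <- oob_mse(1)
run2 <- oob_mse(1)  # same seed: same bootstrap samples and variable choices
run3 <- oob_mse(2)  # different seed: a different random forest

cat("seed 1:", run1, " seed 1 again:", run2, " seed 2:", run3, "\n")
```

Calling set.seed() before randomForest() is therefore the usual way to make a given run repeatable.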
Sent from my HTC

-----Original Message-----
From: David Winsemius <dwinsemius at comcast.net>
Sent: Friday, 9 March 2012 12:26 AM
To: shameek ghosh <shameek09 at gmail.com>
Cc: r-help at r-project.org <r-help at r-project.org>
Subject: Re: [R] Regarding randomForest regression

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.