Lorenzo Isella

2013-Mar-24 10:43 UTC

### [R] Random Forest, Giving More Importance to Some Data

Dear All,

I am using randomForest to predict the final selling price of some items. As often happens, I have a lot of (noisy) historical data, but the question is not so much about data cleaning.

The dataset for which I need to carry out predictions consists of fairly recent sales, or even sales that will take place in the near future. As a consequence, the historical data should somehow be weighted: the older an observation is, the less it should matter for the prediction.

Any idea about how this could be achieved? Please find below a snippet showing how I use the randomForest library (on a multi-core machine). Any suggestion is appreciated.

Cheers

Lorenzo

    ###########################################################################
    rf_model <- foreach(iteration=1:cores,
                        ntree = rep(50, 4),
                        .combine = combine,
                        .packages = "randomForest") %dopar%{
        sink("log.txt", append=TRUE)
        cat(paste("Starting iteration",iteration,"\n"))
        randomForest(trainRF,
                     prices_train, ## mtry=20,
                     nodesize=5,
                     ## maxnodes=140,
                     importance=FALSE, do.trace=10,ntree=ntree)
    }
    ###########################################################################

Your question doesn't seem to be specifically related to either R or random forest. Instead, it is about how to assign weights to training observations.

WenSui Liu
Credit Risk Manager, 53 Bancorp
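One possible way to act on the point about weighting training observations, offered only as a sketch and not as a method proposed in this thread: give each sale a weight that decays with its age, then resample the training rows with probability proportional to that weight before fitting, so older sales enter the forest less often. The synthetic data, the `sale_date` column, the `size` predictor, and the one-year `half_life` are all assumptions made for illustration; `trainRF` and `prices_train` merely reuse the names from the original snippet.

```r
# Hypothetical data standing in for the real training set (all values are made up).
set.seed(1)
n <- 200
sale_date <- as.Date("2013-03-24") - sample(0:1500, n, replace = TRUE)
trainRF <- data.frame(size = runif(n, 50, 150))
prices_train <- 1000 * trainRF$size + rnorm(n, sd = 5000)

# Exponential decay: a sale that is `half_life` days old counts half as much
# as the most recent sale. The half-life is an assumed value to be tuned.
half_life <- 365
age_days <- as.numeric(difftime(max(sale_date), sale_date, units = "days"))
w <- 0.5^(age_days / half_life)

# Weighted bootstrap: draw row indices with probability proportional to w,
# so recent sales are over-represented in the resampled training set.
idx <- sample(seq_len(n), size = n, replace = TRUE, prob = w)

# Fit on the resampled rows (skipped if randomForest is not installed).
if (requireNamespace("randomForest", quietly = TRUE)) {
  rf_weighted <- randomForest::randomForest(trainRF[idx, , drop = FALSE],
                                            prices_train[idx],
                                            ntree = 50, nodesize = 5)
}
```

Resampling only changes how often each observation is seen; it does not change how splits are evaluated. A sensible way to choose the decay rate would be to hold out the most recent sales and compare prediction error across a few candidate half-lives.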