Try the following:
library(randomForest)

## Fit the same model two ways: formula interface vs. x/y interface
set.seed(100)
rf1 <- randomForest(Species ~ ., data = iris)
set.seed(100)
rf2 <- randomForest(iris[1:4], iris$Species)

## Compare the memory footprint and contents of the two fits
object.size(rf1)
object.size(rf2)
str(rf1)
str(rf2)
You can try it on your own data. That should give you some hints about why the
formula interface should be avoided with large datasets.
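If all you need later is prediction, you can also strip the bulkier bookkeeping out of the fitted object before saving it. Below is a rough sketch; which components are safe to drop is my guess at what predict() actually uses (essentially the $forest plus a little metadata), so do verify that predictions come out unchanged on your own data before relying on it:

library(randomForest)

set.seed(100)
fit <- randomForest(iris[1:4], iris$Species, ntree = 50)

slim <- fit
## Assumption: predict() needs $forest plus small metadata, so the
## per-observation bookkeeping below can go. Verify on your own data.
slim$predicted <- NULL   # OOB prediction for each training row
slim$votes     <- NULL   # per-row class vote matrix
slim$oob.times <- NULL
slim$err.rate  <- NULL
slim$confusion <- NULL
slim$y         <- NULL   # copy of the response vector

object.size(fit)
object.size(slim)

## Sanity check: the stripped object should predict identically
identical(predict(fit, iris[1:4]), predict(slim, iris[1:4]))

Saving the stripped object with saveRDS() and reloading it with readRDS() should then give you a much smaller file to carry around for scoring.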
Andy
-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
On Behalf Of John Foreman
Sent: Monday, December 03, 2012 3:43 PM
To: r-help at r-project.org
Subject: [R] How do I make R randomForest model size smaller?
I've been training randomForest models on 7 million rows of data (41
features). Here's an example call:
myModel <- randomForest(RESPONSE ~ ., data = mydata, ntree = 50, maxnodes = 30)
I thought surely with only 50 trees and at most 30 terminal nodes per tree, the memory footprint of "myModel" would be small. But it's 65 MB in a dump file. The object seems to be holding all sorts of predicted, actual, and vote data from the training process.
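A rough way to see which pieces are eating the space (this just sizes each top-level component; the component names come from a standard randomForest fit) is:

sort(sapply(myModel, object.size), decreasing = TRUE)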
What if I just want the forest and that's it? I want a tiny dump file that
I can load later to make predictions off of quickly. I feel like the forest
by itself shouldn't be all that large...
Anyone know how to strip this sucker down to just something I can make
predictions off of going forward?