thr3ads.net - R help - [R] how to use the randomForest and rpart function? [Mar 2006]

Michael

2006-Mar-08 01:39 UTC

[R] how to use the randomForest and rpart function?

Hi all,

I am trying to play around with the randomForest function for
classification. I know its performance is great.

I am currently using the default options.

It has many options.

How do I further tweak the options so that I can make its performance even
better?

Which options are used most often?

Thanks a lot!

M

Michael

2006-Mar-08 01:44 UTC

When I plot the randomForest object, it shows a graph with 3 lines: green,
red and black. What do these three lines mean?


Liaw, Andy

2006-Mar-08 02:00 UTC

As ?plot.randomForest says, it plots error rates.  In addition to overall
error rates, it also plots error rates for each class.

As to the options in randomForest, read about the options in the help page
and the reference linked from the help page.

Andy
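
For readers who want to see which curve is which, here is a minimal sketch of the error-rate plot with a legend added. It uses the built-in iris data as a stand-in (not the poster's data), and the seed is arbitrary. As far as I can tell from the package source, plot() on a randomForest object draws the columns of the err.rate component with matplot(), so colours and line types should follow the column order; treat that as an assumption and check it against your installed version.

```r
library(randomForest)

set.seed(1)  # arbitrary seed, only for reproducibility
rf <- randomForest(Species ~ ., data = iris)  # iris is a stand-in dataset

# err.rate has one row per tree and one column per curve:
# "OOB" (overall error) followed by one column per class level.
colnames(rf$err.rate)

# Draw the error curves, then add a legend that mirrors the column order.
# Assumption: colours and line types cycle 1, 2, 3, ... as in matplot().
plot(rf, main = "Error rate vs. number of trees")
n_curves <- ncol(rf$err.rate)
legend("topright", legend = colnames(rf$err.rate),
       col = seq_len(n_curves), lty = seq_len(n_curves))
```

If that assumption holds, then for a two-class problem the black line would be the overall OOB error, and the red and green lines the error rates for the first and second class levels.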


Liaw, Andy

2006-Mar-08 02:31 UTC

Yes, I do know.  That's why I pointed you to the reference linked from the
help page.

BTW, there's also an R News article describing the initial version of the
package.  Have you perused that?

Andy

-----Original Message-----
From: Michael [mailto:comtech.usa@gmail.com] 
Sent: Tuesday, March 07, 2006 9:27 PM
To: Liaw, Andy
Cc: R-help@stat.math.ethz.ch
Subject: Re: [R] how to use the randomForest and rpart function?

It did not have a legend showing which color is for class1, which color
is for class2, etc.

I've read the R-help page.

It lists a lot of options, but it did not say which ones are the key
parameters that people use most for improving performance... 

Do you know?


Michael

2006-Mar-09 01:20 UTC

Wow, I didn't know that. That's great! He's the man!

On 3/8/06, Carlos Ortega <coforfe@gmail.com> wrote:
> Hello Michael,
>
> Just a few words about your phrase "Do you know?"...
>
> Andy Liaw is the creator and maintainer of the randomForest package.
> He ported Breiman's original library to R.
>
> Regards,
> Carlos.

Michael

2006-Mar-09 01:20 UTC

Thanks a lot Joe!

I will take a further look at the article...

On 3/8/06, Joseph Retzer <joe_retzer@yahoo.com> wrote:
> Hi Michael,
>   I've looked into this a bit, and the only parameter that seems to be
> suggested (by Breiman and a Salford Systems white paper) as one which may
> have an impact on the RF model is mtry, which sets the number of candidate
> split variables tried at each tree split. The default is the square root of
> the total number of attributes for a categorical response, and (total number
> of attributes)/3 for regression. Take a look at the tuneRF function in
> randomForest, which starts from the default and searches above and below it
> to see if the OOB error rate can be improved by changing mtry. Based on my
> very limited experimentation with the program, the default value seems to
> be tough to improve on.
> Best of luck & take care,
> Joe Retzer
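
The mtry defaults and the tuneRF search described above can be sketched roughly as follows. This is only an illustration on the built-in iris data (a stand-in, not the poster's data); the seed and the ntreeTry, stepFactor and improve values are arbitrary choices for the example, not recommendations.

```r
library(randomForest)

x <- iris[, 1:4]      # predictors (stand-in data)
y <- iris$Species     # categorical response
p <- ncol(x)

# Default mtry values as documented in ?randomForest
floor(sqrt(p))        # classification
max(floor(p / 3), 1)  # regression

set.seed(2)  # arbitrary seed
# Start from the default mtry and step up and down by stepFactor,
# continuing in a direction while the OOB error improves by at
# least `improve` (a relative improvement).
tuned <- tuneRF(x, y, ntreeTry = 200, stepFactor = 2, improve = 0.01,
                trace = TRUE, plot = FALSE)
tuned  # one row per mtry value tried, with its OOB error
```

With only four predictors there is very little room to search, so on real data with many predictors the comparison is more informative.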

Tim Howard

2006-Mar-09 12:39 UTC

Michael - 
I recall reading something Breiman wrote that said essentially "don't
skimp on the number of trees - they are cheap to build and it makes for a better
model."  Also, look at your error rates (using plot), and make sure you run
enough trees so that the error settles down. You'll likely be building 1000
or so trees.

Tim
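
One way to check whether the error has settled down, besides looking at the plot, is to read the OOB error directly from the fitted object. A small sketch, again on iris as a stand-in, with an arbitrary seed and arbitrary checkpoints:

```r
library(randomForest)

set.seed(3)  # arbitrary seed
rf <- randomForest(Species ~ ., data = iris, ntree = 1000)

# OOB error after 1, 2, ..., ntree trees
oob <- rf$err.rate[, "OOB"]

# Error at a few forest sizes; once these stop changing much,
# extra trees mainly cost computing time.
oob[c(50, 100, 250, 500, 1000)]

# Spread of the OOB error over the last 200 trees, as a rough check
range(tail(oob, 200))
```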

************
Hi Andy,
Does randomForest have cross-validation built in to decide the best number
of trees, or do I have to find the best number manually myself?

Thanks a lot!

Michael.


Michael

2006-Mar-11 01:11 UTC

Thanks a lot Andy,

Do I need to center and scale the data before passing it to rpart and
randomForest?

I know that for LDA and QDA it does not matter...

And for ridge, it matters.

Thanks a lot!

Michael.



On 3/8/06, Liaw, Andy <andy_liaw@merck.com> wrote:
> The algorithm has something slicker than cross-validation.  That's the
> whole OOB business mentioned in the R News article.  The number of trees
> isn't really a parameter, as it doesn't hurt to have "too many trees"
> (other than wasting computing resources).  Some people routinely run more
> than 10,000 trees just to make sure.
>
> Sometimes mtry does matter (though that's more of an exception than the
> rule).  I can find pathological cases where mtry=1 is the best, or
> mtry=number of covariates (bagging) is best, but when given real data, one
> almost never has any idea.
>
> Andy
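
To make the OOB point concrete: each tree is grown on a bootstrap sample, and the observations left out of that sample ("out of bag") are used to estimate the error, so no separate cross-validation loop is needed. A rough sketch of comparing OOB error across mtry values, from mtry=1 up to the number of predictors (bagging), might look like this. The iris data, the seed and ntree = 500 are assumptions for the example only.

```r
library(randomForest)

p <- ncol(iris) - 1   # number of predictors in the stand-in data

# Final OOB error for each mtry from 1 up to p (mtry = p is bagging)
oob_by_mtry <- sapply(seq_len(p), function(m) {
  set.seed(4)  # same arbitrary seed for every fit
  rf <- randomForest(Species ~ ., data = iris, mtry = m, ntree = 500)
  rf$err.rate[rf$ntree, "OOB"]
})
names(oob_by_mtry) <- paste0("mtry=", seq_len(p))
oob_by_mtry
```

Differences of a fraction of a percent between settings are usually within the run-to-run noise, which fits the remark that mtry rarely matters much.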

Liaw, Andy

2006-Mar-11 03:10 UTC

No.  Tree-based methods are invariant to monotone transformations in the
predictors, because they only use ranks.  Rotation can matter, though.
 
Andy
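
The invariance described above can be checked with a small experiment. This sketch fits rpart to the built-in iris data (a stand-in) three times: on the raw predictors, on centred and scaled predictors, and on log-transformed predictors. The split thresholds will differ, but since the ordering of the values is unchanged, the fitted classes should agree. It does not cover the rotation case.

```r
library(rpart)

raw <- iris
scaled <- iris
scaled[, 1:4] <- scale(iris[, 1:4])   # centre and scale each predictor
logged <- iris
logged[, 1:4] <- log(iris[, 1:4])     # another monotone transformation

fit_raw    <- rpart(Species ~ ., data = raw)
fit_scaled <- rpart(Species ~ ., data = scaled)
fit_logged <- rpart(Species ~ ., data = logged)

# Predicted class for each training observation
pred <- function(fit, d) predict(fit, d, type = "class")

# Compare fitted classes across the three versions of the data
identical(pred(fit_raw, raw), pred(fit_scaled, scaled))
identical(pred(fit_raw, raw), pred(fit_logged, logged))
```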

