Carlos J. Gil Bellosta
2009-Jun-19 18:35 UTC
[R] Recursive partitioning algorithms in R vs. alia
Dear R-helpers,

I had a conversation with someone working in the "business intelligence" department of a major Spanish bank. They rely on recursive partitioning methods to rank customers according to certain criteria, and they use both SAS EM and Salford Systems' CART. I have used the rpart package in the past, but I could not offer any kind of feature comparison, as I have no access to an installation of either proprietary product.

Does anybody have experience with them? Is there any public benchmark available? Is there any very good --although purely technical-- reason to pay hefty software licences? How would the algorithms implemented in rpart compare to those in SAS and/or CART?

Best regards,

Carlos J. Gil Bellosta
http://www.datanalytics.com
"Carlos J. Gil Bellosta" <cgb at datanalytics.com> wrote> >I had a conversation with a guy working in a "business intelligence" >department at a major Spanish bank. They rely on recursive partitioning >methods to rank customers according to certain criteria. > >They use both SAS EM and Salford Systems' CART. I have used package R >part in the past, but I could not provide any kind of feature comparison >or the like as I have no access to any installation of the first two >proprietary products. > >Has anybody experience with them? Is there any public benchmark >available? Is there any very good --although solely technical-- reason >to pay hefty software licences? How would the algorithms implemented in >rpart compare to those in SAS and/or CART? > >Best regards, >Hi I've used CART and a few different R packages - tree, rpart, rparty. I can't comment on the algorithms - I'm not qualified to judge, and I think the ones in CART are proprietary. One big difference is that the output from CART is beautiful with minimal fuss. Presentation quality, multicolor, multipage tree diagrams with the default settings. Another was speed - I am not sure I was doing everything right in R, but for one problem I had that had about 500 variables, R was quite slow, and CART blitzed through it. Another big difference is the price. I got CART for a reasonable fee, as I was working at a university, but the commercial price is very high (well into the thousands of dollars, if I recall correctly). Peter Peter L. Flom, PhD Statistical Consultant www DOT peterflomconsulting DOT com
In terms of richness of features and the ability to handle large data sets (which is normal in banking), SAS EM should be on top of the others. However, it is not cheap. In terms of algorithms, the split procedure in SAS EM can do CHAID, CART and C4.5, if I remember correctly.

On Fri, Jun 19, 2009 at 2:35 PM, Carlos J. Gil Bellosta <cgb at datanalytics.com> wrote:
> [original post quoted above]

--
WenSui Liu
Blog: statcompute.spaces.live.com
Tough Times Never Last. But Tough People Do. - Robert Schuller
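As a small point of reference on the R side: rpart does not implement CHAID or C4.5, but for classification trees it does let you choose between the default Gini index and information-gain (entropy) splitting via the parms argument. A minimal sketch, using the kyphosis data that ships with rpart:

library(rpart)

## Default splitting criterion: Gini index
fit_gini <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
                  method = "class", parms = list(split = "gini"))

## Information-gain (entropy) splitting
fit_info <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
                  method = "class", parms = list(split = "information"))

## Compare the resulting trees
print(fit_gini)
print(fit_info)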
jude.ryan at ubs.com
2009-Jun-22 22:56 UTC
[R] Recursive partitioning algorithms in R vs. alia
I have used all three packages for decision trees (SAS/EM, CART, and R). As another user on the list commented, the algorithms CART uses are proprietary. I also understand that, because the algorithms are proprietary, the decision tree you get from SAS is based on a "slightly different" algorithm so as not to violate intellectual property rights.

When I first started using R (rpart) I benchmarked it, in terms of the results obtained for my particular problem at the time, against Salford Systems CART. R gave me an identical tree, with the splitting value differing only in the 2nd or 3rd decimal place, from what I recall. I did not have SAS/EM at that particular company and so could not benchmark it. Salford Systems CART does have additional splitting criteria such as "twoing", but, again, these may be of value only in certain types of problems. The splitting criteria found in R are good enough.

I do have SAS/EM right now but prefer R to SAS/EM, since R can be programmed and SAS/EM cannot. This may not be relevant for decision trees, but for neural networks, for example, if I want to build hundreds of neural networks (since there are no variable selection methods for neural networks) with different predictors and different numbers of neurons, I can do this easily in R but cannot do it in SAS/EM. SAS/EM does have a variable selection node, but it is independent of the neural network node, so, from what I understand, you have to select the variables first and then pass them to the neural network node.

In general, you get "prettier" output with CART and SAS/EM for trees. However, there are packages in R that can give you prettier output than rpart does. One GUI that you may want to explore, which works with R, is Rattle. It builds trees, neural networks, boosted models, and more, and you can see the generated R code as well.

In terms of handling large volumes of data, SAS/EM is probably the best. However, if you have a 64-bit operating system with lots of RAM, and use random sampling, R should suffice. It is debatable whether extra features like pretty output and variable importance are worth the huge cost of those products, unless you really need them. With R you can do what you want, and that is build a good tree. From what I have read, variable importance measures can be biased, as they are affected by factors such as multicollinearity and variables with many categories, so their usefulness is questionable (however, end users may love them).

SAS/EM is by far the most expensive product, and Salford Systems CART is pretty expensive as well. So, depending on your needs, R may be good enough or the best, because you can program it, and the latest methodologies will always be implemented in R first. For comparisons of the programming capabilities of SAS (macros) versus R, you may want to look at what Frank Harrell and Terry Therneau (who wrote rpart) have to say. Both are experts in SAS and R.

Hope this helps.

Jude

Carlos wrote:
> [original post quoted above]
___________________________________________
Jude Ryan
Director, Client Analytical Services
Strategy & Business Development
UBS Financial Services Inc.
1200 Harbor Boulevard, 4th Floor
Weehawken, NJ 07086-6791
Tel. 201-352-1935
Fax 201-272-2914
Email: jude.ryan at ubs.com
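To make the rpart workflow described above concrete, here is a minimal sketch of fitting a classification tree, inspecting the cross-validated complexity table, and pulling out the variable importance scores (which, per the caveats above, should be read with care). It again uses rpart's bundled kyphosis data as a stand-in for real customer data, and assumes a reasonably current version of rpart, whose fitted objects carry a variable.importance component:

library(rpart)

## Fit a classification tree
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
             method = "class")

## Cross-validated complexity table, used to decide how far to prune
printcp(fit)

## Variable importance scores (a named numeric vector on the fitted object)
fit$variable.importance

## Base-graphics tree plot; packages such as rpart.plot, or the Rattle GUI
## mentioned above, produce prettier output than these defaults
plot(fit, uniform = TRUE)
text(fit, use.n = TRUE)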
A point of history: both the commercial CART program and the rpart() function are based on the book Classification and Regression Trees (Breiman, Friedman, Olshen, and Stone, 1984). As a reader/commentator on one of the early drafts, I got to know the material well.

CART started as a large Fortran program written by Jerry Friedman, which was the testing ground for the ideas in the book. I had the code at one time and made some modifications to it, but found it too frustrating to go very far with. Fortran is just too clumsy for a recursive task, and Jerry's ability to hold umpteen variables in his head at once is greater than mine -- the Fortran was a large monolithic block. Salford Systems acquired rights to that code; I don't know whether any of the original lines remain in their product. I had lots of conversations with their main programmer (15-20 years ago now) about methods for speeding it up, mainly an interesting problem in optimal indexing.

When rpart was first written, its output agreed with CART almost entirely. The only major difference was in surrogates: rpart picks the surrogate with the largest number of agreements, while CART picked the one with the greatest percentage agreement. This means that rpart favors variables with fewer missing values. Since that point in time both codes have evolved. I haven't had time to do important work on rpart in over a decade. It's not surprising that the graphics and display are behind the curve; what's more surprising is that it still endures.

Rpart is called "rpart" because the authors had trademarked the term "CART" for their program. It was the best alternative name that I could come up with at the time. I find it amusing that one consequence of that choice is that I now see "recursive partitioning" far more often than "CART" as the generic label for tree-based methods.

Terry T
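For readers who want to experiment with the surrogate behavior Terry describes, current versions of rpart expose it through the surrogatestyle argument of rpart.control(): 0 (the default) ranks surrogates by the total number of correct agreements, while 1 ranks them by percent agreement over non-missing values, which is closer to CART's rule. A minimal sketch; the artificially introduced missing values below are purely illustrative:

library(rpart)

## Copy the bundled kyphosis data and introduce some missing values in one
## predictor so that surrogate splits actually come into play
dat <- kyphosis
set.seed(2)
dat$Start[sample(nrow(dat), 15)] <- NA

## Default: surrogates ranked by total number of agreements
## (the rule that favors variables with fewer missing values)
fit_count <- rpart(Kyphosis ~ Age + Number + Start, data = dat,
                   method = "class",
                   control = rpart.control(surrogatestyle = 0))

## CART-style: surrogates ranked by percent agreement over non-missing values
fit_pct <- rpart(Kyphosis ~ Age + Number + Start, data = dat,
                 method = "class",
                 control = rpart.control(surrogatestyle = 1))

## summary() lists the competing and surrogate splits chosen at each node
summary(fit_count)
summary(fit_pct)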