Hi R users!

I'm new to R, so I'm starting with a basic exercise in rpart.

I'm predicting whether a user will churn based on past order history. I've calculated the probabilities in Excel: if a user is a single-order customer (single_order = 1), their probability of churn is 90%; if they have multiple orders (single_order = 0), the probability of churning is 70%. In the R model, the probabilities look like 100% and 53%. In Excel I used the count of shopper_key to calculate the probabilities, so I'm wondering whether R needs shopper_key in order to count?

It would be helpful if someone could suggest where I'm going wrong.

Thank you!

Code -
m1 <- rpart( churn ~ single_order , data = data2, method="anova" )

Output-
n= 22041

node), split, n, deviance, yval
      * denotes terminal node

1) root 22041 3229.265 0.8216959
  2) single_order< 0.5 8407 2092.852 0.5325324 *
  3) single_order>=0.5 13634 0.000 1.0000000 *

shopper_key  churn  single_order
          1      1             0
          2      1             1
          3      0             0
          4      1             0
          5      1             1
          6      1             1
          7      1             0
          8      1             1
          9      0             1
         10      1             1
1. Forget Excel. Erase it from your memory. Banish its paradigms from your practices. Failing to do so will only bring misery as you explore R. R is a rational programming language primarily for data analysis, statistics, and graphics. Excel is, ummm, not.

2. Have you read the rpart documents and vignettes? That should be your first port of call for questions about how it works.

Cheers,
Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )

On Tue, May 23, 2017 at 6:45 PM, kristen wissmar <wissmar.kristen at gmail.com> wrote:
> Hi R users!
>
> I'm new to R, so I'm starting with a basic exercise in rpart.
> [...]
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
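On the rpart call in the original post: with method="anova" and a 0/1 response, the yval printed for each terminal node is the mean of churn among the rows in that node, i.e. the observed churn proportion. The following is only a sketch on the ten sample rows from the post, not on the full data2; the name sample_rows and the rpart.control settings (minsplit, minbucket, cp, xval) are assumptions chosen so that so few rows can be split at all.

```r
library(rpart)  # recommended package, installed with standard R distributions

# The ten sample rows shown in the original post (NOT the full data2).
sample_rows <- data.frame(
  shopper_key  = 1:10,
  churn        = c(1, 1, 0, 1, 1, 1, 1, 1, 0, 1),
  single_order = c(0, 1, 0, 0, 1, 1, 0, 1, 1, 1)
)

# Same formula and method as in the post; the control settings are assumed
# here only because the defaults would refuse to split ten rows.
fit <- rpart(churn ~ single_order, data = sample_rows, method = "anova",
             control = rpart.control(minsplit = 2, minbucket = 1,
                                     cp = 0, xval = 0))

# Fitted value for each row: the yval of the leaf the row falls into.
leaf_means <- predict(fit)

# Plain per-group mean of churn, computed without any model.
group_means <- ave(sample_rows$churn, sample_rows$single_order)

print(fit)
print(all.equal(unname(leaf_means), group_means))
```

If the leaf values and the plain group means agree, rpart is doing the counting itself and shopper_key plays no part in it. A disagreement with the Excel figures would then point at the data or at the Excel calculation rather than at the model.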
On 24/05/17 14:38, Bert Gunter wrote:
> Forget Excel. Erase it from your memory. Banish its paradigms from
> your practices. Failing to do so will only bring misery as you explore
> R. R is a rational programming language primarily for data analysis,
> statistics, and graphics. Excel is, ummm, not.

Gotta be a fortune!!!

cheers,

Rolf

-- 
Technical Editor ANZJS
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276
> On 24 May 2017, at 04:38 , Bert Gunter <bgunter.4567 at gmail.com> wrote:
>
> 1. Forget Excel. Erase it from your memory. Banish its paradigms from
> your practices. Failing to do so will only bring misery as you explore
> R. R is a rational programming language primarily for data analysis,
> statistics, and graphics. Excel is, ummm, not.

And, never mind Bert's rant, a simple

table(single_order, churn)

would give info similar to what you claim to have from Excel, minus the risk of finding that the data are not the same, or that Excel was doing something bizarre.

-pd

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com
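The table() suggestion above can be sketched on the ten sample rows from the original post. The data frame name sample_rows is an assumption, and the full data2 (22041 rows) is not available here, so the proportions below describe only those ten rows.

```r
# The ten sample rows shown in the original post (NOT the full data2).
sample_rows <- data.frame(
  shopper_key  = 1:10,
  churn        = c(1, 1, 0, 1, 1, 1, 1, 1, 0, 1),
  single_order = c(0, 1, 0, 0, 1, 1, 0, 1, 1, 1)
)

# Cross-tabulate: rows are single_order (0/1), columns are churn (0/1).
counts <- with(sample_rows, table(single_order, churn))
print(counts)

# Row proportions: within each single_order group, the share that churned.
row_props  <- prop.table(counts, margin = 1)
churn_rate <- row_props[, "1"]
print(churn_rate)
```

Running the same two calls on data2 itself would give the per-group churn rates that the rpart yval column reports, which makes it easy to see whether R and Excel were fed the same data.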