thr3ads.net - R help - [R] How to define proper breaks in RFM analysis [Oct 2017]

David Winsemius

2017-Oct-13 17:27 UTC

[R] How to define proper breaks in RFM analysis

> On Oct 13, 2017, at 2:51 AM, PIKAL Petr <petr.pikal at precheza.cz> wrote:
> 
> Hi
> 
> You expect us to solve your problem but you ignore advice already received.
> 
> Your data are unreadable; use dput(yourdata) instead. See ?dput
> 
>> test<-read.table("clipboard", heade=T)
> Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec,  :
>  line 115 did not have 6 elements

I didn't have such a problem (illustrated with a more minimal example):

dat <-  scan( what=list("",1,"",1L,1L,1), 
             text="194849 6.99 8/22/2017 9 5 9.996
194978 14.78 8/28/2017 3 15 16.308
198614 18.44 7/31/2017 31 1 18.44
234569 34.99 8/20/2017 11 8 13.5075
252686 7.99 7/31/2017 31 2 7.99
291719 21.26 8/25/2017 6 2 15.67
291787 46.1 8/31/2017 0 2 32.57
292630 24.34 7/31/2017 31 1 24.34
295204 21.86 7/18/2017 44 1 21.86
295989 8.98 8/20/2017 11 2 14.095
298883 14.38 8/24/2017 7 2 11.185
308824 10.77 7/31/2017 31 1 10.77")

names(dat) <- c("user_id", "subtotal_amount", "created_at",
                "Recency", "Frequency", "Monetary")
dat <- data.frame(dat, stringsAsFactors = FALSE)

I suspect read.table would also have worked for me, but I was expecting
difficulties based on Petr's posting.


# And ended up with this result (on the original copied data):
> str(dat)
'data.frame':	500 obs. of  6 variables:
 $ user_id        : chr  "194849" "194978" "198614" "234569" ...
 $ subtotal_amount: num  6.99 14.78 18.44 34.99 7.99 ...
 $ created_at     : chr  "8/22/2017" "8/28/2017" "7/31/2017" "8/20/2017" ...
 $ Recency        : int  9 3 31 11 31 6 0 31 44 11 ...
 $ Frequency      : int  5 15 1 8 2 2 2 1 1 2 ...
 $ Monetary       : num  10 16.31 18.44 13.51 7.99 ...
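
For reference, here is a minimal sketch of the read.table() route and of the dput() call Petr asked for, using the first four rows shown above. The names txt, dat2 and n_fields are placeholders chosen for this illustration, and the column classes are assumed to match the str() output.

```r
# First four rows of the sample, as whitespace-separated text.
txt <- "194849 6.99 8/22/2017 9 5 9.996
194978 14.78 8/28/2017 3 15 16.308
198614 18.44 7/31/2017 31 1 18.44
234569 34.99 8/20/2017 11 8 13.5075"

# read.table() with explicit names and classes (classes are an assumption).
dat2 <- read.table(
  text = txt, header = FALSE, stringsAsFactors = FALSE,
  col.names  = c("user_id", "subtotal_amount", "created_at",
                 "Recency", "Frequency", "Monetary"),
  colClasses = c("character", "numeric", "character",
                 "integer", "integer", "numeric")
)

# count.fields() reports the number of fields per line, which helps to
# locate a row behind an error like "line 115 did not have 6 elements".
con <- textConnection(txt)
n_fields <- count.fields(con)
close(con)
which(n_fields != 6)

# dput() prints R code that recreates the object exactly; its output
# can be pasted into a plain-text email.
dput(head(dat2, 3))
```

With data copied from a spreadsheet, a short or empty row is a likely cause of the "did not have 6 elements" error, and which(n_fields != 6) would point to it.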

...  but the following criticism seems, well, _critical_ (as in essential for
one to address if a reasonable proposal is to be offered).

> What is "ideal interval"? Can you define it? Should it be such as to provide
> an equal number of observations?

That is the crucial question for you to answer, Hemant. Read the ?quantile help
page if your answer is "yes" or even "maybe".

> Or maybe you could normalise your values and use quartile method.

Well, maybe not so much on that last one, Petr. Normalization should not affect
the classification based on quartiles. It doesn't change the ordering of
the values.
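
The point about normalization can be checked directly. The sketch below is an illustration, not code from the thread: it cuts the twelve Recency values from the scan() example into three quantile-based groups, before and after z-scoring. tertile() is a hypothetical helper written for this example, not a base R function.

```r
# Recency values from the twelve sample rows.
x <- c(9, 3, 31, 11, 31, 6, 0, 31, 44, 11, 7, 31)

# Hypothetical helper: assign each value to one of three groups using
# the 1/3 and 2/3 sample quantiles as break points.
tertile <- function(v) {
  br <- quantile(v, probs = c(0, 1/3, 2/3, 1))
  cut(v, breaks = br, include.lowest = TRUE, labels = FALSE)
}

z <- as.numeric(scale(x))   # centred and scaled copy of x

# Same group codes either way, because scaling preserves the ordering.
identical(tertile(x), tertile(z))

# Group sizes: ties in the data, not non-normality, make them unequal.
table(tertile(x))
```

Both calls return the same group codes. The groups are not equal-sized here (4, 7 and 1 observations) because the value 31 is heavily tied; quantile breaks themselves make no normality assumption.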

-- 
David.
> 
> Cheers
> Petr
> 
> From: Hemant Sain [mailto:hemantsain55 at gmail.com]
> Sent: Friday, October 13, 2017 8:51 AM
> To: PIKAL Petr <petr.pikal at precheza.cz>
> Cc: r-help mailing list <r-help at r-project.org>
> Subject: Re: [R] How to define proper breaks in RFM analysis
> 
> Hey,
> I want to define 3 ideal breaks (bins) for each variable; one of those
> variables is attached in the previous email.
> I don't want to consider the quartile method because quartiles are not
> working ideally for that data set, because the data distribution is non-normal.
> So I want you to suggest another method so that I can define 3 breaks with
> the ideal interval for Recency, Frequency and Monetary to calculate the RFM score.
> I'm again attaching some of the data set.
> Please look into it and help me with the R code.
> Thanks
> 
> 
> 
> Data
> 
> user_id  subtotal_amount  created_at  Recency  Frequency  Monetary
> 194849   6.99             8/22/2017

snipped
> 
> 
> On 13 October 2017 at 10:35, PIKAL Petr <petr.pikal at precheza.cz> wrote:
> Hi
> 
> Your statement about attaching data is problematic. We cannot do much with
> it. Instead use output from dput(yourdata) to show us what exactly your data
> look like.
> 
> We also do not know how you want to split your data. It would be nice if
> you could also show what the bins should be with the respective data. Unless you
> provide this information you probably will not get any sensible answer.
> 
> Cheers
> Petr
> 
> 
>> -----Original Message-----
>> From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of Hemant Sain
>> Sent: Thursday, October 12, 2017 10:18 AM
>> To: r-help mailing list <r-help at r-project.org>
>> Subject: [R] How to define proper breaks in RFM analysis
>> 
>> Hello,
>> I'm working on RFM analysis and I wanted to define my own breaks, but my
>> frequency distribution is not normally distributed, so when I'm using
>> quartiles it's not giving the optimal results.
>> So I'm looking for a better approach where I can define breaks dynamically,
>> because after visualization I can do it easily, but I want to apply this
>> model so that it can automatically define the breaks according to the data set.
>> I'm attaching sample data for reference.
>> 
>> Thanks
>> 
>>                           *Freq*
>> 5
>> 15
>> 1

snipped
> 
> 	[[alternative HTML version deleted]]
> 
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
David Winsemius
Alameda, CA, USA

'Any technology distinguishable from magic is insufficiently advanced.' 
-Gehm's Corollary to Clarke's Third Law

Jim Lemon

2017-Oct-13 22:54 UTC

[R] How to define proper breaks in RFM analysis

Hemant's problem is that the indicators are not distributed uniformly.
With a uniform distribution, categorization gives a reasonably optimal
separation of cases. One approach would be to drop categorization and
calculate the overall score as the mean of the standardized indicator
scores. Whether this is an option I do not know. I did offer an
"eyeball" set of breaks in a previous email, but apparently this was
not sufficient.

Jim
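
A minimal sketch of the alternative Jim describes, the mean of the standardized indicator scores, using the twelve sample rows from earlier in the thread. Two things here are assumptions made for illustration and not stated in the thread: the data frame name rfm, and the sign flip on Recency so that a more recent purchase raises the score.

```r
# Recency, Frequency and Monetary from the twelve sample rows.
rfm <- data.frame(
  Recency   = c(9, 3, 31, 11, 31, 6, 0, 31, 44, 11, 7, 31),
  Frequency = c(5, 15, 1, 8, 2, 2, 2, 1, 1, 2, 2, 1),
  Monetary  = c(9.996, 16.308, 18.44, 13.5075, 7.99, 15.67,
                32.57, 24.34, 21.86, 14.095, 11.185, 10.77)
)

# Standardize each indicator to mean 0 and standard deviation 1.
z <- scale(rfm)

# Assumption: fewer days since the last order is better, so flip the sign.
z[, "Recency"] <- -z[, "Recency"]

# Overall score: the mean of the three standardized indicators per customer.
rfm$score <- rowMeans(z)

# Customers ordered from highest to lowest score.
rfm[order(-rfm$score), ]
```

This gives a continuous ranking with no break points to choose. If three segments are still needed, the breaks would have to be defined on this single score instead of on each indicator.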


Hemant Sain

2017-Oct-23 05:02 UTC

[R] How to define proper breaks in RFM analysis

Hello,
I'm confused about what you are talking about.
I just want to set ideal threshold values for my RFM scores. That could be
done using quantiles, but I don't want to use quantiles because my data are
not normally distributed, so they would lead to the wrong break ranges. To fix
this problem I'm looking for an approach that can define the ideal break
ranges to categorize RFM scores into 3 segments.
That's all I want.
Thanks
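
One way to derive three breaks from the data themselves is one-dimensional k-means clustering. This technique was not proposed in the thread and is offered only as a sketch; whether its breaks count as "ideal" still depends on the definition Petr and David asked for. The Frequency values are the twelve sample rows, and all object names are placeholders.

```r
set.seed(1)  # k-means uses random starts; fix the seed for repeatability

# Frequency values from the twelve sample rows.
freq <- c(5, 15, 1, 8, 2, 2, 2, 1, 1, 2, 2, 1)

# Three clusters on a single variable; several restarts for stability.
km <- kmeans(freq, centers = 3, nstart = 25)

# Cluster labels are arbitrary, so relabel them 1..3 by increasing centre.
centre_rank <- rank(km$centers[, 1])
segment <- as.integer(centre_rank[km$cluster])

# Midpoints between neighbouring centres serve as reusable cut points,
# for example to score new customers with cut().
cen <- sort(km$centers[, 1])
breaks <- c(-Inf, (head(cen, -1) + tail(cen, -1)) / 2, Inf)
segment_from_breaks <- cut(freq, breaks = breaks, labels = FALSE)

data.frame(freq, segment, segment_from_breaks)
```

Unlike quantile breaks, this does not aim for equal group sizes: it groups values that lie close together, so a long right tail tends to end up in a small top segment.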



-- 
hemantsain.com

	[[alternative HTML version deleted]]
