Jim Lemon
2017-Oct-09 20:57 UTC
[R] Help RFM analysis in R (i want a code where i can define my own breaks instead of system defined breaks used in auto_RFM package)
I seriously doubt that you are running the code I sent. What you have probably done is to run your data, which has a different date format, without changing the breaks or the date format arguments. As you haven't provided any example that shows what you are doing, I can't guess what the problem is.

Jim

On Mon, Oct 9, 2017 at 9:40 PM, Hemant Sain <hemantsain55 at gmail.com> wrote:
> I'm getting all the rows as NA in Cscore and almost most of the observation
> in R and F and M are also NA.
> what can be the reason for this. also suggest me the appropriate solution.
>
> On 9 October 2017 at 15:51, Jim Lemon <drjimlemon at gmail.com> wrote:
>>
>> Hi Hemant,
>> Here is an example that might answer your questions. Please don't run
>> the previous code as it might not work.
>>
>> I define the break values as arguments to the function
>> (rbreaks,fbreaks,mbreaks). If you want the breaks to work, make sure that
>> they cover the range of the input values, otherwise you get NAs.
>>
>> # expects a three (or more) column data frame where
>> # column 1 is customer ID, column 2 is amount of purchase
>> # and column 3 is date of purchase
>> qdrfm<-function(x,rbreaks=3,fbreaks=3,mbreaks=3,date.format="%Y-%m-%d",
>>  weights=c(1,1,1),finish=NA) {
>>
>>  # if no finish date is specified, use current date
>>  if(is.na(finish)) finish<-Sys.Date()
>>  # raw recency in days; rbreaks are applied to these days below
>>  x$rscore<-as.numeric(finish-as.Date(x[,3],date.format))
>>  # sorted so the IDs line up with by() and table(), which order by ID
>>  custIDs<-sort(unique(x[,1]))
>>  ncust<-length(custIDs)
>>  rfmout<-data.frame(custID=custIDs,rscore=rep(0,ncust),
>>   fscore=rep(0,ncust),mscore=rep(0,ncust))
>>  rfmout$rscore<-cut(by(x$rscore,x[,1],min),breaks=rbreaks,labels=FALSE)
>>  rfmout$fscore<-cut(table(x[,1]),breaks=fbreaks,labels=FALSE)
>>  rfmout$mscore<-cut(by(x[,2],x[,1],sum),breaks=mbreaks,labels=FALSE)
>>  rfmout$cscore<-(weights[1]*rfmout$rscore+
>>   weights[2]*rfmout$fscore+
>>   weights[3]*rfmout$mscore)/sum(weights)
>>  return(rfmout[order(rfmout$cscore),])
>> }
>>
>> set.seed(12345)
>> x2<-data.frame(ID=sample(1:50,250,TRUE),
>>  purchase=round(runif(250,5,100),2),
>>  date=paste(rep(2016,250),sample(1:12,250,TRUE),
>>  sample(1:28,250,TRUE),sep="-"))
>>
>> # example 1
>> qdrfm(x2)
>>
>> # example 2
>> qdrfm(x2,rbreaks=c(0,200,400),fbreaks=c(0,5,10),mbreaks=c(0,350,700),
>>  finish=as.Date("2017-01-01"))
>>
>> Jim
>
> --
> hemantsain.com
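Jim's point that the breaks must cover the range of the values can be checked directly with cut(). The sketch below uses made-up recency values (hypothetical, for illustration only). It shows that with cut()'s default right-closed intervals, a value equal to the lowest break becomes NA just like a value above the highest break, and that include.lowest=TRUE together with a high enough top break avoids both.

```r
# Made-up days since last purchase for four hypothetical customers
days <- c(0, 45, 120, 410)

# Breaks that do not cover the data: cut() builds the right-closed
# intervals (0,200] and (200,400], so 0 and 410 fall outside them
bad <- cut(days, breaks = c(0, 200, 400), labels = FALSE)
print(bad)

# Top break above the maximum, and the lowest break included in the
# first interval, giving [0,200] and (200,500]
good <- cut(days, breaks = c(0, 200, 500), include.lowest = TRUE,
            labels = FALSE)
print(good)
```

With labels=FALSE, cut() returns integer interval codes, which is what qdrfm() adds together to form its combined score.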
Hemant Sain
2017-Oct-10 05:19 UTC
Hello Jim,

I have converted all my variables' data types to match your example, including the date, and my dataset looks like this:

  ID    purchase  date
  1234  10.2      2017-02-18
  3453  18.9      2017-03-22
  7689  8         2017-03-24

But when I pass the data into the function, it gives me the same values for every observation, i.e. r=2, f=2, m=2.

Also, which part of your code calculates the recency and frequency scores? I mean, how does it determine how many times a user made a purchase in the last 30 days, so that we can put that user into our own defined category?

One more thing: it would be great if you could explain the finish date a little, because I'm not able to understand what you mean by it.

Thanks
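Hemant's question about the finish date can be illustrated with his own three sample rows. In qdrfm() recency is counted in days back from the finish date, so the finish date is the "as of" date of the analysis. A minimal sketch: the helper recency_days() is hypothetical and simply repeats the date arithmetic in the first lines of qdrfm(); the two finish dates are arbitrary choices for illustration.

```r
# Hemant's three example rows (values copied from his message)
x <- data.frame(ID = c(1234, 3453, 7689),
                purchase = c(10.2, 18.9, 8),
                date = c("2017-02-18", "2017-03-22", "2017-03-24"),
                stringsAsFactors = FALSE)

# Hypothetical helper: days between each purchase and the finish date,
# the same arithmetic qdrfm() uses for its raw recency values
recency_days <- function(dates, finish, date.format = "%Y-%m-%d") {
  as.numeric(finish - as.Date(dates, date.format))
}

# Recency as of the end of March 2017 (an arbitrary finish date)
r_march <- recency_days(x$date, as.Date("2017-03-31"))
print(r_march)

# A later finish date shifts every customer's recency by the same
# number of days, so breaks chosen for one finish date may not
# cover the recency values for another
r_june <- recency_days(x$date, as.Date("2017-06-30"))
print(r_june - r_march)
```

This is also why breaks picked for one finish date can suddenly produce NAs when the default finish date (today) moves on.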
Jim Lemon
2017-Oct-10 22:00 UTC
Hi Hemant,
see inline below.

On Tue, Oct 10, 2017 at 4:19 PM, Hemant Sain <hemantsain55 at gmail.com> wrote:
> Hello Jim,
> i have converted all my variable data type according to your attached
> example including date, and my dataset looks like this.
>
>
> ID purchase date
> 1234 10.2 2017-02-18
> 3453 18.9 2017-03-22
> 7689 8 2017-03-24
>

As I don't have your data set, I can't tell you why you are getting the same values. First, I'll create a data set that looks something like your example, except with 50 customers and 250 transactions, all in 2017:

set.seed(12345)
x3<-data.frame(ID=sample(1234:1283,250,TRUE),
 purchase=round(runif(250,5,100),2),
 date=paste(rep(2017,250),sample(1:9,250,TRUE),
 sample(1:28,250,TRUE),sep="-"))

Look at it carefully. Is there anything that you think is wrong?

>
>
> but when I'm passing the data into the function it is giving me same values
> for entire observations i. r=2, f=2, m=2
>
> and which part of your code is responsible to calculate recency and
> frequency score i mean how it will determine how many times a user made a
> purchase in last 30 days so that we can put that user into our own defined
> category.
>

Here is the function commented for easier understanding. Recency is calculated as the number of days between each customer's most recent purchase and the "finish" date, which defaults to the current date. If you are examining historical data, you may want to set a different "finish" date. Frequency is simply the number of purchases recorded for each customer over the whole data set. Monetary is the sum of the purchase amounts for each customer.

By default, the "cut" function splits each score into three intervals of equal width (breaks=3). If you want specific breaks, they _must_ cover the range of the values or cut will generate NAs. Because cut's intervals are closed on the right, a value equal to the lowest break (e.g. 0) also falls outside the range and becomes NA. I have added a printout of the ranges of the raw recency, frequency and monetary scores so that you can enter your own breaks.
qdrfm<-function(x,rbreaks=3,fbreaks=3,mbreaks=3,
 date.format="%Y-%m-%d",weights=c(1,1,1),finish=NA) {
 # if no finish date is specified, use current date
 if(is.na(finish)) finish<-Sys.Date()
 # raw recency: days from each purchase to the finish date
 x$rscore<-as.numeric(finish-as.Date(x[,3],date.format))
 cat("Range of purchase recency",range(x$rscore),"\n")
 cat("Range of purchase frequency",range(table(x[,1])),"\n")
 cat("Range of purchase amount",range(by(x[,2],x[,1],sum)),"\n")
 # sort the IDs so that they line up with the output of
 # by() and table(), which are ordered by customer ID
 custIDs<-sort(unique(x[,1]))
 ncust<-length(custIDs)
 # initialize a data frame to hold the output
 rfmout<-data.frame(custID=custIDs,rscore=rep(0,ncust),
  fscore=rep(0,ncust),mscore=rep(0,ncust))
 # categorize the minimum number of days
 # since last purchase for each customer
 # (rbreaks are applied to the raw days)
 rfmout$rscore<-cut(by(x$rscore,x[,1],min),breaks=rbreaks,labels=FALSE)
 # categorize the number of purchases
 # recorded for each customer
 rfmout$fscore<-cut(table(x[,1]),breaks=fbreaks,labels=FALSE)
 # categorize the amount purchased
 # by each customer
 rfmout$mscore<-cut(by(x[,2],x[,1],sum),breaks=mbreaks,labels=FALSE)
 # calculate the RFM score from the
 # optionally weighted average of the above
 rfmout$cscore<-round((weights[1]*rfmout$rscore+
  weights[2]*rfmout$fscore+
  weights[3]*rfmout$mscore)/sum(weights),2)
 return(rfmout[order(rfmout$cscore),])
}

# run the dataset with default breaks
qdrfm(x3)
# now specify breaks with respect to the printout of the raw scores;
# the finish date is fixed so that the recency breaks still cover
# the data when the example is run after October 2017
qdrfm(x3,rbreaks=c(0,150,300),fbreaks=c(0,5,11),mbreaks=c(0,300,600),
 finish=as.Date("2017-10-10"))
# now give the total amount purchased twice the weight
qdrfm(x3,rbreaks=c(0,150,300),fbreaks=c(0,5,11),
 mbreaks=c(0,300,600),weights=c(1,1,2),finish=as.Date("2017-10-10"))

I hope that this will explain the function better.

Jim
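Hemant also asked how to count purchases in the last 30 days. As Jim explains, qdrfm() counts all purchases in the data, so one option is to count only the purchases in a window ending on the finish date and then cut those counts with your own breaks. A minimal sketch, assuming a 30-day window that includes the finish date; purchases_in_window(), the small data set and the category breaks are all hypothetical, not part of qdrfm() or any package.

```r
# Hypothetical helper: count each customer's purchases in the `days`
# days up to and including the finish date
purchases_in_window <- function(x, finish, days = 30,
                                date.format = "%Y-%m-%d") {
  d <- as.Date(x[, 3], date.format)
  in_window <- d > finish - days & d <= finish
  # use every customer ID as a factor level so that customers with
  # no purchases in the window still appear, with a count of 0
  ids <- factor(x[, 1], levels = sort(unique(x[, 1])))
  table(ids[in_window])
}

# Small made-up data set: customer ID, amount, date of purchase
x <- data.frame(ID = c(1, 1, 1, 2, 2, 3),
                purchase = c(10, 20, 15, 5, 8, 30),
                date = c("2017-09-15", "2017-09-30", "2017-10-05",
                         "2017-07-01", "2017-10-09", "2017-05-20"),
                stringsAsFactors = FALSE)

counts <- purchases_in_window(x, finish = as.Date("2017-10-10"))
print(counts)

# Own categories: 0 purchases, 1-2 purchases, 3 or more purchases
fscore30 <- cut(as.vector(counts), breaks = c(-1, 0, 2, Inf),
                labels = FALSE)
print(fscore30)
```

Because the factor levels are the sorted customer IDs, the counts come out in the same customer order as the table() and by() results inside qdrfm().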