Dhiman Biswas
2013-Mar-09 12:37 UTC
[R] grouping followed by finding frequent patterns in R
I have data in the following form:

CIN       TRN_TYP
9079954   1
9079954   2
9079954   3
9079954   4
9079954   5
9079954   4
9079954   5
9079954   6
9079954   7
9079954   8
9079954   9
9079954   9
.         .
.         .
.         .

There are 100 distinct CIN values (9079954, 12441087, 15246633, ...), each with its respective TRN_TYP values.

First of all, I want this data to be grouped into basket format:

9079954    1, 2, 3, 4, 5, ....
12441087   19, 14, 21, 3, 7, ...
.
.
.

and then apply eclat from the arules package to find frequent patterns.

1) I ran the following code:

    file <- read.csv("D:/R/Practice/Data_Input_NUM.csv")
    file <- file[!duplicated(file), ]
    eclat(split(file$TRN_TYP, file$CIN))

but it gave me the following error:

    Error in asMethod(object) : can not coerce list with transactions with duplicated items

2) I ran this code:

    file <- read.csv("D:/R/Practice/Data_Input_NUM.csv")
    # my file Data_Input_NUM has many other columns as well,
    # so I am selecting only CIN and TRN_TYP
    file_new <- file[, c(3, 6)]
    file_new <- file_new[!duplicated(file_new), ]
    eclat(split(file_new$TRN_TYP, file_new$CIN))

but again:

    Error in eclat(split(file_new$TRN_TYP, file_new$CIN)) :
      internal error in trio library

PLEASE HELP
I **suggest** that you explain what you wish to accomplish using a reproducible example rather than telling us what packages you think you should use. I believe you are making things too complicated; e.g. what do you mean by "frequent patterns"? Moreover, "basket format" is rather unclear -- and may well be unnecessary. But using lists, it could be simply accomplished by

    ?split  ## as in
    the_list <- with(yourdata, split(TRN_TYP, CIN))

or possibly

    the_list <- with(yourdata, tapply(TRN_TYP, CIN, FUN = table))

Of course, these may be irrelevant and useless, but without knowing your purpose ...?

-- Bert

On Sat, Mar 9, 2013 at 4:37 AM, Dhiman Biswas <crazydhimu at gmail.com> wrote:
> I have a data in the following form :
> [...]

--
Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
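As a small illustration of the split() suggestion, here is a minimal base-R sketch using the column names from the original post (CIN, TRN_TYP). The data frame `toy` and its values are made up for the example; the real file is not available in this thread. It shows one way to get one basket per CIN with repeated TRN_TYP values removed inside each basket.

```r
# Toy data (hypothetical values) with the two columns named in the original post
toy <- data.frame(
  CIN     = c(9079954L, 9079954L, 9079954L, 9079954L,
              12441087L, 12441087L, 12441087L),
  TRN_TYP = c(1L, 2L, 4L, 4L, 19L, 14L, 19L)
)

# split() groups TRN_TYP by CIN; unique() then drops repeated
# TRN_TYP values within each group, keeping first-seen order
baskets <- lapply(split(toy$TRN_TYP, toy$CIN), unique)

# Show the resulting list, one element per CIN
str(baskets)
```

Applying unique() per basket, rather than duplicated() on a data frame that still carries other columns, guarantees that no basket contains the same item twice.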
1. Please cc the list, as I have here, unless your comments are off topic.

2. Use dput() (?dput) to include **small** amounts of data in your message, as attachments are generally stripped from r-help.

3. I have no experience with itemsets or the arules package, but a quick glance at the docs there said that your data argument must be in a specific form coercible into an S4 "transactions" class. I suspect that neither your initial data frame nor the list deriving from split is, but maybe someone familiar with the package can tell you for sure. That's why you need to cc the list.

-- Bert

On Sun, Mar 10, 2013 at 7:04 AM, Dhiman Biswas <crazydhimu at gmail.com> wrote:
> Dear Bert,
>
> My intention is to mine frequent itemsets of TRN_TYP for individual CIN out
> of that data.
> But the problem is using eclat after splitting gives the following error:
>
> Error in eclat(list) : internal error in trio library
>
> PS: I have attached my dataset.
>
> On Sat, Mar 9, 2013 at 8:27 PM, Bert Gunter <gunter.berton at gene.com> wrote:
>> [...]
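For anyone following up on point 3, here is a hedged sketch of one possible route from the two-column data to a "transactions" object, assuming the arules package is installed. The toy data and the support threshold of 0.5 are illustrative assumptions, not values from the thread. The sketch deduplicates on the (CIN, TRN_TYP) pair only, converts the items to character, and coerces the list explicitly before calling eclat(). It does not diagnose the "internal error in trio library" message, whose cause is not established in this thread.

```r
# Toy data (hypothetical values); CIN and TRN_TYP are the column names from the post
toy <- data.frame(
  CIN     = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L),
  TRN_TYP = c(1L, 2L, 4L, 4L, 1L, 2L, 9L, 2L, 4L)
)

# Deduplicate on the (CIN, TRN_TYP) pair only, not on whole rows
# that may still carry other columns
pairs <- unique(toy[, c("CIN", "TRN_TYP")])

# One character vector of items per CIN
baskets <- split(as.character(pairs$TRN_TYP), pairs$CIN)

if (requireNamespace("arules", quietly = TRUE)) {
  library(arules)
  # Explicit coercion of the list to the S4 "transactions" class
  trans <- as(baskets, "transactions")
  # supp = 0.5 is an arbitrary threshold chosen for this toy example
  itemsets <- eclat(trans, parameter = list(supp = 0.5))
  inspect(itemsets)
} else {
  message("arules is not installed; skipping the eclat step")
}
```

Doing the coercion as a separate step means that, if the data are still not in an acceptable form, the failure shows up at as(baskets, "transactions") rather than inside eclat().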