thr3ads.net - R help - [R] grouping followed by finding frequent patterns in R [Mar 2013]


Dhiman Biswas

2013-Mar-09 12:37 UTC

[R] grouping followed by finding frequent patterns in R

I have data in the following form:
CIN TRN_TYP
9079954    1
9079954    2
9079954    3
9079954    4
9079954    5
9079954    4
9079954    5
9079954    6
9079954    7
9079954    8
9079954    9
9079954    9
.                    .
.                    .
.                    .
There are 100 distinct CIN values (9079954, 12441087, 15246633, ...), each
with its respective TRN_TYP values.

First of all, I want this data grouped into basket format:
9079954   1, 2, 3, 4, 5, ....
12441087  19, 14, 21, 3, 7, ...
.
.
.
and then to apply eclat from the arules package to find frequent patterns.

1) I ran the following code:
file<-read.csv("D:/R/Practice/Data_Input_NUM.csv")
file <- file[!duplicated(file),]
eclat(split(file$TRN_TYP,file$CIN))

but it gave me the following error:
Error in asMethod(object) : can not coerce list with transactions with duplicated items

2) I ran this code:
file<-read.csv("D:/R/Practice/Data_Input_NUM.csv")
# Data_Input_NUM has many other columns, so select only CIN and TRN_TYP
file_new<-file[,c(3,6)]
file_new <- file_new[!duplicated(file_new),]
eclat(split(file_new$TRN_TYP,file_new$CIN))

but again:
Error in eclat(split(file_new$TRN_TYP, file_new$CIN)) :
  internal error in trio library

PLEASE HELP


Bert Gunter

2013-Mar-09 14:57 UTC

I **suggest** that you explain what you wish to accomplish using a
reproducible example rather than telling us what packages you think
you should use. I believe you are making things too complicated; e.g.
what do you mean by "frequent patterns"? Moreover, "basket format" is
rather unclear -- and may well be unnecessary. But using lists, it
could be simply accomplished by

?split  ## as in
## column names as given in the original post: CIN and TRN_TYP
the_list <- with(yourdata, split(TRN_TYP, CIN))

or possibly

the_list <- with(yourdata, tapply(TRN_TYP, CIN, FUN = table))

Of course, these may be irrelevant and useless, but without knowing
your purpose ...?
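
For illustration, here is a minimal sketch of the split() idea applied to the sample rows from the original post. The data frame name yourdata and the per-CIN unique() step are assumptions added here, not something the poster ran; the de-duplication is included because the first error message complains about duplicated items within a transaction.

```r
# Sample rows copied from the original post (a single CIN only).
yourdata <- data.frame(
  CIN     = rep(9079954, 12),
  TRN_TYP = c(1, 2, 3, 4, 5, 4, 5, 6, 7, 8, 9, 9)
)

# One list element per CIN, holding every TRN_TYP recorded for it.
the_list <- with(yourdata, split(TRN_TYP, CIN))

# Repeated types (4, 5 and 9 here) survive split(), so drop them per CIN.
baskets <- lapply(the_list, unique)
baskets[["9079954"]]
```

Note that duplicated() called on the full data frame only drops rows that match in every column, so repeated (CIN, TRN_TYP) pairs can remain when other columns differ. That may be what triggered the first error, though this cannot be confirmed without the actual file.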

-- Bert



-- 

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm

Bert Gunter

2013-Mar-10 14:55 UTC

1. Please cc the list, as I have here, unless your comments are off topic.

2. Use dput() (?dput) to include **small** amounts of data in your
message, as attachments are generally stripped from r-help.

3. I have no experience with itemsets or the arules package, but a
quick glance at the docs there said that your data argument must be in
a specific form coercible into an S4 "transactions" class. I suspect
that neither your initial data frame nor the list deriving from split
is, but maybe someone familiar with the package can tell you for sure.
That's why you need to cc to the list.
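
As a rough sketch of what "coercible into transactions" could look like, the snippet below builds a list with one de-duplicated character vector per CIN and coerces it explicitly before calling eclat(). It assumes the arules package is installed; the object names (trans_df, baskets, trans, itemsets) and the support threshold are illustrative choices, and the two CINs use only the values shown in the original post. Whether this avoids the "trio library" error on the real data is untested.

```r
library(arules)  # assumed to be installed

# Toy data: both CINs and their TRN_TYP values are taken from the post.
trans_df <- data.frame(
  CIN     = c(rep(9079954, 12), rep(12441087, 5)),
  TRN_TYP = c(1, 2, 3, 4, 5, 4, 5, 6, 7, 8, 9, 9, 19, 14, 21, 3, 7)
)

# Use character item labels and remove duplicates within each CIN,
# since a transaction may not contain the same item twice.
baskets <- lapply(split(as.character(trans_df$TRN_TYP), trans_df$CIN), unique)

# Coerce explicitly so any coercion problem surfaces before mining.
trans <- as(baskets, "transactions")

# support = 1 keeps only itemsets present in every transaction;
# this threshold is an arbitrary choice for the toy data.
itemsets <- eclat(trans, parameter = list(support = 1))
inspect(itemsets)
```

Coercing first and mining second separates the two failure points reported in this thread: a coercion error points at the data layout, while an error inside eclat() points at the mining step.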

-- Bert


On Sun, Mar 10, 2013 at 7:04 AM, Dhiman Biswas <crazydhimu at gmail.com> wrote:
> Dear Bert,
>
> My intention is to mine frequent itemsets of TRN_TYP for individual CIN out
> of that data.
> But the problem is using eclat after splitting gives the following error:
>
> Error in eclat(list) : internal error in trio library
>
> PS: I have attached my dataset.

