thr3ads.net - R help - [R] Most often pairs of chars across grouping variable [Jul 2008]

svga at arcor.de

2008-Jul-29 14:51 UTC

[R] Most often pairs of chars across grouping variable

Hi list,

is there a package or function to compute the frequencies of pairs of characters
in a variable across a grouping variable? E.g.:


> d <- data.frame(ID=gl(2,3), F=c("A","B","C","A","C","D"))
> d
  ID F
1  1 A
2  1 B
3  1 C
4  2 A
5  2 C
6  2 D


Now I want to summarize the frequencies of all pairs A-B, A-C, A-D, B-C, B-D,
C-D across ID:

   A B C D
A  - 1 2 1
B  - - 1 0
C  - - - 1


Here, the combination A-C is the most frequent. The real problem behind this is
that 'F' codes diagnoses, and I am looking for the most frequent pairs of
diagnoses.

Thanks, Sven

Marc Schwartz

2008-Jul-29 15:15 UTC

on 07/29/2008 09:51 AM svga at arcor.de wrote:
> Hi list,
>
> is there a package or function to compute the frequencies of pairs of
> chars in a variable across a grouping variable? Eg:
>
> d <- data.frame(ID=gl(2,3), F=c("A","B","C","A","C","D"))
> > d
>   ID F
> 1  1 A
> 2  1 B
> 3  1 C
> 4  2 A
> 5  2 C
> 6  2 D
>
> Now I want to summarize the frequencies of all pairs A-B, A-C, A-D,
> B-C, B-D, C-D across ID:
>
>    A B C D
> A  - 1 2 1
> B  - - 1 0
> C  - - - 1
>
> here, the combination A-C is most frequent. The real problem behind
> that is that 'F' codes diagnoses and I search for the most often
> pairs of diagnoses.
>
> Thanks, Sven

I suspect that there might be something over in Bioconductor, but here 
is one approach:

 > table(data.frame(t(do.call(cbind,
                      tapply(d$F, d$ID,
                             function(x) combn(as.character(x), 2))))))
    X2
 X1  B C D
   A 1 2 1
   B 0 1 0
   C 0 0 1


See ?combn to create the initial pairs from the data. This is done on a 
per ID basis using tapply. The result is transposed into a data frame 
and then table() is used to create the cross tabulation of the results.
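
The steps described above can also be written out one at a time, which may make the one-liner easier to follow. This is only a sketch of the same approach: the intermediate names (pairs_by_id, pair_rows, pair_df) and the column names first and second are made up for illustration, and stringsAsFactors = FALSE is set explicitly as an assumption so that the code behaves the same on older and newer R versions.

```r
# Example data from the question.
d <- data.frame(ID = gl(2, 3),
                F  = c("A", "B", "C", "A", "C", "D"),
                stringsAsFactors = FALSE)

# Step 1: all pairs within each ID. combn() returns a matrix with two rows
# and one column per pair; tapply() collects one such matrix per ID.
pairs_by_id <- tapply(d$F, d$ID,
                      function(x) combn(as.character(x), 2),
                      simplify = FALSE)

# Step 2: bind the per-ID matrices side by side, then transpose so that
# each row holds one pair.
pair_rows <- t(do.call(cbind, pairs_by_id))

# Step 3: cross-tabulate the first member of each pair against the second.
pair_df <- data.frame(first = pair_rows[, 1], second = pair_rows[, 2])
tab <- table(pair_df)
print(tab)
```

On the data from the question, tab["A", "C"] is 2, which matches the A-C cell in the table above.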

HTH,

Marc Schwartz
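
One caveat with the combn() approach: it pairs the codes in the order in which they appear within each ID, so unsorted or repeated codes would count A-C and C-A separately, or count a pair more than once for the same ID. The sketch below is one possible variant for that situation, not part of the original reply. The data frame d2 is hypothetical, and the choice to sort and de-duplicate codes within each ID is an assumption about what is wanted for diagnosis data.

```r
# Hypothetical data: ID 2 lists its codes out of order and repeats "A".
d2 <- data.frame(ID = c(1, 1, 1, 2, 2, 2, 2),
                 F  = c("A", "B", "C", "D", "A", "C", "A"),
                 stringsAsFactors = FALSE)

# Sort and de-duplicate the codes within each ID before pairing, so that
# every pair is stored in alphabetical order and counted at most once per ID.
# IDs with fewer than two distinct codes contribute no pairs.
pair_list <- lapply(split(d2$F, d2$ID), function(x) {
  codes <- sort(unique(x))
  if (length(codes) < 2) return(NULL)
  t(combn(codes, 2))
})
pair_rows <- do.call(rbind, pair_list)

# Label each pair, count the labels, and rank them by frequency.
pair_label <- paste(pair_rows[, 1], pair_rows[, 2], sep = "-")
ranked <- sort(table(pair_label), decreasing = TRUE)
print(ranked)
```

The first name in ranked is then the most frequent pair, which is the quantity the original question was after.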

svga at arcor.de

2008-Jul-30 14:28 UTC

Hi Marc,

Many thanks, that is exactly what I was looking for.

Best, Sven


