svga at arcor.de
2008-Jul-29 14:51 UTC
[R] Most often pairs of chars across grouping variable
Hi list,

is there a package or function to compute the frequencies of pairs of chars in a variable across a grouping variable? Eg:

    > d <- data.frame(ID=gl(2,3), F=c("A","B","C","A","C","D"))
    > d
      ID F
    1  1 A
    2  1 B
    3  1 C
    4  2 A
    5  2 C
    6  2 D

Now I want to summarize the frequencies of all pairs A-B, A-C, A-D, B-C, B-D, C-D across ID:

       A  B  C  D
    A  -  1  2  1
    B  -  -  1  0
    C  -  -  -  1

here, the combination A-C is most frequent. The real problem behind that is that 'F' codes diagnoses and I search for the most often pairs of diagnoses.

Thanks,
Sven
Marc Schwartz
2008-Jul-29 15:15 UTC
[R] Most often pairs of chars across grouping variable
on 07/29/2008 09:51 AM svga at arcor.de wrote:
> Hi list,
>
> is there a package or function to compute the frequencies of pairs of
> chars in a variable across a grouping variable? Eg:
>
> d <- data.frame(ID=gl(2,3), F=c("A","B","C","A","C","D"))
> d
>   ID F
> 1  1 A
> 2  1 B
> 3  1 C
> 4  2 A
> 5  2 C
> 6  2 D
>
> Now I want to summarize the frequencies of all pairs A-B, A-C, A-D,
> B-C, B-D, C-D across ID:
>
>    A  B  C  D
> A  -  1  2  1
> B  -  -  1  0
> C  -  -  -  1
>
> here, the combination A-C is most frequent. The real problem behind
> that is that 'F' codes diagnoses and I search for the most often
> pairs of diagnoses.
>
> Thanks, Sven

I suspect that there might be something over in Bioconductor, but here is one approach:

    > table(data.frame(t(do.call(cbind,
                                 tapply(d$F, d$ID,
                                        function(x) combn(as.character(x), 2))))))
       X2
    X1  B C D
      A 1 2 1
      B 0 1 0
      C 0 0 1

See ?combn to create the initial pairs from the data. This is done on a per ID basis using tapply. The result is transposed into a data frame and then table() is used to create the cross tabulation of the results.

HTH,

Marc Schwartz
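The one-liner above can be unpacked into separate steps, which may be easier to adapt to real diagnosis data. The following is only a sketch and not part of the original reply. The names codes_by_id, pair_list, pairs, lev and pair_counts are made up for illustration. Sorting and de-duplicating the codes within each ID is an added assumption, so that A-C and C-A are counted as the same pair when the rows are not already ordered. IDs with fewer than two distinct codes are skipped because they contribute no pairs.

```r
# Example data from the question; keep F as character rather than factor
d <- data.frame(ID = gl(2, 3),
                F  = c("A", "B", "C", "A", "C", "D"),
                stringsAsFactors = FALSE)

# Step 1: split F by ID.
# Assumption (not in the original reply): sort and de-duplicate within each
# ID so that every pair is stored in a consistent order.
codes_by_id <- lapply(split(d$F, d$ID), function(x) sort(unique(x)))

# Step 2: all pairs within each ID. combn() returns a matrix with 2 rows and
# one column per pair. IDs with fewer than two codes are dropped first.
pair_list <- lapply(Filter(function(x) length(x) >= 2, codes_by_id),
                    combn, m = 2)

# Step 3: bind the per-ID matrices side by side, then transpose so that
# each row holds one pair.
pairs <- t(do.call(cbind, unname(pair_list)))

# Step 4: cross-tabulate. Shared factor levels give a square table in which
# pairs that never occur show up as 0.
lev <- sort(unique(d$F))
pair_counts <- table(first  = factor(pairs[, 1], levels = lev),
                     second = factor(pairs[, 2], levels = lev))
pair_counts
```

With the example data, pair_counts["A", "C"] is 2, in line with the table in the reply. Only the upper triangle is filled, because each pair is stored once with the alphabetically smaller code first.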
svga at arcor.de
2008-Jul-30 14:28 UTC
[R] Most often pairs of chars across grouping variable
Hi Marc,

many thanks, that is exactly what I was looking for.

Best,
Sven