Jan.Sunde at biokapital.no
2010-May-19 09:56 UTC
[R] Joining two (or more) frequency tables
Hi, I am still new to R and have, in spite of searching all over, found no "understandable" documentation for how to join two (or more) frequency tables. Why would I want to do that and what do I mean by "joining" ? Let me try to explain: I have a diagnosis form that allows registration of up to three diagnosis codes on animals with birth defects, like this: father.id diagnosis diagnosis2 diagnosis3 a 100 200 300 b 200 100 340 .... and so on. As you can see the same diagnosis can show up in either field (1, 2 or 3). I am interested in how many times each diagnosis shows up for offspring from the same father (looking at inheritance) The best I have come up with so far is table(father.id, diagnosis) table(father.id, diagnosis2) table(father.id, diagnosis3) which gives me the frequency counts for each combination of father.id and diagnosis code and then I manually join these tables together by cutting and pasting (in Excel!!). Oh the horror! The resulting table looks like this though and gives me what I want: father.id 100 200 300 340 a 1 1 1 0 b 1 1 0 1 ..... This is incredibly cumbersome and therefore I want to find a way of doing this completely in R script so that I have the code ready when I need it again. So far I have tried with several varieties of merge() and xtabs() but have given up. I can not afford to spend more time on this and as a last resort I'm hoping anyone here could pass me a tip. I use R for modelling but I am so far finding it hard to wrap my head around it when it comes to the pre-analysis data manipulation like this as I am not proficient enough (i.e. do not quite understand) in the language and data structure intricacies. Regards, Jan [[alternative HTML version deleted]]
use the 'reshape package:> require(reshape)Loading required package: reshape Loading required package: plyr> xfather.id diagnosis diagnosis2 diagnosis3 1 a 100 200 300 2 b 200 100 340> melt(x)Using father.id as id variables father.id variable value 1 a diagnosis 100 2 b diagnosis 200 3 a diagnosis2 200 4 b diagnosis2 100 5 a diagnosis3 300 6 b diagnosis3 340> z <- melt(x)Using father.id as id variables> table(z$father.id, z$value)100 200 300 340 a 1 1 1 0 b 1 1 0 1>On Wed, May 19, 2010 at 5:56 AM, <Jan.Sunde at biokapital.no> wrote:> Hi, > I am still new to R and have, in spite of searching all over, found no > "understandable" documentation for how to join two (or more) frequency > tables. > Why would I want to do that and what do I mean by "joining" ? Let me try > to explain: > > I have a diagnosis form that allows registration of up to three diagnosis > codes on animals with birth defects, like this: > > father.id diagnosis diagnosis2 diagnosis3 > a ? ? ? ? ? ? 100 ? ? ? ? 200 ? ? ? ? ? ?300 > b ? ? ? ? ? ? ?200 ? ? ? ?100 ? ? ? ? ? ?340 > .... > > and so on. > > As you can see the same diagnosis can show up in either field (1, 2 or 3). > > I am interested in how many times each diagnosis shows up for offspring > from the same father (looking at inheritance) > > The best I have come up with so far is > > table(father.id, diagnosis) > table(father.id, diagnosis2) > table(father.id, diagnosis3) > > which gives me the frequency counts for each combination of father.id and > diagnosis code > > and then I manually join these tables together by cutting and pasting (in > Excel!!). Oh the horror! > > The resulting table looks like this though and gives me what I want: > > father.id ? ? ?100 ? ? ?200 ? ? ?300 ? 340 > a ? ? ? ? ? ? ? ? ? 1 ? ? ? ? 1 ? ? ? ? 1 ? ? 0 > b ? ? ? ? ? ? ? ? ? 1 ? ? ? ? ?1 ? ? ? ? 0 ? ?1 > ..... > > This is incredibly cumbersome and therefore I want to find a way of doing > this completely in R script so that I have the code ready when I need it > again. > So far I have tried with several varieties of merge() and xtabs() but have > given up. I can not afford to spend more time on this and as a last resort > I'm hoping anyone here could pass me a tip. > I use R for modelling but I am so far finding it hard to wrap my head > around it when it comes to the pre-analysis data manipulation like this as > I am not proficient enough (i.e. do not quite understand) in the language > and data structure intricacies. > > Regards, > Jan > ? ? ? ?[[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >-- Jim Holtman Cincinnati, OH +1 513 646 9390 What is the problem that you are trying to solve?
> ## Create a sample data. > data <- data.frame(father.id = letters[1:5],+ diagnosis = sample(c(100,200,300,340),5,replace=TRUE), + diagnosis1 = sample(c(100,200,300,340),5,replace=TRUE), + diagnosis2 = sample(c(100,200,300,340),5,replace=TRUE))> datafather.id diagnosis diagnosis1 diagnosis2 1 a 340 100 300 2 b 200 300 340 3 c 200 300 100 4 d 200 100 300 5 e 200 100 100> > ## Create a matrix by replicating the father.id column 3 times > ## Because we have 3 diagnosis columns to match when use table() function > match <- as.matrix(data[,rep(1:2,c(3,0))]) > matchfather.id father.id.1 father.id.2 [1,] "a" "a" "a" [2,] "b" "b" "b" [3,] "c" "c" "c" [4,] "d" "d" "d" [5,] "e" "e" "e"> > ## Count > table(match,as.matrix(data[,2:4]))match 100 200 300 340 a 1 0 1 1 b 0 1 1 1 c 1 1 1 0 d 1 1 1 0 e 2 1 0 0>Please refer to: http://r.789695.n4.nabble.com/Counting-Frequencies-in-Data-Frame-tt2221342.html#a2221487 http://r.789695.n4.nabble.com/Counting-Frequencies-in-Data-Frame-tt2221342.html#a2221487 ----- A R learner. -- View this message in context: http://r.789695.n4.nabble.com/Joining-two-or-more-frequency-tables-tp2222576p2222946.html Sent from the R help mailing list archive at Nabble.com.