I have two dataframes, the first column of each dataframe is a unique id number (the rest of the columns are data variables). I would like to figure out how many times each id number appears in each dataframe. So far I can use: length( match (dataframeA$unique.id[1], dataframeB$unique.id) ) but this only works on each row of dataframe A one-at-a-time. I would like to do this for all of the rows in dataframe A, and then put the results in a new variable: dataframeA$count I'm new to R, so please be patient with me! Sorry if this question has already been answered, my search of the archives only brought up one relevant post, and I didn't understand the answer to it.... http://www.nabble.com/match-to20799206.html#a20799206 thx -- View this message in context: http://www.nabble.com/matching-each-row-tp24393051p24393051.html Sent from the R help mailing list archive at Nabble.com.
>From an email suggestion, here are two sample datasets, and my ideal output:dataA <- data.frame(unique.id=c("A","B","C","B"),x=11:14,y=5:2) dataB <- data.frame(unique.id=c("A","B","A","B","A","C","D","A"),x=27:20,y=22:29) ## mystery operation(s) happen here.... ## ideal output would be: dataA <- data.frame(unique.id=c("A","B","C","B"),x=11:14,y=5:2,countA=c(1,2,1,2),countB=c(4,2,1,2)) so my mystery operation(s) would count the number of times the unique id shows up in a given dataset. my ideal outputs are as follows: countA is the "mystery operation" applied to dataA (counting occurrences within the same dataset) countB is applied to dataB (counting occurrences within a second dataset). My best try so far is to do: tempA <- aggregate(dataA$unique.id,list(dataA$unique.id),length) which gives me a matrix with ONE instance of each unique.id and the counts... (and which I thought was kinda cute) but it only works for within a single dataset! tathta wrote:> > I have two dataframes, the first column of each dataframe is a unique id > number (the rest of the columns are data variables). > I would like to figure out how many times each id number appears in each > dataframe. > > So far I can use: > length( match (dataframeA$unique.id[1], dataframeB$unique.id) ) > > but this only works on each row of dataframe A one-at-a-time. > > I would like to do this for all of the rows in dataframe A, and then put > the results in a new variable: dataframeA$count > > > I'm new to R, so please be patient with me! > > > > thx >-- View this message in context: http://www.nabble.com/matching-each-row-tp24393051p24395711.html Sent from the R help mailing list archive at Nabble.com.
Something like this? > dataframeA <- data.frame ( + unique.id= c(1,1,3,3,3,5,7,7, 9) + , x1=rnorm(9) + , x2=rnorm(9) + , x3=rnorm(9) + ) > dataframeB <- data.frame ( + unique.id= c(2,3,4,5,5,5,6,7,9,10,10) + , x4=rnorm(11) + , x5=rnorm(11) + , x6=rnorm(11) + ) > match.counts <- function ( x , y ) { + out <- cbind ( + table ( x [ which ( x %in% y ) ] ) + , table ( y [ which ( y %in% x ) ] ) + ) + dimnames ( out ) [[2]] <- c ( "N in x" , "N in y" ) + out + } > match.counts ( dataframeA$unique.id , dataframeB$unique.id ) N in x N in y 3 3 1 5 1 3 7 2 1 9 1 1 > -- David ? ----------------------------------------------------- David Huffer, Ph.D. Senior Statistician CSOSA/Washington, DC david.huffer at csosa.gov ----------------------------------------------------- -----Original Message----- From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of tathta Sent: Wednesday, July 08, 2009 11:10 AM To: r-help at r-project.org Subject: [R] matching each row I have two dataframes, the first column of each dataframe is a unique id number (the rest of the columns are data variables). I would like to figure out how many times each id number appears in each dataframe. So far I can use: length( match (dataframeA$unique.id[1], dataframeB$unique.id) ) but this only works on each row of dataframe A one-at-a-time. I would like to do this for all of the rows in dataframe A, and then put the results in a new variable: dataframeA$count I'm new to R, so please be patient with me! Sorry if this question has already been answered, my search of the archives only brought up one relevant post, and I didn't understand the answer to it.... http://www.nabble.com/match-to20799206.html#a20799206 thx -- View this message in context: http://www.nabble.com/matching-each-row-tp24393051p24393051.html Sent from the R help mailing list archive at Nabble.com. ______________________________________________ R-help at r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
On Jul 8, 2009, at 10:09 AM, tathta wrote:> > I have two dataframes, the first column of each dataframe is a > unique id > number (the rest of the columns are data variables). > I would like to figure out how many times each id number appears in > each > dataframe. > > So far I can use: > length( match (dataframeA$unique.id[1], dataframeB$unique.id) ) > > but this only works on each row of dataframe A one-at-a-time. > > I would like to do this for all of the rows in dataframe A, and then > put the > results in a new variable: dataframeA$count > > > I'm new to R, so please be patient with me! > > > Sorry if this question has already been answered, my search of the > archives > only brought up one relevant post, and I didn't understand the > answer to > it.... http://www.nabble.com/match-to20799206.html#a20799206If I am correctly understanding what you are looking for, you could do something like the following: # Create some simple data. Note that only a subset of the ID's (3:5) will match across the two DF's: set.seed(1) DF.A <- data.frame(ID = sample(1:5, 10, replace = TRUE)) DF.B <- data.frame(ID = sample(3:7, 10, replace = TRUE)) > DF.A ID 1 2 2 2 3 3 4 5 5 2 6 5 7 5 8 4 9 4 10 1 > DF.B ID 1 4 2 3 3 6 4 4 5 6 6 5 7 6 8 7 9 4 10 6 Now, create counts of the IDs in each, coercing the results to data frames and setting the count column name for each: TAB.A <- as.data.frame(table(DF.A$ID), responseName = "Count.A") TAB.B <- as.data.frame(table(DF.B$ID), responseName = "Count.B") > TAB.A Var1 Count.A 1 1 1 2 2 3 3 3 1 4 4 2 5 5 3 > TAB.B Var1 Count.B 1 3 1 2 4 3 3 5 1 4 6 4 5 7 1 Now, use merge() to join each of the two above. 'all = TRUE' will include non-matching keys: > merge(TAB.A, TAB.B, by = "Var1", all = TRUE) Var1 Count.A Count.B 1 1 1 NA 2 2 3 NA 3 3 1 1 4 4 2 3 5 5 3 1 6 6 NA 4 7 7 NA 1 Note that you will get NAs for any non-matching ID's (Var1). See ?table, ?as.data.frame and ?merge for more information. HTH, Marc Schwartz