Thanks - Peter, Eivind, Rui Sorry, I perhaps could not explain it properly in the first go. Trying to simplify it here with an example - Say I have two dataframes as below that are NOT equally-sized data frames (i.e., number of columns are different in each table): Table_A: Email Name Phone abc at gmail.com John Chan 0909 bcd at yahoo.com Tim Ma 89089 ...... Table_B: Email Name Sex Phone abc at gmail.com John Chan M 0909 khn at hotmail.com Rosy Kim F 7779 ..... Now, I have used - merge (Table_A, Table_B, by="Email", all = FALSE)) - to find only the rows that match from these data frames - based on Email as primary key. Further, I am also interested (using "Email" as the common key) which rows from Table_A did not match with Table_B. I am not sure how to do this here. Thanks and regards, Chintanu On Tue, May 1, 2018 at 8:48 PM, Rui Barradas <ruipbarradas at sapo.pt> wrote:> Hello, > > Is it something like this that you want? > > x <- data.frame(a = c(1:3, 5, 5:10), b = c(1:7, 7, 9:10)) > y <- data.frame(a = 1:10, b = 1:10) > > which(x != y, arr.ind = TRUE) > > > Hope this helps, > > Rui Barradas > > > On 5/1/2018 11:35 AM, Chintanu wrote: > >> Hi, >> >> >> May I please ask how I do the following in R. Sorry - this may be trivial, >> but I am struggling here for this. >> >> >> >> For two dataframes (A and B), I wish to identify (based on a primary >> key-column present in both A & B) - >> >> 1. Which records (rows) of A did not match with B, and >> >> >> >> 2. Which records of B did not match with A ? >> >> >> >> I came across a setdt function while browsing, but when I tried it, it >> says >> - Could not find function "setdt". >> >> >> >> Overall, if there is any way of doing it (preferably in some simplified >> way), please advise. >> >> >> Many thanks in advance. >> >> >> regards, >> >> Tito >> >> [[alternative HTML version deleted]] >> >> ______________________________________________ >> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posti >> ng-guide.html >> and provide commented, minimal, self-contained, reproducible code. >> >>[[alternative HTML version deleted]]
Hi, I'll coded your example into R code: Table_A <- c('abc at gmail.com', 'John Chan', '0909') Table_A <- rbind(Table_A, c('bcd at yahoo.com', 'Tim Ma', '89089')) colnames(Table_A) <- c('Email', 'Name', 'Phone') Table_A Table_B <- c('abc at gmail.com', 'John Chan', 'M', '0909') Table_B <- rbind(Table_B, c('khn at hotmail.com', 'Rosy Kim', 'F', '7779')) colnames(Table_B) <- c('Email', 'Name', 'Sex', 'Phone') Table_B Did you have a look at this one? Table_C <- merge (Table_A, Table_B, by="Email", all = TRUE) Table_C[is.na(Table_C$Name.y),] Table_C[is.na(Table_C$Name.x),] Table_C contains all data from Table_A and Table_B. The key.x is NA if the row comes from Table_B and key.y is NA if the row comes from Table_A. Best, Robin On 05/02/2018 11:38 AM, Chintanu wrote:> Thanks - Peter, Eivind, Rui > > > Sorry, I perhaps could not explain it properly in the first go. > > Trying to simplify it here with an example - Say I have two dataframes as > below that are NOT equally-sized data frames (i.e., number of columns are > different in each table): > > > > Table_A: > > Email Name Phone > > abc at gmail.com John Chan 0909 > > bcd at yahoo.com Tim Ma 89089 > > ...... > > > > Table_B: > > Email Name Sex Phone > > abc at gmail.com John Chan M 0909 > > khn at hotmail.com Rosy Kim F 7779 > > ..... > > > > Now, I have used - > > merge (Table_A, Table_B, by="Email", all = FALSE)) > > > > - to find only the rows that match from these data frames - based on Email > as primary key. > > > > Further, I am also interested (using "Email" as the common key) which rows > from Table_A did not match with Table_B. > > I am not sure how to do this here. > > Thanks and regards, > Chintanu > > > > On Tue, May 1, 2018 at 8:48 PM, Rui Barradas <ruipbarradas at sapo.pt> wrote: > >> Hello, >> >> Is it something like this that you want? >> >> x <- data.frame(a = c(1:3, 5, 5:10), b = c(1:7, 7, 9:10)) >> y <- data.frame(a = 1:10, b = 1:10) >> >> which(x != y, arr.ind = TRUE) >> >> >> Hope this helps, >> >> Rui Barradas >> >> >> On 5/1/2018 11:35 AM, Chintanu wrote: >> >>> Hi, >>> >>> >>> May I please ask how I do the following in R. Sorry - this may be trivial, >>> but I am struggling here for this. >>> >>> >>> >>> For two dataframes (A and B), I wish to identify (based on a primary >>> key-column present in both A & B) - >>> >>> 1. Which records (rows) of A did not match with B, and >>> >>> >>> >>> 2. Which records of B did not match with A ? >>> >>> >>> >>> I came across a setdt function while browsing, but when I tried it, it >>> says >>> - Could not find function "setdt". >>> >>> >>> >>> Overall, if there is any way of doing it (preferably in some simplified >>> way), please advise. >>> >>> >>> Many thanks in advance. >>> >>> >>> regards, >>> >>> Tito >>> >>> [[alternative HTML version deleted]] >>> >>> ______________________________________________ >>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see >>> https://stat.ethz.ch/mailman/listinfo/r-help >>> PLEASE do read the posting guide http://www.R-project.org/posti >>> ng-guide.html >>> and provide commented, minimal, self-contained, reproducible code. >>> >>> > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >-- Dr. Robin Haunschild Max Planck Institute for Solid State Research Heisenbergstr. 1 D-70569 Stuttgart (Germany) phone: +49 (0) 711-689-1285 fax: +49 (0) 711-689-1292 email: R.Haunschild at fkf.mpg.de http://www.fkf.mpg.de/ivs
Have a look at anti_join() from the dplyr package. It does exactly what you want. Here is an example based on the code of Robin Table_A <- as.data.frame(Table_A, stringsAsFactors = FALSE)That is Table_B <- as.data.frame(Table_B, stringsAsFactors = FALSE) library(dplyr) anti_join(Table_A, Table_B, by = "Email") anti_join(Table_B, Table_A, by = "Email") Best regards, ir. Thierry Onkelinx Statisticus / Statistician Vlaamse Overheid / Government of Flanders INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE AND FOREST Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance thierry.onkelinx at inbo.be Havenlaan 88 bus 73, 1000 Brussel www.inbo.be /////////////////////////////////////////////////////////////////////////////////////////// To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey /////////////////////////////////////////////////////////////////////////////////////////// 2018-05-02 13:23 GMT+02:00 Dr. Robin Haunschild <R.Haunschild at fkf.mpg.de>:> Hi, > > I'll coded your example into R code: > > Table_A <- c('abc at gmail.com', 'John Chan', '0909') > Table_A <- rbind(Table_A, c('bcd at yahoo.com', 'Tim Ma', '89089')) > colnames(Table_A) <- c('Email', 'Name', 'Phone') > Table_A > > Table_B <- c('abc at gmail.com', 'John Chan', 'M', '0909') > Table_B <- rbind(Table_B, c('khn at hotmail.com', 'Rosy Kim', 'F', '7779')) > colnames(Table_B) <- c('Email', 'Name', 'Sex', 'Phone') > Table_B > > Did you have a look at this one? > Table_C <- merge (Table_A, Table_B, by="Email", all = TRUE) > Table_C[is.na(Table_C$Name.y),] > Table_C[is.na(Table_C$Name.x),] > > Table_C contains all data from Table_A and Table_B. The key.x is NA if > the row comes from Table_B and key.y is NA if the row comes from Table_A. > > Best, Robin > > > On 05/02/2018 11:38 AM, Chintanu wrote: >> Thanks - Peter, Eivind, Rui >> >> >> Sorry, I perhaps could not explain it properly in the first go. >> >> Trying to simplify it here with an example - Say I have two dataframes as >> below that are NOT equally-sized data frames (i.e., number of columns are >> different in each table): >> >> >> >> Table_A: >> >> Email Name Phone >> >> abc at gmail.com John Chan 0909 >> >> bcd at yahoo.com Tim Ma 89089 >> >> ...... >> >> >> >> Table_B: >> >> Email Name Sex Phone >> >> abc at gmail.com John Chan M 0909 >> >> khn at hotmail.com Rosy Kim F 7779 >> >> ..... >> >> >> >> Now, I have used - >> >> merge (Table_A, Table_B, by="Email", all = FALSE)) >> >> >> >> - to find only the rows that match from these data frames - based on Email >> as primary key. >> >> >> >> Further, I am also interested (using "Email" as the common key) which rows >> from Table_A did not match with Table_B. >> >> I am not sure how to do this here. >> >> Thanks and regards, >> Chintanu >> >> >> >> On Tue, May 1, 2018 at 8:48 PM, Rui Barradas <ruipbarradas at sapo.pt> wrote: >> >>> Hello, >>> >>> Is it something like this that you want? >>> >>> x <- data.frame(a = c(1:3, 5, 5:10), b = c(1:7, 7, 9:10)) >>> y <- data.frame(a = 1:10, b = 1:10) >>> >>> which(x != y, arr.ind = TRUE) >>> >>> >>> Hope this helps, >>> >>> Rui Barradas >>> >>> >>> On 5/1/2018 11:35 AM, Chintanu wrote: >>> >>>> Hi, >>>> >>>> >>>> May I please ask how I do the following in R. Sorry - this may be trivial, >>>> but I am struggling here for this. >>>> >>>> >>>> >>>> For two dataframes (A and B), I wish to identify (based on a primary >>>> key-column present in both A & B) - >>>> >>>> 1. Which records (rows) of A did not match with B, and >>>> >>>> >>>> >>>> 2. Which records of B did not match with A ? >>>> >>>> >>>> >>>> I came across a setdt function while browsing, but when I tried it, it >>>> says >>>> - Could not find function "setdt". >>>> >>>> >>>> >>>> Overall, if there is any way of doing it (preferably in some simplified >>>> way), please advise. >>>> >>>> >>>> Many thanks in advance. >>>> >>>> >>>> regards, >>>> >>>> Tito >>>> >>>> [[alternative HTML version deleted]] >>>> >>>> ______________________________________________ >>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see >>>> https://stat.ethz.ch/mailman/listinfo/r-help >>>> PLEASE do read the posting guide http://www.R-project.org/posti >>>> ng-guide.html >>>> and provide commented, minimal, self-contained, reproducible code. >>>> >>>> >> >> [[alternative HTML version deleted]] >> >> ______________________________________________ >> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. >> > > -- > Dr. Robin Haunschild > Max Planck Institute > for Solid State Research > Heisenbergstr. 1 > D-70569 Stuttgart (Germany) > phone: +49 (0) 711-689-1285 > fax: +49 (0) 711-689-1292 > email: R.Haunschild at fkf.mpg.de > http://www.fkf.mpg.de/ivs > > > ______________________________________________ > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >