Thanks - Peter, Eivind, Rui Sorry, I perhaps could not explain it properly in the first go. Trying to simplify it here with an example - Say I have two dataframes as below that are NOT equally-sized data frames (i.e., number of columns are different in each table): Table_A: Email Name Phone abc at gmail.com John Chan 0909 bcd at yahoo.com Tim Ma 89089 ...... Table_B: Email Name Sex Phone abc at gmail.com John Chan M 0909 khn at hotmail.com Rosy Kim F 7779 ..... Now, I have used - merge (Table_A, Table_B, by="Email", all = FALSE)) - to find only the rows that match from these data frames - based on Email as primary key. Further, I am also interested (using "Email" as the common key) which rows from Table_A did not match with Table_B. I am not sure how to do this here. Thanks and regards, Chintanu On Tue, May 1, 2018 at 8:48 PM, Rui Barradas <ruipbarradas at sapo.pt> wrote:> Hello, > > Is it something like this that you want? > > x <- data.frame(a = c(1:3, 5, 5:10), b = c(1:7, 7, 9:10)) > y <- data.frame(a = 1:10, b = 1:10) > > which(x != y, arr.ind = TRUE) > > > Hope this helps, > > Rui Barradas > > > On 5/1/2018 11:35 AM, Chintanu wrote: > >> Hi, >> >> >> May I please ask how I do the following in R. Sorry - this may be trivial, >> but I am struggling here for this. >> >> >> >> For two dataframes (A and B), I wish to identify (based on a primary >> key-column present in both A & B) - >> >> 1. Which records (rows) of A did not match with B, and >> >> >> >> 2. Which records of B did not match with A ? >> >> >> >> I came across a setdt function while browsing, but when I tried it, it >> says >> - Could not find function "setdt". >> >> >> >> Overall, if there is any way of doing it (preferably in some simplified >> way), please advise. >> >> >> Many thanks in advance. >> >> >> regards, >> >> Tito >> >> [[alternative HTML version deleted]] >> >> ______________________________________________ >> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posti >> ng-guide.html >> and provide commented, minimal, self-contained, reproducible code. >> >>[[alternative HTML version deleted]]
Hi,
I'll coded your example into R code:
Table_A <- c('abc at gmail.com', 'John Chan', '0909')
Table_A <- rbind(Table_A, c('bcd at yahoo.com', 'Tim Ma',
'89089'))
colnames(Table_A) <- c('Email', 'Name', 'Phone')
Table_A
Table_B <- c('abc at gmail.com', 'John Chan', 'M',
'0909')
Table_B <- rbind(Table_B, c('khn at hotmail.com', 'Rosy
Kim', 'F', '7779'))
colnames(Table_B) <- c('Email', 'Name', 'Sex',
'Phone')
Table_B
Did you have a look at this one?
Table_C <- merge (Table_A, Table_B, by="Email", all = TRUE)
Table_C[is.na(Table_C$Name.y),]
Table_C[is.na(Table_C$Name.x),]
Table_C contains all data from Table_A and Table_B. The key.x is NA if
the row comes from Table_B and key.y is NA if the row comes from Table_A.
Best, Robin
On 05/02/2018 11:38 AM, Chintanu wrote:> Thanks - Peter, Eivind, Rui
>
>
> Sorry, I perhaps could not explain it properly in the first go.
>
> Trying to simplify it here with an example - Say I have two dataframes as
> below that are NOT equally-sized data frames (i.e., number of columns are
> different in each table):
>
>
>
> Table_A:
>
> Email Name Phone
>
> abc at gmail.com John Chan 0909
>
> bcd at yahoo.com Tim Ma 89089
>
> ......
>
>
>
> Table_B:
>
> Email Name Sex Phone
>
> abc at gmail.com John Chan M 0909
>
> khn at hotmail.com Rosy Kim F
7779
>
> .....
>
>
>
> Now, I have used -
>
> merge (Table_A, Table_B, by="Email", all = FALSE))
>
>
>
> - to find only the rows that match from these data frames - based on Email
> as primary key.
>
>
>
> Further, I am also interested (using "Email" as the common key)
which rows
> from Table_A did not match with Table_B.
>
> I am not sure how to do this here.
>
> Thanks and regards,
> Chintanu
>
>
>
> On Tue, May 1, 2018 at 8:48 PM, Rui Barradas <ruipbarradas at
sapo.pt> wrote:
>
>> Hello,
>>
>> Is it something like this that you want?
>>
>> x <- data.frame(a = c(1:3, 5, 5:10), b = c(1:7, 7, 9:10))
>> y <- data.frame(a = 1:10, b = 1:10)
>>
>> which(x != y, arr.ind = TRUE)
>>
>>
>> Hope this helps,
>>
>> Rui Barradas
>>
>>
>> On 5/1/2018 11:35 AM, Chintanu wrote:
>>
>>> Hi,
>>>
>>>
>>> May I please ask how I do the following in R. Sorry - this may be
trivial,
>>> but I am struggling here for this.
>>>
>>>
>>>
>>> For two dataframes (A and B), I wish to identify (based on a
primary
>>> key-column present in both A & B) -
>>>
>>> 1. Which records (rows) of A did not match with B, and
>>>
>>>
>>>
>>> 2. Which records of B did not match with A ?
>>>
>>>
>>>
>>> I came across a setdt function while browsing, but when I tried it,
it
>>> says
>>> - Could not find function "setdt".
>>>
>>>
>>>
>>> Overall, if there is any way of doing it (preferably in some
simplified
>>> way), please advise.
>>>
>>>
>>> Many thanks in advance.
>>>
>>>
>>> regards,
>>>
>>> Tito
>>>
>>> [[alternative HTML version deleted]]
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more,
see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posti
>>> ng-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>>
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
--
Dr. Robin Haunschild
Max Planck Institute
for Solid State Research
Heisenbergstr. 1
D-70569 Stuttgart (Germany)
phone: +49 (0) 711-689-1285
fax: +49 (0) 711-689-1292
email: R.Haunschild at fkf.mpg.de
http://www.fkf.mpg.de/ivs
Have a look at anti_join() from the dplyr package. It does exactly what you want. Here is an example based on the code of Robin Table_A <- as.data.frame(Table_A, stringsAsFactors = FALSE)That is Table_B <- as.data.frame(Table_B, stringsAsFactors = FALSE) library(dplyr) anti_join(Table_A, Table_B, by = "Email") anti_join(Table_B, Table_A, by = "Email") Best regards, ir. Thierry Onkelinx Statisticus / Statistician Vlaamse Overheid / Government of Flanders INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE AND FOREST Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance thierry.onkelinx at inbo.be Havenlaan 88 bus 73, 1000 Brussel www.inbo.be /////////////////////////////////////////////////////////////////////////////////////////// To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher The plural of anecdote is not data. ~ Roger Brinner The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey /////////////////////////////////////////////////////////////////////////////////////////// 2018-05-02 13:23 GMT+02:00 Dr. Robin Haunschild <R.Haunschild at fkf.mpg.de>:> Hi, > > I'll coded your example into R code: > > Table_A <- c('abc at gmail.com', 'John Chan', '0909') > Table_A <- rbind(Table_A, c('bcd at yahoo.com', 'Tim Ma', '89089')) > colnames(Table_A) <- c('Email', 'Name', 'Phone') > Table_A > > Table_B <- c('abc at gmail.com', 'John Chan', 'M', '0909') > Table_B <- rbind(Table_B, c('khn at hotmail.com', 'Rosy Kim', 'F', '7779')) > colnames(Table_B) <- c('Email', 'Name', 'Sex', 'Phone') > Table_B > > Did you have a look at this one? > Table_C <- merge (Table_A, Table_B, by="Email", all = TRUE) > Table_C[is.na(Table_C$Name.y),] > Table_C[is.na(Table_C$Name.x),] > > Table_C contains all data from Table_A and Table_B. The key.x is NA if > the row comes from Table_B and key.y is NA if the row comes from Table_A. > > Best, Robin > > > On 05/02/2018 11:38 AM, Chintanu wrote: >> Thanks - Peter, Eivind, Rui >> >> >> Sorry, I perhaps could not explain it properly in the first go. >> >> Trying to simplify it here with an example - Say I have two dataframes as >> below that are NOT equally-sized data frames (i.e., number of columns are >> different in each table): >> >> >> >> Table_A: >> >> Email Name Phone >> >> abc at gmail.com John Chan 0909 >> >> bcd at yahoo.com Tim Ma 89089 >> >> ...... >> >> >> >> Table_B: >> >> Email Name Sex Phone >> >> abc at gmail.com John Chan M 0909 >> >> khn at hotmail.com Rosy Kim F 7779 >> >> ..... >> >> >> >> Now, I have used - >> >> merge (Table_A, Table_B, by="Email", all = FALSE)) >> >> >> >> - to find only the rows that match from these data frames - based on Email >> as primary key. >> >> >> >> Further, I am also interested (using "Email" as the common key) which rows >> from Table_A did not match with Table_B. >> >> I am not sure how to do this here. >> >> Thanks and regards, >> Chintanu >> >> >> >> On Tue, May 1, 2018 at 8:48 PM, Rui Barradas <ruipbarradas at sapo.pt> wrote: >> >>> Hello, >>> >>> Is it something like this that you want? >>> >>> x <- data.frame(a = c(1:3, 5, 5:10), b = c(1:7, 7, 9:10)) >>> y <- data.frame(a = 1:10, b = 1:10) >>> >>> which(x != y, arr.ind = TRUE) >>> >>> >>> Hope this helps, >>> >>> Rui Barradas >>> >>> >>> On 5/1/2018 11:35 AM, Chintanu wrote: >>> >>>> Hi, >>>> >>>> >>>> May I please ask how I do the following in R. Sorry - this may be trivial, >>>> but I am struggling here for this. >>>> >>>> >>>> >>>> For two dataframes (A and B), I wish to identify (based on a primary >>>> key-column present in both A & B) - >>>> >>>> 1. Which records (rows) of A did not match with B, and >>>> >>>> >>>> >>>> 2. Which records of B did not match with A ? >>>> >>>> >>>> >>>> I came across a setdt function while browsing, but when I tried it, it >>>> says >>>> - Could not find function "setdt". >>>> >>>> >>>> >>>> Overall, if there is any way of doing it (preferably in some simplified >>>> way), please advise. >>>> >>>> >>>> Many thanks in advance. >>>> >>>> >>>> regards, >>>> >>>> Tito >>>> >>>> [[alternative HTML version deleted]] >>>> >>>> ______________________________________________ >>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see >>>> https://stat.ethz.ch/mailman/listinfo/r-help >>>> PLEASE do read the posting guide http://www.R-project.org/posti >>>> ng-guide.html >>>> and provide commented, minimal, self-contained, reproducible code. >>>> >>>> >> >> [[alternative HTML version deleted]] >> >> ______________________________________________ >> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. >> > > -- > Dr. Robin Haunschild > Max Planck Institute > for Solid State Research > Heisenbergstr. 1 > D-70569 Stuttgart (Germany) > phone: +49 (0) 711-689-1285 > fax: +49 (0) 711-689-1292 > email: R.Haunschild at fkf.mpg.de > http://www.fkf.mpg.de/ivs > > > ______________________________________________ > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >