thr3ads.net - R help - [R] Merging dataframes [May 2018]

If this information is useful, please help other people find it:
Share via:

Chintanu

2018-May-02 09:38 UTC

[R] Merging dataframes

Thanks - Peter, Eivind, Rui


Sorry, I perhaps could not explain it properly in the first go.

Trying to simplify it here with an example - Say I have two dataframes as
below that are NOT equally-sized data frames (i.e., number of columns are
different in each table):



Table_A:

Email             Name                   Phone

abc at gmail.com   John Chan         0909

bcd at yahoo.com   Tim Ma                    89089

......



Table_B:

Email                     Name                 Sex        Phone

abc at gmail.com    John Chan            M                 0909

khn at hotmail.com           Rosy  Kim               F                   7779

.....



Now, I have used -

merge (Table_A, Table_B, by="Email", all = FALSE))



- to find only the rows that match from these data frames - based on Email
as primary key.



Further, I am also interested (using "Email" as the common key) which
rows
from Table_A did not match with Table_B.

I am not sure how to do this here.

Thanks and regards,
Chintanu



On Tue, May 1, 2018 at 8:48 PM, Rui Barradas <ruipbarradas at sapo.pt>
wrote:
> Hello,
>
> Is it something like this that you want?
>
> x <- data.frame(a = c(1:3, 5, 5:10), b = c(1:7, 7, 9:10))
> y <- data.frame(a = 1:10, b = 1:10)
>
> which(x != y, arr.ind = TRUE)
>
>
> Hope this helps,
>
> Rui Barradas
>
>
> On 5/1/2018 11:35 AM, Chintanu wrote:
>
>> Hi,
>>
>>
>> May I please ask how I do the following in R. Sorry - this may be
trivial,
>> but I am struggling here for this.
>>
>>
>>
>> For two dataframes (A and B), I wish to identify (based on a primary
>> key-column present in both A & B) -
>>
>> 1. Which records (rows) of A did not match with B, and
>>
>>
>>
>> 2. Which records of B did not match with A ?
>>
>>
>>
>> I came across a setdt function while browsing, but when I tried it, it
>> says
>> - Could not find function "setdt".
>>
>>
>>
>> Overall, if there is any way of doing it (preferably in some simplified
>> way), please advise.
>>
>>
>> Many thanks in advance.
>>
>>
>> regards,
>>
>> Tito
>>
>>         [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posti
>> ng-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
	[[alternative HTML version deleted]]

Dr. Robin Haunschild

2018-May-02 11:23 UTC

head link

[R] Merging dataframes

Hi,

I'll coded your example into R code:

Table_A <- c('abc at gmail.com', 'John Chan', '0909')
Table_A <- rbind(Table_A, c('bcd at yahoo.com', 'Tim Ma',
'89089'))
colnames(Table_A) <- c('Email', 'Name', 'Phone')
Table_A

Table_B <- c('abc at gmail.com', 'John Chan', 'M',
'0909')
Table_B <- rbind(Table_B, c('khn at hotmail.com', 'Rosy 
Kim', 'F', '7779'))
colnames(Table_B) <- c('Email', 'Name', 'Sex',
'Phone')
Table_B

Did you have a look at this one?
Table_C <- merge (Table_A, Table_B, by="Email", all = TRUE)
Table_C[is.na(Table_C$Name.y),]
Table_C[is.na(Table_C$Name.x),]

Table_C contains all data from Table_A and Table_B. The key.x is NA if
the row comes from Table_B and key.y is NA if the row comes from Table_A.

Best, Robin


On 05/02/2018 11:38 AM, Chintanu wrote:> Thanks - Peter, Eivind, Rui
> 
> 
> Sorry, I perhaps could not explain it properly in the first go.
> 
> Trying to simplify it here with an example - Say I have two dataframes as
> below that are NOT equally-sized data frames (i.e., number of columns are
> different in each table):
> 
> 
> 
> Table_A:
> 
> Email             Name                   Phone
> 
> abc at gmail.com   John Chan         0909
> 
> bcd at yahoo.com   Tim Ma                    89089
> 
> ......
> 
> 
> 
> Table_B:
> 
> Email                     Name                 Sex        Phone
> 
> abc at gmail.com    John Chan            M                 0909
> 
> khn at hotmail.com           Rosy  Kim               F                  
7779
> 
> .....
> 
> 
> 
> Now, I have used -
> 
> merge (Table_A, Table_B, by="Email", all = FALSE))
> 
> 
> 
> - to find only the rows that match from these data frames - based on Email
> as primary key.
> 
> 
> 
> Further, I am also interested (using "Email" as the common key)
which rows
> from Table_A did not match with Table_B.
> 
> I am not sure how to do this here.
> 
> Thanks and regards,
> Chintanu
> 
> 
> 
> On Tue, May 1, 2018 at 8:48 PM, Rui Barradas <ruipbarradas at
sapo.pt> wrote:
> 
>> Hello,
>>
>> Is it something like this that you want?
>>
>> x <- data.frame(a = c(1:3, 5, 5:10), b = c(1:7, 7, 9:10))
>> y <- data.frame(a = 1:10, b = 1:10)
>>
>> which(x != y, arr.ind = TRUE)
>>
>>
>> Hope this helps,
>>
>> Rui Barradas
>>
>>
>> On 5/1/2018 11:35 AM, Chintanu wrote:
>>
>>> Hi,
>>>
>>>
>>> May I please ask how I do the following in R. Sorry - this may be
trivial,
>>> but I am struggling here for this.
>>>
>>>
>>>
>>> For two dataframes (A and B), I wish to identify (based on a
primary
>>> key-column present in both A & B) -
>>>
>>> 1. Which records (rows) of A did not match with B, and
>>>
>>>
>>>
>>> 2. Which records of B did not match with A ?
>>>
>>>
>>>
>>> I came across a setdt function while browsing, but when I tried it,
it
>>> says
>>> - Could not find function "setdt".
>>>
>>>
>>>
>>> Overall, if there is any way of doing it (preferably in some
simplified
>>> way), please advise.
>>>
>>>
>>> Many thanks in advance.
>>>
>>>
>>> regards,
>>>
>>> Tito
>>>
>>>         [[alternative HTML version deleted]]
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more,
see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posti
>>> ng-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>>
> 
> 	[[alternative HTML version deleted]]
> 
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 
-- 
Dr. Robin Haunschild
Max Planck Institute
for Solid State Research
Heisenbergstr. 1
D-70569 Stuttgart (Germany)
phone: +49 (0) 711-689-1285
fax:   +49 (0) 711-689-1292
email: R.Haunschild at fkf.mpg.de
http://www.fkf.mpg.de/ivs

Thierry Onkelinx

2018-May-02 12:01 UTC

head link

[R] Merging dataframes

Have a look at anti_join() from the dplyr package. It does exactly
what you want. Here is an example based on the code of Robin

Table_A <- as.data.frame(Table_A, stringsAsFactors = FALSE)That is
Table_B <- as.data.frame(Table_B, stringsAsFactors = FALSE)
library(dplyr)
anti_join(Table_A, Table_B, by = "Email")
anti_join(Table_B, Table_A, by = "Email")

Best regards,

ir. Thierry Onkelinx
Statisticus / Statistician

Vlaamse Overheid / Government of Flanders
INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE
AND FOREST
Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance
thierry.onkelinx at inbo.be
Havenlaan 88 bus 73, 1000 Brussel
www.inbo.be

///////////////////////////////////////////////////////////////////////////////////////////
To call in the statistician after the experiment is done may be no
more than asking him to perform a post-mortem examination: he may be
able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does
not ensure that a reasonable answer can be extracted from a given body
of data. ~ John Tukey
///////////////////////////////////////////////////////////////////////////////////////////




2018-05-02 13:23 GMT+02:00 Dr. Robin Haunschild <R.Haunschild at
fkf.mpg.de>:> Hi,
>
> I'll coded your example into R code:
>
> Table_A <- c('abc at gmail.com', 'John Chan',
'0909')
> Table_A <- rbind(Table_A, c('bcd at yahoo.com', 'Tim
Ma', '89089'))
> colnames(Table_A) <- c('Email', 'Name', 'Phone')
> Table_A
>
> Table_B <- c('abc at gmail.com', 'John Chan',
'M', '0909')
> Table_B <- rbind(Table_B, c('khn at hotmail.com', 'Rosy 
Kim', 'F', '7779'))
> colnames(Table_B) <- c('Email', 'Name', 'Sex',
'Phone')
> Table_B
>
> Did you have a look at this one?
> Table_C <- merge (Table_A, Table_B, by="Email", all = TRUE)
> Table_C[is.na(Table_C$Name.y),]
> Table_C[is.na(Table_C$Name.x),]
>
> Table_C contains all data from Table_A and Table_B. The key.x is NA if
> the row comes from Table_B and key.y is NA if the row comes from Table_A.
>
> Best, Robin
>
>
> On 05/02/2018 11:38 AM, Chintanu wrote:
>> Thanks - Peter, Eivind, Rui
>>
>>
>> Sorry, I perhaps could not explain it properly in the first go.
>>
>> Trying to simplify it here with an example - Say I have two dataframes
as
>> below that are NOT equally-sized data frames (i.e., number of columns
are
>> different in each table):
>>
>>
>>
>> Table_A:
>>
>> Email             Name                   Phone
>>
>> abc at gmail.com   John Chan         0909
>>
>> bcd at yahoo.com   Tim Ma                    89089
>>
>> ......
>>
>>
>>
>> Table_B:
>>
>> Email                     Name                 Sex        Phone
>>
>> abc at gmail.com    John Chan            M                 0909
>>
>> khn at hotmail.com           Rosy  Kim               F                 
7779
>>
>> .....
>>
>>
>>
>> Now, I have used -
>>
>> merge (Table_A, Table_B, by="Email", all = FALSE))
>>
>>
>>
>> - to find only the rows that match from these data frames - based on
Email
>> as primary key.
>>
>>
>>
>> Further, I am also interested (using "Email" as the common
key) which rows
>> from Table_A did not match with Table_B.
>>
>> I am not sure how to do this here.
>>
>> Thanks and regards,
>> Chintanu
>>
>>
>>
>> On Tue, May 1, 2018 at 8:48 PM, Rui Barradas <ruipbarradas at
sapo.pt> wrote:
>>
>>> Hello,
>>>
>>> Is it something like this that you want?
>>>
>>> x <- data.frame(a = c(1:3, 5, 5:10), b = c(1:7, 7, 9:10))
>>> y <- data.frame(a = 1:10, b = 1:10)
>>>
>>> which(x != y, arr.ind = TRUE)
>>>
>>>
>>> Hope this helps,
>>>
>>> Rui Barradas
>>>
>>>
>>> On 5/1/2018 11:35 AM, Chintanu wrote:
>>>
>>>> Hi,
>>>>
>>>>
>>>> May I please ask how I do the following in R. Sorry - this may
be trivial,
>>>> but I am struggling here for this.
>>>>
>>>>
>>>>
>>>> For two dataframes (A and B), I wish to identify (based on a
primary
>>>> key-column present in both A & B) -
>>>>
>>>> 1. Which records (rows) of A did not match with B, and
>>>>
>>>>
>>>>
>>>> 2. Which records of B did not match with A ?
>>>>
>>>>
>>>>
>>>> I came across a setdt function while browsing, but when I tried
it, it
>>>> says
>>>> - Could not find function "setdt".
>>>>
>>>>
>>>>
>>>> Overall, if there is any way of doing it (preferably in some
simplified
>>>> way), please advise.
>>>>
>>>>
>>>> Many thanks in advance.
>>>>
>>>>
>>>> regards,
>>>>
>>>> Tito
>>>>
>>>>         [[alternative HTML version deleted]]
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and
more, see
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide http://www.R-project.org/posti
>>>> ng-guide.html
>>>> and provide commented, minimal, self-contained, reproducible
code.
>>>>
>>>>
>>
>>       [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
> --
> Dr. Robin Haunschild
> Max Planck Institute
> for Solid State Research
> Heisenbergstr. 1
> D-70569 Stuttgart (Germany)
> phone: +49 (0) 711-689-1285
> fax:   +49 (0) 711-689-1292
> email: R.Haunschild at fkf.mpg.de
> http://www.fkf.mpg.de/ivs
>
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

Apparently Analagous Threads

Search for more possibly parallel threads

R help - May 2018 - Merging dataframes

[R] Merging dataframes

[R] Merging dataframes

[R] Merging dataframes

Apparently Analagous Threads