thr3ads.net - R help - [R] Matching values between 2 data.frame. [May 2017]

Christofer Bogaso

2017-May-20 18:23 UTC

[R] Matching values between 2 data.frame.

Hi again,

Let's say I have the 2 data frames below.

OriginalData = data.frame('Value1' = 1:12, 'Value2' = 11:22,
    'AA1' = c('AA4', 'AA3', 'AA4', 'AA1', 'AA2', 'AA1',
              'AA6', 'AA6', 'AA3', 'AA3', 'AA4', 'AA3'),
    'Value' = NA)

TargetValue = data.frame(
    'AA' = c('AA1', 'AA2', 'AA3', 'AA4', 'AA5', 'AA6'),
    'BB' = c('B', 'B', 'B', 'B', 'CC', 'CC'),
    'Value' = c(5, 10, 25, 7, 35, 21))

OriginalData
TargetValue

Now I need to replace OriginalData's 'AA1' column with TargetValue's
'BB' column, based on matched values between the 'AA1' and 'AA' columns
of OriginalData and TargetValue respectively. By the same rule, I need
to update the 'Value' column of OriginalData with that of TargetValue.

As an example, after replacement by the above rule, the 1st row of
OriginalData should look like:

> OriginalData
  Value1 Value2 AA1 Value
1      1     11   B     7

Values of TargetValue's 'AA' column are unique, i.e. there is no
duplication.

Previously I implemented a 'for' loop to do the above; however, since
both of my data.frames are quite big, it takes a long time to execute.
Is there any 'R' way to do this quickly?

I would appreciate any pointers.

Thanks,

David Winsemius

2017-May-20 18:53 UTC

> On May 20, 2017, at 11:23 AM, Christofer Bogaso <bogaso.christofer at gmail.com> wrote:
It's going to have a greater chance of delivering the desired result if
you convert the factor columns into character.
David Winsemius
Alameda, CA, USA
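
To illustrate the advice above: data.frame() in the R versions current in 2017 turned strings into factors by default, and a factor column rejects values that are not already among its levels. The following is only a minimal sketch of that pitfall; the names `demo`, `demo_factor`, `demo_char` and `key` are hypothetical and not from the thread, and stringsAsFactors = TRUE is set explicitly so that the behaviour is the same in newer R versions.

```r
# Hypothetical two-row data frame whose key column is a factor.
demo <- data.frame(key = c("AA4", "AA3"), stringsAsFactors = TRUE)

# Assigning a value that is not an existing factor level yields NA
# (R also emits an "invalid factor level" warning, suppressed here).
demo_factor <- demo
suppressWarnings(demo_factor[1, "key"] <- "B")
is.na(demo_factor$key[1])

# After converting the column to character, the same assignment
# stores the new string as intended.
demo_char <- demo
demo_char$key <- as.character(demo_char$key)
demo_char[1, "key"] <- "B"
demo_char$key
```

With character columns, replacing 'AA4' by 'B' is an ordinary string assignment, which is what the match-based replacement in the next reply relies on.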

Bert Gunter

2017-May-20 20:13 UTC

Like this? (Use indexing to avoid explicit loops whenever possible.)

## shorter names for the two data frames from the original post
od <- OriginalData
tv <- TargetValue

## first convert factor columns to character, as David W. suggested
i <- sapply(od, is.factor)
od[i] <- lapply(od[i], as.character)
i <- sapply(tv, is.factor)
tv[i] <- lapply(tv[i], as.character)

## Now use ?match
wh <- match(od[, "AA1"], tv[, "AA"])  ## row of tv matching each AA1; NA if none
matched <- !is.na(wh)  ## only needed if not all AA1's match in AA
od[matched, c("AA1", "Value")] <- tv[wh[matched], c("BB", "Value")]

> od
   Value1 Value2 AA1 Value
1       1     11   B     7
2       2     12   B    25
3       3     13   B     7
4       4     14   B     5
5       5     15   B    10
6       6     16   B     5
7       7     17  CC    21
8       8     18  CC    21
9       9     19   B    25
10     10     20   B    25
11     11     21   B     7
12     12     22   B    25


Cheers,
Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
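
For comparison, the same lookup can also be written as a left join with base R's merge(). This is not what was posted in the thread, only a hedged sketch of an alternative: the helper column `row_id` and the result name `res` are hypothetical, the placeholder 'Value' column of OriginalData is left out so that the merged 'Value' comes from the lookup table alone, and stringsAsFactors = FALSE is set explicitly so no factor conversion is needed.

```r
# The two data frames from the original post, with character columns.
od <- data.frame(Value1 = 1:12, Value2 = 11:22,
                 AA1 = c("AA4", "AA3", "AA4", "AA1", "AA2", "AA1",
                         "AA6", "AA6", "AA3", "AA3", "AA4", "AA3"),
                 stringsAsFactors = FALSE)
tv <- data.frame(AA = c("AA1", "AA2", "AA3", "AA4", "AA5", "AA6"),
                 BB = c("B", "B", "B", "B", "CC", "CC"),
                 Value = c(5, 10, 25, 7, 35, 21),
                 stringsAsFactors = FALSE)

# merge() does not promise to keep the row order of od, so record it
# first in a hypothetical helper column.
od$row_id <- seq_len(nrow(od))

# Left join: keep every row of od, attach BB and Value where AA1 == AA.
res <- merge(od, tv, by.x = "AA1", by.y = "AA", all.x = TRUE)

# Restore the original order, keep the wanted columns, and let BB
# take the place of AA1.
res <- res[order(res$row_id), c("Value1", "Value2", "BB", "Value")]
names(res)[3] <- "AA1"
rownames(res) <- NULL
res
```

One difference from the match() version: a row whose 'AA1' has no counterpart in 'AA' ends up with NA in both replaced columns here, whereas the indexing approach leaves such a row untouched. On this data every key matches, so the two results agree.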


