Hi again,
Let's say I have the two data frames below.
OriginalData = data.frame('Value1' = 1:12, 'Value2' = 11:22,
                          'AA1' = c('AA4', 'AA3', 'AA4', 'AA1',
                                    'AA2', 'AA1', 'AA6', 'AA6',
                                    'AA3', 'AA3', 'AA4', 'AA3'),
                          'Value' = NA)
TargetValue = data.frame('AA' = c('AA1', 'AA2', 'AA3', 'AA4', 'AA5', 'AA6'),
                         'BB' = c('B', 'B', 'B', 'B', 'CC', 'CC'),
                         'Value' = c(5, 10, 25, 7, 35, 21))
OriginalData
TargetValue
Now I need to replace OriginalData's 'AA1' column with TargetValue's
'BB' column, based on matching values between the 'AA1' and 'AA' columns
of OriginalData and TargetValue respectively. By the same rule, I need to
update the 'Value' column of OriginalData with that of TargetValue.
As an example, after replacement by the above rule, the 1st row of
OriginalData should look like:
> OriginalData
  Value1 Value2 AA1 Value
1      1     11   B     7
The values of TargetValue's 'AA' column are unique, i.e. there are no
duplicates.
Previously I implemented this with a 'for' loop; however, since both of
my data.frames are quite big, it takes a long time to execute. Is there
a faster, more idiomatic 'R' way to do this?
I would appreciate any pointers.
Thanks,
> On May 20, 2017, at 11:23 AM, Christofer Bogaso <bogaso.christofer at gmail.com> wrote:
>
> [...]

It's going to have a greater chance of delivering the desired result if
you convert the factor columns into character.

> Thanks,
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius
Alameda, CA, USA
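A minimal sketch of why the factor columns matter, assuming the data frames were built with data.frame()'s pre-4.0 default of stringsAsFactors = TRUE. The two-row data frame and the names od_fac and od_chr are illustrative only and do not come from the thread. Writing a string that is not an existing level into a factor column stores NA (with a warning), whereas the same assignment into a character column works as intended:

```r
## Illustrative two-row data frame; stringsAsFactors = TRUE mimics the
## default behaviour of data.frame() before R 4.0.0.
od <- data.frame(AA1 = c("AA4", "AA3"), stringsAsFactors = TRUE)

## Factor column: "B" is not one of the levels ("AA3", "AA4"),
## so the assignment stores NA and raises a warning.
od_fac <- od
suppressWarnings(od_fac[1, "AA1"] <- "B")

## Character column: the same assignment simply stores "B".
od_chr <- od
od_chr$AA1 <- as.character(od_chr$AA1)
od_chr[1, "AA1"] <- "B"

is.na(od_fac$AA1[1])   # TRUE: the replacement value was lost
od_chr$AA1             # "B" "AA3"
```

In R 4.0.0 and later, data.frame() no longer converts strings to factors by default, so the conversion step is harmless but may be unnecessary there.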
Like this? (Use indexing to avoid explicit loops whenever possible):
## od and tv are assumed to be short names for the two data frames
od <- OriginalData
tv <- TargetValue
## first convert factor columns to character, as David W. suggested
i <- sapply(od, is.factor)
od[i] <- lapply(od[i], as.character)
i <- sapply(tv, is.factor)
tv[i] <- lapply(tv[i], as.character)
## Now use ?match: wh[k] is the row of tv whose AA equals od$AA1[k]
wh <- match(od[, "AA1"], tv[, "AA"])
matched <- !is.na(wh) ## only needed if not all AA1's match in AA
## overwrite both columns in one vectorized assignment
od[matched, c("AA1", "Value")] <-
    tv[wh[matched], c("BB", "Value")]
> od
   Value1 Value2 AA1 Value
1       1     11   B     7
2       2     12   B    25
3       3     13   B     7
4       4     14   B     5
5       5     15   B    10
6       6     16   B     5
7       7     17  CC    21
8       8     18  CC    21
9       9     19   B    25
10     10     20   B    25
11     11     21   B     7
12     12     22   B    25
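To illustrate what the `matched` guard is for, here is a small self-contained sketch of the same match()-based lookup wrapped in a function. The function name update_from_lookup and the key 'AA9' are hypothetical and not part of the thread; 'AA9' is deliberately absent from the lookup table, so that row should be left untouched:

```r
## Hypothetical helper (name and arguments are illustrative only).
## It applies the match()-based update shown above.
update_from_lookup <- function(od, tv) {
  wh <- match(od$AA1, tv$AA)   # row of tv for each row of od; NA if no match
  matched <- !is.na(wh)        # rows of od that have a partner in tv
  od[matched, c("AA1", "Value")] <- tv[wh[matched], c("BB", "Value")]
  od
}

od <- data.frame(Value1 = 1:4, Value2 = 11:14,
                 AA1 = c("AA4", "AA9", "AA6", "AA1"),  # "AA9" is not in tv
                 Value = NA_real_, stringsAsFactors = FALSE)
tv <- data.frame(AA = c("AA1", "AA2", "AA3", "AA4", "AA5", "AA6"),
                 BB = c("B", "B", "B", "B", "CC", "CC"),
                 Value = c(5, 10, 25, 7, 35, 21), stringsAsFactors = FALSE)

res <- update_from_lookup(od, tv)
res
```

Rows whose key is found in tv get the new 'AA1' and 'Value'; the 'AA9' row keeps its original key and its NA value. Because match() and the indexed assignment are vectorized, no explicit loop over rows is needed.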
Cheers,
Bert
Bert Gunter
"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
On Sat, May 20, 2017 at 11:53 AM, David Winsemius <dwinsemius at comcast.net> wrote:
> It's going to have a greater chance of delivering the desired result if
> you convert the factor columns into character.