thr3ads.net - R help - [R] Quickest way to match two vectors besides %in%? [Nov 2005]

Pete Cap

2005-Nov-08 19:28 UTC

[R] Quickest way to match two vectors besides %in%?

Hello list,

I have two data frames, X (48469,2) and Y (79771,5).

X[,1] contains distinct values of Y[,2].
I want to match values in X[,1] and Y[,2], then take
the corresponding value in X[,2] and place it in
Y[,4].

So far I have been doing it like so:
for (i in 1:48469) {
  Y[which(Y[, 2] == X[i, 1]), 4] <- X[i, 2]
}

But it chugs along so very slowly that I can't help
but wonder if there's a faster way, mainly because on
my box it takes R about 30 seconds to simply COUNT to
48,469 in the for loop.

I have already tried using %in%.  It tells me if the
values in X[,1] are IN Y[,2], which is useful in
removing unnecessary values from X[,1].  But it does
not tell me exactly where they match.  which(X[,1]
%in% Y[,2]) does but it only matches on the first
instance.

This is the slowest part of the script I'm working
on--if I could improve it I could shave off some
serious operating time.  Any pointers?

Regards,

Pete

Duncan Murdoch

2005-Nov-08 20:00 UTC

On 11/8/2005 2:28 PM, Pete Cap wrote:
> [...]
> This is the slowest part of the script I'm working
> on--if I could improve it I could shave off some
> serious operating time.  Any pointers?
Look at the merge() function to add the X and Y columns to a new 
dataframe, then process that to merge the X[,2] and Y[,4] values.

It will be something like

Z <- merge(X, Y, by.x=1, by.y=2, all.y=TRUE)

changes <- !is.na(Z[,2])
Z[changes,5] <- Z[changes,2]

but you are almost certainly better off (from a maintenance point of 
view) using the names of the columns rather than guessing at column 
numbers.
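
A minimal sketch of the same idea using column names rather than positions. The toy data and the names `key`, `id`, `other`, `value`, and `new_value` are made up for illustration; they are not taken from Pete's data frames.

```r
# Toy stand-ins for X and Y; every column name here is hypothetical.
X <- data.frame(key = c("a", "b", "c"),
                new_value = c(10, 20, 30))
Y <- data.frame(id = 1:5,
                key = c("a", "b", "a", "d", "c"),
                other = letters[1:5],
                value = c(1, 2, 3, 4, 5))

# Left join on the shared key: keep every row of Y and add X's column
# where a match exists (NA otherwise). merge() may reorder the rows.
Z <- merge(Y, X, by = "key", all.x = TRUE)

# Overwrite value only where X supplied a replacement.
changes <- !is.na(Z$new_value)
Z$value[changes] <- Z$new_value[changes]

# Drop the helper column and restore Y's original row order.
Z$new_value <- NULL
Z <- Z[order(Z$id), ]
```

Rows of Y whose key does not appear in X (here "d") keep their old value, which is why the update is restricted to `changes`.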

Duncan Murdoch

Weiwei Shi

2005-Nov-08 20:15 UTC

?match

> x
  X1 X2
1  1  5
2  2  6
3  3  7
4  4  8
> y
  Y1 Y4
1  1  8
2  2  9
3  3 10
4  4 11
5  1 12
6  2 13
7  3 14
8  4 15
> y.orig <- y  # backup
> y$Y4 <- x$X2[match(y$Y1, x$X1)]
> y
  Y1 Y4
1  1  5
2  2  6
3  3  7
4  4  8
5  1  5
6  2  6
7  3  7
8  4  8
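
In the example above every value of y$Y1 occurs in x$X1. When that is not the case, match() returns NA for the unmatched rows, so a direct assignment would overwrite their Y4 with NA. A sketch of one way to keep the existing values in that situation, using made-up data with the same column names:

```r
# Toy data: the key 4 in y has no counterpart in x.
x <- data.frame(X1 = c(1, 2, 3), X2 = c(5, 6, 7))
y <- data.frame(Y1 = c(1, 2, 3, 4, 1, 2),
                Y4 = c(8, 9, 10, 11, 12, 13))

# For each row of y, the row of x whose X1 equals y$Y1 (NA if there is none).
idx <- match(y$Y1, x$X1)

# Update only the rows that found a match; the others keep their old Y4.
hit <- !is.na(idx)
y$Y4[hit] <- x$X2[idx[hit]]
```

Because match() does the whole lookup in one vectorised call, this avoids scanning y once per row of x as the original loop does.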


HTH,

Weiwei

--
Weiwei Shi, Ph.D

"Did you always know?"
"No, I did not. But I believed..."
---Matrix III

paul sorenson

2005-Nov-08 20:47 UTC

Pete Cap wrote:
> [...]
> I want to match values in X[,1] and Y[,2], then take
> the corresponding value in X[,2] and place it in
> Y[,4].
> [...]
I'm not sure but isn't that a case where merge() can help?

cheers
