Hello list,

I have two data frames, X (48469,2) and Y (79771,5).

X[,1] contains distinct values of Y[,2]. I want to match values in X[,1] and Y[,2], then take the corresponding value in X[,2] and place it in Y[,4].

So far I have been doing it like so:

    for(i in 1:48469) {
      y[which(x[i,1]==y[,3]),4]<-x[i,2]
    }

But it chugs along so very slowly that I can't help but wonder if there's a faster way, mainly because on my box it takes R about 30 seconds to simply COUNT to 48,469 in the for loop.

I have already tried using %in%. It tells me whether the values in X[,1] are IN Y[,2], which is useful for removing unnecessary values from X[,1]. But it does not tell me exactly where they match. which(X[,1] %in% Y[,2]) does, but it only matches on the first instance.

This is the slowest part of the script I'm working on. If I could improve it, I could shave off some serious operating time. Any pointers?

Regards,

Pete

On 11/8/2005 2:28 PM, Pete Cap wrote:
> I have two data frames, X (48469,2) and Y (79771,5).
>
> X[,1] contains distinct values of Y[,2].
> I want to match values in X[,1] and Y[,2], then take
> the corresponding value in [X,2] and place it in
> Y[,4].
>
> So far I have been doing it like so:
> for(i in 1:48469) {
> y[which(x[i,1]==y[,3]),4]<-x[i,2]
> }
> [...]
> This is the slowest part of the script I'm working
> on--if I could improve it I could shave off some
> serious operating time. Any pointers?

Look at the merge() function to add the X and Y columns to a new dataframe, then process that to merge the X[,2] and Y[,4] values. It will be something like

    Z <- merge(X, Y, by.x=1, by.y=2, all.y=TRUE)
    changes <- !is.na(Z[,2])
    Z[changes,5] <- Z[changes,2]

but you are almost certainly better off (from a maintenance point of view) using the names of the columns, rather than guessing at column numbers.

Duncan Murdoch
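
The suggestion above to use column names rather than positions could look roughly like the sketch below. The original post does not give the column names, so `key`, `newval`, `id` and `val` are hypothetical, and the tiny data frames only stand in for the real X and Y.

```r
# Hypothetical stand-ins for X and Y; all column names here are assumptions.
X <- data.frame(key = c("a", "b", "c"), newval = c(10, 20, 30))
Y <- data.frame(id  = 1:5,
                key = c("a", "b", "a", "d", "c"),
                val = c(1, 2, 3, 4, 5))

# Keep every row of Y and pull in X$newval wherever the key matches.
# Rows of Y without a match in X get NA in newval.
Z <- merge(Y, X, by = "key", all.x = TRUE)

# Overwrite val only where a match was found; unmatched rows keep their old value.
changes <- !is.na(Z$newval)
Z$val[changes] <- Z$newval[changes]

# Drop the helper column once its values have been copied over.
Z$newval <- NULL
```

Note that merge() sorts the result by the key column by default, so Z is not in the same row order as Y. If the original order matters, reorder afterwards (here with `Z[order(Z$id), ]`) or use an approach that leaves Y in place.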

?match

> x
  X1 X2
1  1  5
2  2  6
3  3  7
4  4  8
> y
  Y1 Y4
1  1  8
2  2  9
3  3 10
4  4 11
5  1 12
6  2 13
7  3 14
8  4 15
> y.orig<-y # backup
> y$Y4<-x$X2[match(y$Y1, x$X1)]
> y
  Y1 Y4
1  1  5
2  2  6
3  3  7
4  4  8
5  1  5
6  2  6
7  3  7
8  4  8

HTH,

Weiwei

On 11/8/05, Pete Cap <peteoutside at yahoo.com> wrote:
> I have two data frames, X (48469,2) and Y (79771,5).
>
> X[,1] contains distinct values of Y[,2].
> I want to match values in X[,1] and Y[,2], then take
> the corresponding value in [X,2] and place it in
> Y[,4].
> [...]
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
>

--
Weiwei Shi, Ph.D

"Did you always know?"
"No, I did not. But I believed..."
---Matrix III
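
One thing the match() session above does not show is what happens when a value in y has no counterpart in x: match() returns NA for it, so a plain assignment would overwrite the old value with NA. A minimal sketch of a guarded version follows, reusing the toy column names from the session plus one extra unmatched row (the value 9, added here purely for illustration).

```r
x <- data.frame(X1 = 1:4, X2 = 5:8)
y <- data.frame(Y1 = c(1:4, 1:4, 9L), Y4 = 8:16)  # last row has no match in x

# For each y$Y1, the position of its first match in x$X1, or NA if absent.
idx <- match(y$Y1, x$X1)

# Update only the rows that actually matched, leaving the rest untouched.
hit <- !is.na(idx)
y$Y4[hit] <- x$X2[idx[hit]]
```

Because x$X1 holds distinct values, the "first match only" behaviour of match() is not a limitation in this direction: each row of y looks up its single partner in x, and repeated keys in y are all filled in.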

Pete Cap wrote:
> Hello list,
>
> I have two data frames, X (48469,2) and Y (79771,5).
>
> X[,1] contains distinct values of Y[,2].
> I want to match values in X[,1] and Y[,2], then take
> the corresponding value in [X,2] and place it in
> Y[,4].
>
> So far I have been doing it like so:
> for(i in 1:48469) {
> y[which(x[i,1]==y[,3]),4]<-x[i,2]
> }

I'm not sure, but isn't that a case where merge() can help?

cheers