Dear R-help,

Let's say `x1' and `x2' are very long vectors (length=5e5, say) with the same set of names but in different order. If I want to sort `x2' in the order of `x1', I would do

    x2[names(x1)]

but the amount of time that takes is quite prohibitive! Does anyone have any suggestions for a more efficient way to do this?

If the two vectors are exactly the same length (as I said above), sorting both by names would probably be the fastest. However, if the two vectors differ in length (and the names of the shorter one are a subset of the names of the longer one) then that doesn't work...

Best,
Andy
I suspect three possible causes for the slowness:

1) names may carry a lot of overhead (string compares, lookup, etc.);
2) maybe it's just memory, in which case you could loop over chunks of the names(x1) vector;
3) you're basically asking for the permutation taking names(x2) into names(x1) and then applying it to x2. The first step is a sort, but perhaps the indexing code doesn't optimize that.

Reid Huntsinger

-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Liaw, Andy
Sent: Friday, May 07, 2004 5:16 PM
To: r-help at stat.math.ethz.ch
Subject: [R] re-ordering a vector by name

______________________________________________
R-help at stat.math.ethz.ch mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
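Reid's third point, that the task amounts to finding the permutation taking names(x2) into names(x1), can be sketched with two sorts. This is only an illustration, assuming both vectors carry exactly the same set of unique names; the data and the variable names `ids', `o1', `o2', `perm' and `x2_reordered' are made up for the example and do not come from the thread.

```r
# Hypothetical data: two vectors with the same unique names in different orders.
set.seed(42)
n   <- 1000
ids <- paste0("id", seq_len(n))
x1  <- setNames(rnorm(n), sample(ids))
x2  <- setNames(rnorm(n), sample(ids))

o1 <- order(names(x1))  # permutation that sorts x1 by name
o2 <- order(names(x2))  # permutation that sorts x2 by name

# x2[o2] is x2 sorted by name. order(o1) is the inverse of the
# permutation o1, so indexing by it undoes the sort and leaves the
# elements in x1's original name order.
perm         <- o2[order(o1)]
x2_reordered <- x2[perm]

# Same result as the direct, name-based lookup from the original post.
stopifnot(identical(x2_reordered, x2[names(x1)]))
```

This only works when the two name sets are identical, which is the limitation Andy mentions for the sort-based approach.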
Hi Andy,

Using match seems to be *much* faster:

R> x1 <- 1:10000; names(x1) <- 1:10000
R> x2 <- 1:10000; names(x2) <- 10000:1
R> system.time(x3 <- x1[names(x2)])
[1] 1.88 0.00 1.88   NA   NA
R> system.time(x4 <- x1[match(names(x1), names(x2))])
[1] 0.01 0.00 0.01   NA   NA
R> all.equal(x3, x4)
[1] TRUE

This should also work if x1 and x2 are of different lengths.

--sundar
Sundar,

Thanks very much for the tip! However, I think the arguments to match() are backward:

> n = 1e4
> x1 = sample(n)
> x2 = sample(n)
> names(x1) = sample(n)
> names(x2) = sample(n)
> system.time(x3 <- x1[names(x2)])
[1] 5.71 0.00 6.02   NA   NA
> system.time(x4 <- x1[match(names(x1),names(x2))])
[1] 0.03 0.00 0.03   NA   NA
> all.equal(x3, x4)
[1] "Names: 9997 string mismatches"      "Mean relative difference: 0.669837"
> names(x3[1:5])
[1] "5391" "9927" "6499" "1863" "8287"
> names(x4[1:5])
[1] "2560" "9914" "6348" "1291" "5718"
> system.time(x4 <- x1[match(names(x2),names(x1))])
[1] 0.03 0.00 0.03   NA   NA
> names(x4[1:5])
[1] "5391" "9927" "6499" "1863" "8287"
> all.equal(x3, x4)
[1] TRUE

[Admittedly this is why I rarely use match(): I get mixed up easily.]

Reid: It isn't a memory problem. For vectors of length 6e5, I killed the R process after more than 5 hours on an Opteron 248. The R process was taking up about 114MB of RAM, out of 8GB in the box. I'm rather surprised that such a seemingly simple operation would take so long, especially when sorting such vectors is very fast. What am I missing?

Best,
Andy
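For the unequal-length case raised in the original question, the corrected match() idiom might be applied as in the sketch below. It assumes names are unique within each vector and that the shorter vector's names are a subset of the longer one's; the data and the names `keep', `common', `x1_in_x2_order' and `x2_in_x1_order' are hypothetical, chosen only for illustration.

```r
# Hypothetical data: x1 is the longer vector, x2 carries a shuffled
# subset of x1's names.
set.seed(1)
n    <- 1e4
x1   <- setNames(runif(n), paste0("id", sample(n)))
keep <- sample(n, 6000)
x2   <- setNames(runif(6000), names(x1)[keep])

# Elements of x1 in the order of x2's names.
# match(a, b) returns, for each element of a, its position in b,
# so the names to look up go first and the names to search go second.
x1_in_x2_order <- x1[match(names(x2), names(x1))]

# Elements of x2 in the order in which their names appear in x1,
# restricted to the names that x2 actually has.
common         <- names(x1)[names(x1) %in% names(x2)]
x2_in_x1_order <- x2[match(common, names(x2))]

# Both agree with plain name-based indexing.
stopifnot(identical(x1_in_x2_order, x1[names(x2)]))
stopifnot(identical(x2_in_x1_order, x2[common]))
```

A way to keep the argument order straight: the result of match(a, b) always has the length of a, so a should be the names in the order you want to end up with.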