Dear R-help,

Let's say `x1' and `x2' are very long vectors (length=5e5, say) with the same set of names but in different order. If I want to sort `x2' in the order of `x1', I would do

  x2[names(x1)]

but the amount of time that takes is quite prohibitive! Does anyone have any suggestions for a more efficient way to do this?

If the two vectors are exactly the same length (as I said above), sorting both by names would probably be the fastest. However, if the two vectors differ in length (and the names of the shorter one are a subset of the names of the longer one), then that doesn't work...

Best,
Andy
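A minimal sketch of the operation being asked about, on toy vectors rather than 5e5 elements (the names a to e and the values are hypothetical). Character subscripting returns the elements of x2 in the order given by names(x1). It also covers the different-length case, where the names of the shorter vector are a subset of names(x2):

```r
# Hypothetical toy data: same set of names, different order
x1 <- c(a = 1, b = 2, c = 3, d = 4, e = 5)
x2 <- c(d = 40, b = 20, e = 50, a = 10, c = 30)

# Reorder x2 to follow the order of names(x1) by character subscripting
y <- x2[names(x1)]
print(y)

# Also works when a shorter vector's names are a subset of names(x2)
x1_short <- x1[c("e", "b")]
y_short <- x2[names(x1_short)]
print(y_short)
```

The semantics are not in question here, only the running time on long vectors.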
I suspect three possible causes of the slowness:
1) Possibly the names carry a lot of overhead (string comparisons, lookup,
etc.).
2) Maybe it's just memory, in which case you could loop over chunks of the
names(x1) vector.
3) You're basically asking for the permutation taking names(x2) into
names(x1), and then applying it to x2. The first step is a sort, but perhaps
the indexing code doesn't optimize that.
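The permutation idea in point 3, together with Andy's remark that sorting both vectors by name would be fastest when the name sets are identical, can be sketched as follows. This assumes both vectors carry the same set of unique names; the toy data are hypothetical. The permutation is built from two calls to order() and then applied with ordinary integer indexing, so no per-element name lookup is needed:

```r
# Hypothetical toy data with identical (unique) name sets in different orders
x1 <- c(a = 1, b = 2, c = 3, d = 4, e = 5)
x2 <- c(d = 40, b = 20, e = 50, a = 10, c = 30)

# o1[k] and o2[k] are the positions in x1 and x2 of the k-th name in sorted
# order, so x1[o1[k]] and x2[o2[k]] carry the same name.
o1 <- order(names(x1))
o2 <- order(names(x2))

# perm[i] is the position in x2 of the name found at position i of x1.
# Assumes names(x1) and names(x2) are the same set with no duplicates.
perm <- integer(length(x1))
perm[o1] <- o2

# Apply the permutation with plain integer indexing
y <- x2[perm]
print(y)
```

As Andy points out, this only applies when the two name sets are identical; it does not directly handle a shorter vector whose names are a subset.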
Reid Huntsinger
-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch
[mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Liaw, Andy
Sent: Friday, May 07, 2004 5:16 PM
To: r-help at stat.math.ethz.ch
Subject: [R] re-ordering a vector by name
______________________________________________
R-help at stat.math.ethz.ch mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html
Liaw, Andy wrote:
> Let's say `x1' and `x2' are very long vectors (length=5e5, say) with same
> set of names but in different order. If I want to sort `x2' in the order of
> `x1', I would do
>
> x2[names(x1)]
>
> but the amount of time that takes is quite prohibitive!

Hi Andy,

Using match seems to be *much* faster:

  R> x1 <- 1:10000; names(x1) <- 1:10000
  R> x2 <- 1:10000; names(x2) <- 10000:1
  R> system.time(x3 <- x1[names(x2)])
  [1] 1.88 0.00 1.88 NA NA
  R> system.time(x4 <- x1[match(names(x1), names(x2))])
  [1] 0.01 0.00 0.01 NA NA
  R> all.equal(x3, x4)
  [1] TRUE
  R>

This should also work if x1 and x2 are of different lengths.

--sundar
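For reference, a small sketch of the match() argument order that reproduces x2[names(x1)] (the toy data are hypothetical). match(x, table) returns, for each element of x, its position in table. So the names whose order you want go first, and the names of the vector being indexed go second. The sketch uses a shorter x1 whose names are a subset of names(x2):

```r
# Hypothetical toy data; x1 is shorter and its names are a subset of names(x2)
x2 <- c(d = 40, b = 20, e = 50, a = 10, c = 30)
x1 <- c(e = 5, a = 1, c = 3)

# Keys = names(x1) (the desired order), table = names(x2) (the vector indexed)
idx <- match(names(x1), names(x2))
y <- x2[idx]
print(y)

# Same result as character subscripting
stopifnot(identical(y, x2[names(x1)]))
```

In general, for keys k that all occur in names(v), v[match(k, names(v))] gives the same result as v[k].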
> From: Sundar Dorai-Raj
>
> Using match seems to be *much* faster:
>
> R> system.time(x4 <- x1[match(names(x1), names(x2))])

Sundar,

Thanks very much for the tip! However, I think the arguments in match() are backward:

  > n = 1e4
  > x1 = sample(n)
  > x2 = sample(n)
  > names(x1) = sample(n)
  > names(x2) = sample(n)
  > system.time(x3 <- x1[names(x2)])
  [1] 5.71 0.00 6.02 NA NA
  > system.time(x4 <- x1[match(names(x1),names(x2))])
  [1] 0.03 0.00 0.03 NA NA
  > all.equal(x3, x4)
  [1] "Names: 9997 string mismatches" "Mean relative difference: 0.669837"
  > names(x3[1:5])
  [1] "5391" "9927" "6499" "1863" "8287"
  > names(x4[1:5])
  [1] "2560" "9914" "6348" "1291" "5718"
  > system.time(x4 <- x1[match(names(x2),names(x1))])
  [1] 0.03 0.00 0.03 NA NA
  > names(x4[1:5])
  [1] "5391" "9927" "6499" "1863" "8287"
  > all.equal(x3, x4)
  [1] TRUE

[Admittedly this is why I rarely use match(): I get mixed up easily.]

Reid: It isn't a memory problem. For vectors of length 6e5, I killed the R process after more than 5 hours on an Opteron 248.
The R process was taking up about 114MB of RAM, out of 8GB in the box. I'm rather surprised that such a seemingly simple operation would take so long, especially when sorting such vectors is very fast. What am I missing?

Best,
Andy
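One way to check whether the cost of character subscripting grows faster than linearly with length (which would fit 1e4 elements taking seconds while 6e5 did not finish in hours) is to time both approaches at a few increasing sizes. A sketch, with arbitrary sizes and a fixed seed as assumptions. It prints timings rather than asserting any particular numbers, since these depend on the R version and the machine:

```r
# Sizes are arbitrary assumptions chosen so the loop finishes quickly
set.seed(1)
sizes <- c(2000L, 4000L, 8000L)

for (n in sizes) {
  # Same construction as in Andy's example: random values, random names
  x1 <- sample(n); names(x1) <- sample(n)
  x2 <- sample(n); names(x2) <- sample(n)

  # Elapsed time for each way of putting x2 in the order of names(x1)
  t_names <- system.time(a <- x2[names(x1)])[["elapsed"]]
  t_match <- system.time(b <- x2[match(names(x1), names(x2))])[["elapsed"]]

  # Both approaches must agree before their timings are compared
  stopifnot(identical(a, b))
  cat(sprintf("n = %5d  names(): %6.3f s  match(): %6.3f s\n",
              n, t_names, t_match))
}
```

If the names() column roughly quadruples each time n doubles while the match() column roughly doubles, that would point to per-element name lookup rather than memory as the bottleneck.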