thr3ads.net - R help - [R] re-ordering a vector by name [May 2004]


Liaw, Andy

2004-May-07 21:15 UTC

[R] re-ordering a vector by name

Dear R-help,

Let's say `x1' and `x2' are very long vectors (length=5e5, say) with the same
set of names but in different order.  If I want to sort `x2' in the order of
`x1', I would do

  x2[names(x1)]

but the amount of time that takes is quite prohibitive!  Does anyone have
any suggestion on a more efficient way to do this?

If the two vectors are exactly the same length (as I said above), sorting
both by names would probably be the fastest.  However, if the two vectors
differ in length (and the names for the shorter one are a subset of names of
the longer one) then that doesn't work...

Best,
Andy
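
The "sort both by names" idea for the equal-length case can be sketched as follows. This is only an illustration under stated assumptions: the small vectors are made up for the example, names are assumed unique, and both vectors are assumed to carry exactly the same set of names.

```r
# Small illustrative vectors: unique names, same set, different order
x1 <- c(b = 1, d = 2, a = 3, c = 4)
x2 <- c(a = 10, b = 20, c = 30, d = 40)

o1 <- order(names(x1))   # permutation that sorts the names of x1
o2 <- order(names(x2))   # permutation that sorts the names of x2

x2_sorted <- x2[o2]      # x2 rearranged into sorted-name order

# order(o1) is the inverse permutation of o1: it maps the sorted
# order back to the original order of x1
x2_in_x1_order <- x2_sorted[order(o1)]

stopifnot(identical(names(x2_in_x1_order), names(x1)))
```

As the post notes, this relies on the two name sets being identical; it does not carry over to the case where one set of names is a strict subset of the other.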

Huntsinger, Reid

2004-May-07 21:30 UTC

[R] re-ordering a vector by name

I suspect three causes for slowness:

1) possibly names are a lot of overhead (string comparisons, lookup, etc.).
2) maybe it's just memory, in which case you could loop over chunks of the
names(x1) vector
3) you're basically asking for the permutation taking names(x2) into
names(x1) and then applying it to x2. The first step is a sort but perhaps
the indexing code doesn't optimize that.
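
Suggestion 2 (looping over chunks of the names vector) might look like the sketch below. The helper name `reorder_in_chunks` and its `chunk_size` argument are hypothetical, introduced only for illustration; whether chunking actually helps depends on the cause of the slowness, which this sketch does not settle.

```r
# Hypothetical helper: index x2 by name, one block of names at a time
reorder_in_chunks <- function(x2, target_names, chunk_size = 1000L) {
  n <- length(target_names)
  if (n == 0L) return(x2[0])              # nothing to look up
  starts <- seq(1L, n, by = chunk_size)   # first index of each block
  pieces <- lapply(starts, function(s) {
    idx <- s:min(s + chunk_size - 1L, n)  # indices covered by this block
    x2[target_names[idx]]                 # ordinary lookup by name
  })
  do.call(c, pieces)                      # concatenate blocks, keeping names
}

# Illustrative data only
x1 <- c(b = 2, a = 1, c = 3, d = 4, e = 5)
x2 <- c(e = 50, d = 40, c = 30, b = 20, a = 10)
res <- reorder_in_chunks(x2, names(x1), chunk_size = 2L)
stopifnot(identical(res, x2[names(x1)]))
```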

Reid Huntsinger

______________________________________________
R-help at stat.math.ethz.ch mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html

Sundar Dorai-Raj

2004-May-07 21:55 UTC

[R] re-ordering a vector by name

Hi Andy,
   Using match seems to be *much* faster:

R> x1 <- 1:10000; names(x1) <- 1:10000
R> x2 <- 1:10000; names(x2) <- 10000:1
R> system.time(x3 <- x1[names(x2)])
[1] 1.88 0.00 1.88   NA   NA
R> system.time(x4 <- x1[match(names(x1), names(x2))])
[1] 0.01 0.00 0.01   NA   NA
R> all.equal(x3, x4)
[1] TRUE
R>

This should also work if x1 and x2 are of different lengths.

--sundar

Liaw, Andy

2004-May-08 02:44 UTC

[R] re-ordering a vector by name

Sundar,

Thanks very much for the tip!  However, I think the arguments in match() are
backward:
> n = 1e4
> x1 = sample(n)
> x2 = sample(n)
> names(x1) = sample(n)
> names(x2) = sample(n)
> system.time(x3 <- x1[names(x2)])
[1] 5.71 0.00 6.02   NA   NA
> system.time(x4 <- x1[match(names(x1),names(x2))])
[1] 0.03 0.00 0.03   NA   NA
> all.equal(x3, x4)
[1] "Names: 9997 string mismatches"
[2] "Mean relative difference: 0.669837"
> names(x3[1:5])
[1] "5391" "9927" "6499" "1863" "8287"
> names(x4[1:5])
[1] "2560" "9914" "6348" "1291" "5718"
> system.time(x4 <- x1[match(names(x2),names(x1))])
[1] 0.03 0.00 0.03   NA   NA
> names(x4[1:5])
[1] "5391" "9927" "6499" "1863" "8287"
> all.equal(x3, x4)
[1] TRUE

[Admittedly this is why I rarely use match():  I get mixed up easily.]
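
One way to keep the argument order straight: match(x, table) returns, for each element of x, its first position in table. So the names whose order you want go first, and the names of the vector being indexed go second. A minimal sketch, using small made-up vectors with unique names, where the names of the shorter vector are a subset of the names of the longer one:

```r
x1 <- c(b = 1, d = 2, a = 3)                     # shorter vector, sets the order
x2 <- c(a = 10, b = 20, c = 30, d = 40, e = 50)  # longer vector to be reordered

# For each name of x1, find where it sits in names(x2)
pos <- match(names(x1), names(x2))

x2_in_x1_order <- x2[pos]

# With unique names this agrees with plain indexing by name
stopifnot(identical(x2_in_x1_order, x2[names(x1)]))
```

If a name in x1 were absent from x2, match() would return NA at that position, so it may be worth checking anyNA(pos) before indexing.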

Reid: It isn't a memory problem.  For vectors of length 6e5, I killed the R
process after more than 5 hours on an Opteron 248.  The R process was taking
up about 114MB of RAM, out of 8GB in the box.  I'm rather surprised that
such a seemingly simple operation would take so long, especially when sorting
such vectors is very fast.  What am I missing?

Best,
Andy
