thr3ads.net - R help - [R] sorting without order [Nov 2004]


Marc Mamin

2004-Nov-23 09:58 UTC

[R] sorting without order

Hello,


In order to increase the performance of a script I'd like to sort very large
vectors containing repeated integer values.
I'm not interested in having the values sorted, only grouped.
I also need the equivalent of index.return from the standard "sort"
function:

  f(c(10,1,10,100,1,10))

  =>

  grouped: c(10,10,10,1,1,100)
  ix:	  c(1,3,6,2,5,4)


Is there a way to achieve this that would be faster than the standard sort
function?

Thanks for any hints,

Marc Mamin
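A minimal sketch (not from the thread) that reproduces the requested output exactly, assuming the groups should appear in order of first occurrence as in the example. `group_index` is a hypothetical helper name. It maps each value to an integer code with `match()` and `unique()`, then takes `order()` of the codes; because `order()` is stable, positions within each group stay in ascending order. Note that it still orders a vector of codes, so it is not guaranteed to beat `sort()` on speed.

```r
# Hypothetical helper: index vector that groups equal values,
# with groups in order of first appearance.
group_index <- function(v) {
  codes <- match(v, unique(v))  # 10 -> 1, 1 -> 2, 100 -> 3
  order(codes)                  # stable, so ties keep original order
}

v <- c(10, 1, 10, 100, 1, 10)
ix <- group_index(v)    # 1 3 6 2 5 4
grouped <- v[ix]        # 10 10 10 1 1 100
```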

Peter Dalgaard

2004-Nov-23 11:41 UTC


[R] sorting without order

"Marc Mamin" <M.Mamin at intershop.de> writes:
> Hello,
> 
> 
> In order to increase the performance of a script I'd like to sort very
> large vectors containing repeated integer values.
> I'm not interested in having the values sorted, only grouped.
> I also need the equivalent of index.return from the standard
> "sort" function:
> 
>   f(c(10,1,10,100,1,10))
> 
>   =>
> 
>   grouped: c(10,10,10,1,1,100)
>   ix:	  c(1,3,6,2,5,4)
> 
> 
> is there a way to achieve this which would be faster than the standard sort
> function?
> 
> Thanks for any hints,

Here's one way:

v <- c(10,1,10,100,1,10)
ix <- do.call("c",split(seq(along=v),v))
grouped <- v[ix]

I'm not sure about the speed, though. It should be O(N) if the number of
groups is small, but the constant factor could be large because of various
formalities (such as adding names to ix).
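A short illustration of the approach above on the example vector (a sketch; `seq_along(v)` is used here as the equivalent of `seq(along=v)`). Because `split()` converts `v` to a factor whose levels are the sorted unique values, the groups come back in ascending order of value (1, 10, 100) rather than in order of first appearance as in the original example. `do.call("c", ...)` also attaches names built from the list names, which `unname()` drops.

```r
v <- c(10, 1, 10, 100, 1, 10)

# split() groups the positions by value; levels are sort(unique(v)),
# so the groups are ordered 1, 10, 100.
groups <- split(seq_along(v), v)

# Concatenate the groups and drop the names that c() builds from the list.
ix <- unname(do.call("c", groups))
grouped <- v[ix]
# ix:      2 5 1 3 6 4
# grouped: 1 1 10 10 10 100
```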


-- 
   O__  ---- Peter Dalgaard             Blegdamsvej 3  
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N   
 (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907

Dimitris Rizopoulos

2004-Nov-23 12:52 UTC


[R] sorting without order

Hi Marc,

building on Prof. Dalgaard's proposal, you could use:

ix <- unlist(split(seq(along=v), v), use.names=FALSE)

but even so, `sort()' seems faster if you are interested only
in grouping:

v <- sample(1:25000, 50000, TRUE)
######
system.time(ix <- do.call("c",split(seq(along=v),v)), gcFirst=TRUE)
[1] 0.13 0.00 0.13   NA   NA

system.time(ix <- unlist(split(seq(along=v), v), use.names=FALSE), 
gcFirst=TRUE)
[1] 0.06 0.00 0.07   NA   NA

system.time(x <- sort(v), gcFirst=TRUE)
[1] 0.01 0.00 0.02   NA   NA
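One caveat on the comparison above: `sort(v)` returns only the sorted values, not the index vector Marc asked for. A like-for-like check would time something that also produces the index, such as `order(v)` (or `sort(v, index.return = TRUE)`). The sketch below, assuming a current R version, times the `split()` approach against `order()` and confirms they yield the same permutation: `order()` is stable, and `split()` lists positions in ascending order within each group. Timings depend on the machine and R version, so none are claimed here.

```r
set.seed(1)
v <- sample(1:25000, 50000, replace = TRUE)

# Grouping index via split(), as in the replies above
t_split <- system.time(
  ix_split <- unlist(split(seq_along(v), v), use.names = FALSE)
)

# order() returns the permutation directly; ties keep their original order
t_order <- system.time(ix_order <- order(v))

print(t_split)
print(t_order)
identical(ix_split, ix_order)  # same permutation from both approaches
```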


I hope it helps.

Best,
Dimitris

----
Dimitris Rizopoulos
Ph.D. Student
Biostatistical Centre
School of Public Health
Catholic University of Leuven

Address: Kapucijnenvoer 35, Leuven, Belgium
Tel: +32/16/336899
Fax: +32/16/337015
Web: http://www.med.kuleuven.ac.be/biostat
     http://www.student.kuleuven.ac.be/~m0390867/dimitris.htm



