thr3ads.net - R help - [R] Complex sort problem [May 2012]


Axel Urbiz

2012-May-17 10:45 UTC

[R] Complex sort problem

Dear List,

Is there a way I can sort a sample based on a sort index constructed from
the data from which the sample is taken? Basically, I need to take 'many'
samples from the same source data and sort them. This can be very time
consuming for long vectors. Is there any way I can sort the data only once
initially, and use that sort order for the samples?

I believe that idea is what is implemented in tree-based classifiers, so
the data is sorted only once initially and that sort order is used for the
child nodes.


set.seed(12345)
x <- sample(0:100, 10)
x.order <- order(x)
x.sorted <- x[x.order]

sample.ind <- sample(1:length(x), 5, replace = TRUE)  # sample 1/2 size with replacement
x.sample <- x[sample.ind]

x.sample.sorted <-   #??? (without sorting again)

Thanks for any help on this.

Regards,
Axel.


David Winsemius

2012-May-17 12:35 UTC


If you had created a named vector or used a data frame with row names,
you would have a record of the original sort order.
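
A minimal sketch of that suggestion, assuming the names are simply the
original positions (the choice of labels is an assumption, not something
stated in the reply). Subsetting keeps the names, so every sampled value
still carries a record of where it came from:

```r
set.seed(12345)
x <- sample(0:100, 10)

# Label each element with its original position (assumed labelling scheme)
names(x) <- seq_along(x)

sample.ind <- sample(seq_along(x), 5, replace = TRUE)
x.sample <- x[sample.ind]

# The names survive subsetting, so they record the source positions
orig.pos <- as.integer(names(x.sample))
print(orig.pos)
```

The names alone do not sort the sample; they only make it possible to look
up each value's place in an ordering computed once on the full data.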
David Winsemius, MD
West Hartford, CT

Petr Savicky

2012-May-17 17:43 UTC


Hi.

Formally, it is possible to avoid sorting using tabulate() and rep().
However, I am not sure whether this approach is more efficient.

  set.seed(12345)
  x <- sample(0:100, 10)
  x.order <- order(x)
  x.sorted <- x[x.order]

  sample.ind <- sample(1:length(x), 5, replace = TRUE)  # sample 1/2 size with replacement

  x.sample <- x.sorted[sample.ind]
  freq <- tabulate(sample.ind, nbins=length(x))
  x.sample.sorted <- rep(x.sorted, times=freq)

  identical(sort(x.sample), x.sample.sorted) # [1] TRUE

Note that x.sample is created from x.sorted in order to make x.sample
and x.sample.sorted consistent. Since sample.ind has random order, the
distributions of x[sample.ind] and x.sorted[sample.ind] are the same.
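
If the sample has to be drawn from x itself rather than from x.sorted, one
possible variant is to map each sampled index to its rank first. This is a
sketch under that assumption; the name x.rank is introduced here for
illustration and does not appear in the code above. order(x.order) gives
the inverse permutation, so x[i] equals x.sorted[x.rank[i]]:

```r
set.seed(12345)
x <- sample(0:100, 10)
x.order <- order(x)
x.sorted <- x[x.order]

# Inverse permutation: x.rank[i] is the position of x[i] within x.sorted
x.rank <- order(x.order)

sample.ind <- sample(seq_along(x), 5, replace = TRUE)
x.sample <- x[sample.ind]            # sampled from the unsorted data

# Count how often each rank was drawn, then expand the sorted vector
freq <- tabulate(x.rank[sample.ind], nbins = length(x))
x.sample.sorted <- rep(x.sorted, times = freq)

print(identical(sort(x.sample), x.sample.sorted))
```

Both x.order and x.rank are computed once and can be reused for every
subsequent sample.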

Computing the frequencies of indices whose range is known in advance
can be done in linear time, so it is theoretically more efficient than
sorting. However, only a test can determine which is more efficient in
your situation.
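
One way such a test could look, as a rough sketch only. The vector length
n, the number of repetitions n.rep, and the use of runif() for the data are
arbitrary assumptions, and the timings will depend on the machine and on
the real data:

```r
set.seed(1)
n <- 1e5       # assumed vector length
n.rep <- 20    # assumed number of samples to draw

x.sorted <- sort(runif(n))   # sort the source data once

# Draw all index samples up front so both methods see the same input
ind.list <- replicate(n.rep, sample.int(n, n %/% 2, replace = TRUE),
                      simplify = FALSE)

# Method 1: sort every sample from scratch
t.sort <- system.time(
  res.sort <- lapply(ind.list, function(ind) sort(x.sorted[ind]))
)

# Method 2: count index frequencies and expand the presorted vector
t.tab <- system.time(
  res.tab <- lapply(ind.list, function(ind)
    rep(x.sorted, times = tabulate(ind, nbins = n)))
)

print(rbind(sort = t.sort[1:3], tabulate = t.tab[1:3]))
print(identical(res.sort, res.tab))
```

For serious comparisons, repeating the measurement several times or using a
dedicated benchmarking package would give more stable numbers than a single
system.time() call.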

Hope this helps.

Petr Savicky.
