Dear List, Is there a way I can sort a sample based on a sort index constructed from the data from which the sample is taken? Basically, I need to take 'many' samples from the same source data and sort them. This can be very time consuming for long vectors. Is there any way I can sort the data only once initially, and use that sort order for the samples? I believe that idea is what is implemented in tree-based classifiers, so the data is sorted only once initially and that sort order is used for the child nodes. set.seed(12345) x <- sample(0:100, 10) x.order <- order(x) x.sorted <- x[x.order] sample.ind <- sample(1:length(x), 5, replace = TRUE) #sample 1/2 size with replacement x.sample <- x[sample.ind] x.sample.sorted <- #??? (without sorting again) Thanks for any help on this. Regards, Axel. [[alternative HTML version deleted]]
On May 17, 2012, at 6:45 AM, Axel Urbiz wrote:> Dear List, > > Is there a way I can sort a sample based on a sort index constructed > from > the data from which the sample is taken? Basically, I need to take > 'many' > samples from the same source data and sort them. This can be very time > consuming for long vectors. Is there any way I can sort the data > only once > initially, and use that sort order for the samples? > > I believe that idea is what is implemented in tree-based > classifiers, so > the data is sorted only once initially and that sort order is used > for the > child nodes. > > > set.seed(12345) > x <- sample(0:100, 10) > x.order <- order(x) > x.sorted <- x[x.order] > > sample.ind <- sample(1:length(x), 5, replace = TRUE) #sample 1/2 > size with > replacement > x.sample <- x[sample.ind] > > x.sample.sorted <- #??? (without sorting again) > > Thanks for any help on this. >If you had created a named vector or used a dataframe with rownames you would have a record of the original sort order.> Regards, > Axel. > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code.David Winsemius, MD West Hartford, CT
On Thu, May 17, 2012 at 06:45:52AM -0400, Axel Urbiz wrote:> Dear List, > > Is there a way I can sort a sample based on a sort index constructed from > the data from which the sample is taken? Basically, I need to take 'many' > samples from the same source data and sort them. This can be very time > consuming for long vectors. Is there any way I can sort the data only once > initially, and use that sort order for the samples? > > I believe that idea is what is implemented in tree-based classifiers, so > the data is sorted only once initially and that sort order is used for the > child nodes. > > > set.seed(12345) > x <- sample(0:100, 10) > x.order <- order(x) > x.sorted <- x[x.order] > > sample.ind <- sample(1:length(x), 5, replace = TRUE) #sample 1/2 size with replacement > x.sample <- x[sample.ind] > > x.sample.sorted <- #??? (without sorting again)Hi. Formally, it is possible to avoid sorting using tabulate() and rep(). However, i am not sure, whether this approach is more efficient. set.seed(12345) x <- sample(0:100, 10) x.order <- order(x) x.sorted <- x[x.order] sample.ind <- sample(1:length(x), 5, replace = TRUE) #sample 1/2 size with replacement x.sample <- x.sorted[sample.ind] freq <- tabulate(sample.ind, nbins=length(x)) x.sample.sorted <- rep(x.sorted, times=freq) identical(sort(x.sample), x.sample.sorted) # [1] TRUE Note that x.sample is created from x.sorted in order to make x.sample and x.sample.sorted consistent. Since sample.ind has random order, the distributions of x[sample.ind] and x.sorted[sample.ind] are the same. Computing the frequencies of indices, whose range is known in advance, can be done in linear time, so theoretically more efficiently than sorting. However, only a test may determine, what is more efficient in your situation. Hope this helps. Petr Savicky.