thr3ads.net - R help - [R] Selecting subsamples [Dec 2003]

christian_mora@vtr.net

2003-Dec-04 12:18 UTC

[R] Selecting subsamples

Hi all,
I'm working with a dataset with 9 columns and 2000 rows. Each row represents
an individual and one of the columns represents the volume of that individual
(measured in cubic meters). I'd like to select a sample from this dataset
(without considering any probability of the rows) in which the sum of the
volume of the individuals in that sample >= 100 cubic m.
I'll appreciate any suggestion
Thanks
CM

Christian Schulz

2003-Dec-04 12:24 UTC

[R] Selecting subsamples

CM,

maybe
s <- which(data.frame$attribute >= 100)
is a starting point!?

regards,christian




Petr Pikal

2003-Dec-04 12:41 UTC

[R] Selecting subsamples

Hallo

I assume you want equal size samples

x<-runif(1000)

this construction keeps resampling until it gets a "y" with sum(y) >= 5

while(sum(y<-sample(x,10))<5) {}

Cheers


Petr Pikal
petr.pikal at precheza.cz
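
The resampling idea above can be sketched on a data frame rather than a bare vector. This is only an illustration under stated assumptions: the data frame `d`, its column `vol`, the helper name `draw_fixed_size` and the `max_tries` cap are all hypothetical and do not come from the thread.

```r
# Toy data: 2000 individuals with a hypothetical positive volume column `vol`.
set.seed(1)
d <- data.frame(id = 1:2000, vol = runif(2000, min = 0, max = 2))

# Draw fixed-size samples of rows (without replacement) until the total
# volume of the drawn rows reaches the target; give up after max_tries draws.
draw_fixed_size <- function(d, size, target, max_tries = 1000) {
  for (i in seq_len(max_tries)) {
    rows <- sample(nrow(d), size)
    if (sum(d$vol[rows]) >= target) {
      return(d[rows, , drop = FALSE])
    }
  }
  stop("no sample reached the target within max_tries draws")
}

s <- draw_fixed_size(d, size = 120, target = 100)
```

The cap on the number of draws is there because a sample size that is too small for the target would otherwise loop forever.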

(Ted Harding)

2003-Dec-04 14:08 UTC

[R] Selecting subsamples

On 04-Dec-03 christian_mora at vtr.net wrote:
> Hi all,
> I'm working with a dataset with 9 columns and 2000 rows. Each row
> represents an individual and one of the columns represents the volume
> of that individual (measured in cubic meters). I'd like to select a
> sample from this dataset (without considering any probability of the
> rows) in which the sum of the volume of the individuals in that sample
> >= 100 cubic m.

Let X be the dataset, with the volumes in a vector volume (e.g. X$volume).
For N=2000:

  ix<-sort(rnorm(N),index.return=TRUE)$ix ## a random ordering of the rows

  M<-sum(cumsum(volume[ix])<100)+1 ## Assumes volume > 0

  X[ix[1:M],]

If you can't assume volume > 0, then something like

  M<-min( which(cumsum(volume[ix]) >= 100) )

Ted.


--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 167 1972
Date: 04-Dec-03                                       Time: 14:08:48
------------------------------ XFMail ------------------------------
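
As a rough worked example of the random-order approach in this message, the sketch below uses toy data. The column name `volume`, the seed and the simulated values are assumptions made for illustration; `sample()` stands in for the `sort(rnorm(N))` trick as another way to get a random permutation.

```r
# Toy data: 2000 rows with strictly positive volumes (assumed for the demo).
set.seed(42)
X <- data.frame(id = 1:2000, volume = runif(2000, min = 0.1, max = 1))

ix <- sample(nrow(X))              # random permutation of the row indices
cs <- cumsum(X$volume[ix])         # running total in that random order
stopifnot(cs[length(cs)] >= 100)   # with positive volumes, else no subset works
M  <- which(cs >= 100)[1]          # first position where the total reaches 100
S  <- X[ix[1:M], ]                 # the selected rows
```

Because the rows are taken in random order and selection stops as soon as the target is reached, dropping the last selected row always brings the total back under 100.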

Richard A. O'Keefe

2003-Dec-05 03:05 UTC

[R] Selecting subsamples

christian_mora at vtr.net wrote
    [that he has a data set with 9 variables (columns) measured on 2000
     individuals (rows) and wants a sample] in which the sum of the
    volume of the individuals in that sample >= 100 cubic m.

Let's suppose that this information is held in d, a data frame, and that
the volume column is d$vol.

If sum(d$vol) < 100, there is no sample which satisfies your condition.
If sum(d$vol) >= 100, then d is such a sample as it stands.

If you want the smallest number of rows, then

    indices <- order(d$vol, decreasing=TRUE)

gives you the row indices sorted by decreasing volume;

    d$vol[indices]      => the volumes in decreasing order
    cumsum(")           => the cumulative sum
    sum(" < 100.0)      => 1 less than the number of rows you want

so

    indices <- order(d$vol, decreasing=TRUE)
    d[indices[1:(sum(cumsum(d$vol[indices]) < 100.0) + 1)], ]

should be the answer you want.
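
The same steps can be wrapped in a small function and checked on a toy data frame. This is a minimal sketch, not part of the original message: the helper name `smallest_sample` and the example values are hypothetical, and non-negative volumes are assumed so that the total-volume check is meaningful.

```r
# Hypothetical helper: fewest rows whose volumes sum to at least `target`.
# Assumes d has a numeric column `vol` with non-negative values.
smallest_sample <- function(d, target = 100) {
  if (sum(d$vol) < target) stop("total volume is below the target")
  indices <- order(d$vol, decreasing = TRUE)       # largest volumes first
  k <- sum(cumsum(d$vol[indices]) < target) + 1    # number of rows needed
  d[indices[seq_len(k)], , drop = FALSE]           # note the comma: select rows
}

d <- data.frame(id = 1:5, vol = c(10, 50, 30, 60, 5))
res <- smallest_sample(d)
```

Here the two largest volumes (rows 4 and 2, 60 + 50 = 110) are enough, so `res` has two rows.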

This is O(n log n) where n is the number of rows; in your case n is 2000.

If you don't need the smallest sample, but just any old haphazard answer,

    indices <- sample(nrow(d))
    d[indices[1:(sum(cumsum(d$vol[indices]) < 100.0) + 1)], ]

should be useful.
