Hi all,

I'm working with a dataset with 9 columns and 2000 rows. Each row represents an individual, and one of the columns represents the volume of that individual (measured in cubic meters). I'd like to select a sample from this dataset (without considering any probability of the rows) in which the sum of the volumes of the individuals in that sample is >= 100 cubic m. I'd appreciate any suggestions.

Thanks,
CM
CM,

maybe

    s <- which(data.frame$attribute >= 100)

is a starting point?

Regards,
christian

-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of christian_mora at vtr.net
Sent: Thursday, 4 December 2003 13:18
To: r-help at stat.math.ethz.ch
Subject: [R] Selecting subsamples

______________________________________________
R-help at stat.math.ethz.ch mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help
Hello,

I assume you want equal-size samples.

    x <- runif(1000)

This construction leaves you with a "y" for which sum(y) >= 5:

    ## the assignment inside the condition does the resampling
    while (sum(y <- sample(x, 10)) < 5) {}

Cheers

Petr Pikal
petr.pikal at precheza.cz
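The resampling loop above can be carried over to rows of a data frame. What follows is only a minimal sketch under stated assumptions: the data frame `d`, its column `vol`, the helper name `sample_fixed_size`, and the `max_tries` guard are all made up for illustration, and the volumes are simulated rather than taken from the poster's data.

```r
# Hypothetical data: 2000 individuals with simulated volumes (not the real dataset).
set.seed(1)
d <- data.frame(id = 1:2000, vol = runif(2000, min = 0, max = 1))

# Draw fixed-size samples of rows until the summed volume reaches `target`.
# `max_tries` is an added safeguard so the loop cannot run forever when
# `size` is too small for the target to be reachable.
sample_fixed_size <- function(d, size, target, max_tries = 10000) {
  for (i in seq_len(max_tries)) {
    rows <- sample(nrow(d), size)          # `size` distinct row indices
    if (sum(d$vol[rows]) >= target) {
      return(d[rows, , drop = FALSE])      # note the comma: select rows
    }
  }
  stop("no sample of this size reached the target within max_tries")
}

s <- sample_fixed_size(d, size = 220, target = 100)
```

The sample size has to be chosen so that the target is reachable at all; with mean volume around 0.5 in this simulated data, a size of 220 reaches 100 on most draws.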
Let X be the dataset and volume the vector of volumes. For N=2000:

    ix <- sort(rnorm(N), index.return=TRUE)$ix      ## random ordering of the rows
    M <- max(which(cumsum(volume[ix]) < 100)) + 1   ## Assumes volume > 0
    X[ix[1:M],]

If you can't assume volume > 0, then something like

    M <- min( which(sum(volume) - cumsum(volume[ix]) <= sum(volume) - 100) )

Ted.

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 167 1972
Date: 04-Dec-03 Time: 14:08:48
------------------------------ XFMail ------------------------------
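The random-ordering idea above could be wrapped in a small function that also copes with the edge cases (the first row already reaching the target, or the target never being reached). This is a sketch only: `random_subsample`, the simulated `X`, and the use of `sample()` in place of sorting random normals are assumptions made for illustration.

```r
# Hypothetical data: 2000 individuals with simulated positive volumes.
set.seed(2)
X <- data.frame(id = 1:2000, volume = rexp(2000, rate = 2))

random_subsample <- function(X, target = 100) {
  ix <- sample(nrow(X))                 # random permutation of the row indices
  cs <- cumsum(X$volume[ix])            # running total in that random order
  hit <- which(cs >= target)            # positions where the total has reached the target
  if (length(hit) == 0) {
    stop("the running total never reaches the target")
  }
  M <- hit[1]                           # first position that reaches the target
  X[ix[seq_len(M)], , drop = FALSE]
}

res <- random_subsample(X, target = 100)
```

Taking the first position where the running total is at least the target avoids calling `max()` on an empty index vector, which is what would happen in the `< 100` formulation if the very first row already reached 100.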
christian_mora at vtr.net wrote [that he has a data set with 9 variables (columns) measured on 2000 individuals (rows) and wants a sample]

    in which the sum of the volume of the individuals in that sample >= 100 cubic m.

Let's suppose that this information is held in d, a data frame, and that the volume column is d$vol.

If sum(d$vol) < 100, there is no sample which satisfies your condition.
If sum(d$vol) >= 100, then d is such a sample as it stands.

If you want the smallest number of rows, then

    indices <- order(d$vol, decreasing=TRUE)

gives you the row indices sorted by decreasing volume;

    d$vol[indices]  => the volumes in decreasing order
    cumsum(")       => the cumulative sum
    sum(" < 100.0)  => 1 less than the number of rows you want

so

    indices <- order(d$vol, decreasing=TRUE)
    d[indices[1:(sum(cumsum(d$vol[indices]) < 100.0) + 1)], ]

should be the answer you want. (Note the trailing comma: without it, d[...] would select columns rather than rows.) This is O(n.lg n) where n is the number of rows; in your case n is 2000.

If you don't need the smallest sample, but just any old haphazard answer,

    indices <- sample(nrow(d))
    d[indices[1:(sum(cumsum(d$vol[indices]) < 100.0) + 1)], ]

should be useful.
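To see how the two variants compare, here is a small worked run on simulated data. It is a sketch under assumptions: the data frame `d`, its `vol` column, and the helper `take_until` are hypothetical, and the volumes are drawn from a log-normal distribution purely for illustration. Rows are selected with `d[rows, ]` (with the comma), since a data frame indexed without the comma returns columns.

```r
# Hypothetical data: 2000 individuals with simulated positive volumes.
set.seed(3)
d <- data.frame(id = 1:2000, vol = rlnorm(2000, meanlog = -1, sdlog = 1))

# The whole data set must reach the target, otherwise no sample can.
stopifnot(sum(d$vol) >= 100)

# Take rows in the given order until the cumulative volume reaches `target`.
# Assumes non-negative volumes, so the cumulative sum never decreases.
take_until <- function(d, indices, target = 100) {
  k <- sum(cumsum(d$vol[indices]) < target) + 1
  d[indices[seq_len(k)], , drop = FALSE]
}

# Smallest sample: largest volumes first.
smallest <- take_until(d, order(d$vol, decreasing = TRUE))

# Haphazard sample: rows in random order.
haphazard <- take_until(d, sample(nrow(d)))

c(smallest = nrow(smallest), haphazard = nrow(haphazard))
```

Because the largest volumes are taken first, no sample with fewer rows can reach the target, so the haphazard sample can never have fewer rows than the smallest one.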