Hi,

I am looking for a way to randomly extract a specified number of rows from a data frame. I was planning on binding a column of random numbers to the data frame and then sorting the data frame using this bound column. But I can't figure out how to use this column to sort the entire data frame so that the content of the rows remains together. Does anyone know how I can do this? Hints for other ways to approach this problem would also be appreciated.

Cheers
Amy

Amy Whitehead
School of Biological Sciences
University of Canterbury
Private Bag 4800
Christchurch
Ph 03 364 2987 ext 7033
Cellphone 021 2020525
Email alw76 at student.canterbury.ac.nz
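For the approach described in the question (bind a random column, then sort on it), the missing piece is order(): it returns a permutation of row indices, and indexing the data frame with that permutation moves whole rows together. A minimal sketch, assuming the built-in airquality data; the names rand_key, shuffled, picked and the value of n are illustrative, not from the original post:

```r
# Sketch of the "random key" approach on the built-in airquality data.
set.seed(42)                          # arbitrary seed, only for reproducibility
df <- airquality
n  <- 10                              # number of rows to extract (illustrative)

df$rand_key <- runif(nrow(df))        # bind a column of random numbers
shuffled <- df[order(df$rand_key), ]  # reorder whole rows by that column
picked   <- head(shuffled, n)         # the first n rows form a random sample
picked$rand_key <- NULL               # drop the helper column again
```

This gives the same kind of result as indexing with sample() directly; the extra column is only needed if the shuffled order itself has to be kept.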
On Mon, 2007-02-19 at 16:10 +1300, Amy Whitehead wrote:
> I am looking for a way to randomly extract a specified number of rows from a
> data frame.
> [...]

See ?sample

Using the 'iris' dataset in R:

# Select 2 random rows
> iris[sample(nrow(iris), 2), ]
   Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
96          5.7         3.0          4.2         1.2 versicolor
17          5.4         3.9          1.3         0.4     setosa

# Select 5 random rows
> iris[sample(nrow(iris), 5), ]
   Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
83          5.8         2.7          3.9         1.2 versicolor
12          4.8         3.4          1.6         0.2     setosa
63          6.0         2.2          4.0         1.0 versicolor
80          5.7         2.6          3.5         1.0 versicolor
49          5.3         3.7          1.5         0.2     setosa

HTH,

Marc Schwartz
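The call sample(nrow(iris), 2) works because sample() treats a single positive number x as shorthand for 1:x, and by default it draws without replacement, so asking for more rows than exist is an error. A small wrapper can make both points explicit. This is only a sketch; sample_rows is a hypothetical helper name, not something from the thread or from base R:

```r
# Hypothetical helper (not part of base R): draw n rows without replacement.
sample_rows <- function(df, n) {
  # Fail early with a clear check instead of relying on sample()'s own error.
  stopifnot(is.data.frame(df), n >= 0, n <= nrow(df))
  # sample.int(N, n) draws n distinct indices from 1..N.
  df[sample.int(nrow(df), n), , drop = FALSE]
}

set.seed(1)                       # arbitrary seed, only for reproducibility
two_rows <- sample_rows(iris, 2)
```

The drop = FALSE argument keeps the result a data frame even when the input has a single column.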
On Mon, 19 Feb 2007, Amy Whitehead wrote:
> I am looking for a way to randomly extract a specified number of rows from a
> data frame.
> [...]

It is a bit easier than that. Here is one way:

> df <- airquality
> rNames <- row.names(df)
> sampRows <- sample(rNames,10)
> sampRows
 [1] "137" "56"  "1"   "135" "62"  "43"  "12"  "128" "86"  "54"
> subset(df,rNames%in%sampRows)
    Ozone Solar.R Wind Temp Month Day
1      41     190  7.4   67     5   1
12     16     256  9.7   69     5  12
43     NA     250  9.2   92     6  12
54     NA      91  4.6   76     6  23
56     NA     135  8.0   75     6  25
62    135     269  4.1   84     7   1
86    108     223  8.0   85     7  25
128    47      95  7.4   87     9   5
135    21     259 15.5   76     9  12
137     9      24 10.9   71     9  14

David Scott

_________________________________________________________________
David Scott
Department of Statistics, Tamaki Campus
The University of Auckland, PB 92019
Auckland 1142, NEW ZEALAND
Phone: +64 9 373 7599 ext 86830
Fax: +64 9 373 7000
Email: d.scott at auckland.ac.nz

Graduate Officer, Department of Statistics
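One detail of the row-name approach: subset() with %in% returns the selected rows in their original data-frame order (1, 12, 43, ... in the output above), not in the order they were drawn. If the drawn order matters, indexing by the sampled row names keeps it. A short sketch of the difference, reusing the names df and sampRows from the message; in_sample_order and in_original_order are illustrative names:

```r
set.seed(7)                            # arbitrary seed, only for reproducibility
df <- airquality
sampRows <- sample(row.names(df), 10)  # character vector of sampled row names

# Indexing by row name returns rows in the order they were drawn.
in_sample_order <- df[sampRows, ]

# Filtering with %in% returns the same rows in data-frame order.
in_original_order <- subset(df, row.names(df) %in% sampRows)
```

Both results contain the same ten rows; only the ordering differs.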
Amy,

Here is a piece of code copied from my blog, which might answer part of your question.

library(MASS)
data(Boston)

# DIVIDE DATA INTO TESTING AND TRAINING SETS
set.seed(2005)
test.rows <- sample(1:nrow(Boston), 100)
test.set <- Boston[test.rows, ]
train.set <- Boston[-test.rows, ]

On 2/18/07, Amy Whitehead <alw76 at student.canterbury.ac.nz> wrote:
> I am looking for a way to randomly extract a specified number of rows from a
> data frame.
> [...]

--
WenSui Liu
A lousy statistician who happens to know a little programming
(http://spaces.msn.com/statcompute/blog)
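A caveat on the negative-index idiom used for the training set: if the vector of test rows is ever empty, df[-integer(0), ] selects zero rows rather than all of them. A logical mask avoids that edge case. The sketch below is one way to wrap the split; split_rows is a hypothetical helper name, and mtcars is used only so the example runs without loading MASS:

```r
# Hypothetical helper (not from the thread): split a data frame into
# a test set of n_test random rows and a training set of the rest.
split_rows <- function(df, n_test) {
  stopifnot(is.data.frame(df), n_test >= 0, n_test <= nrow(df))
  # TRUE for rows chosen as test rows, FALSE for the rest.
  is_test <- seq_len(nrow(df)) %in% sample.int(nrow(df), n_test)
  list(test  = df[is_test, , drop = FALSE],
       train = df[!is_test, , drop = FALSE])
}

set.seed(2005)                   # same seed as in the message above
parts <- split_rows(mtcars, 10)  # 10 test rows, 22 training rows
```

Unlike Boston[test.rows, ], the test set here comes back in data-frame order rather than in the order drawn, which usually does not matter for a train/test split.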