Hello Gurus:

If I have a data frame with a variable called "age", for example, and I want to extract a random 10% of the observations from each "age" group of the entire data frame, do I have to double loop to split the data and then loop again to assign random numbers? Or is there a better way to do this?

Thanks!
Karen
Here is one way of doing it:

> x <- data.frame(group=sample(1:4,100,TRUE), age=runif(100,4,80))
> tapply(x$age, x$group, function(z) mean(z[sample(seq_along(z), length(z) / 10)]))
       1        2        3        4
34.56628 58.70901 54.26239 58.89306

On Wed, Mar 5, 2008 at 7:49 PM, Chang Liu <changisme at hotmail.com> wrote:
> Hello Gurus:
>
> If I have a dataframe with one of the variables called "age" for example, and I want to extract a random 10% of the observations from each "age" group of the entire data frame. Do I have to double loop to split the data and then loop again to assign random numbers? Or is there a better way to do this?
>
> Thanks!
> Karen
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?
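Note that the tapply() call in the reply above returns the mean age of a 10% sample within each group, not the sampled rows themselves. If the rows are what is wanted, the same split-and-sample idea can be applied to the data frame directly. The following is only a minimal sketch, assuming the same toy columns `group` and `age`; the seed and the helper name `take_tenth` are illustrative and do not come from the thread.

```r
set.seed(1)  # illustrative seed, only so the example is repeatable
x <- data.frame(group = sample(1:4, 100, TRUE), age = runif(100, 4, 80))

# Hypothetical helper: keep a random tenth (rounded down) of the rows of d
take_tenth <- function(d) {
  n <- floor(nrow(d) / 10)
  d[sample.int(nrow(d), n), , drop = FALSE]
}

# Split by group, sample within each piece, then stack the pieces again
sampled <- do.call(rbind, lapply(split(x, x$group), take_tenth))

table(sampled$group)  # rows kept per group
```

Rounding down mirrors the `length(z) / 10` size in the reply, which sample() truncates to a whole number; groups with fewer than 10 rows therefore contribute no rows.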
Bill.Venables at csiro.au
2008-Mar-06 01:20 UTC
[R] extracting a percentage of data by random
You don't need any explicit loops at all. Here is a demo of one way to do it:

> set.seed(23) # on Windows
> dat <- data.frame(age = factor(sample(1:4, 200, rep = T)), y = runif(200))
> head(dat) # ages are in random order
  age          y
1   3 0.64275524
2   1 0.56125314
3   2 0.82418228
4   3 0.97050933
5   4 0.02827508
6   2 0.72291636
> with(dat, table(age)) # how many in each age group
age
 1  2  3  4
37 55 44 64
> ind <- lapply(split(1:nrow(dat), dat$age),
+               function(x) sample(x, round(length(x)/10))) # the trick
> ind
$`1`
[1] 135   2 188 133

$`2`
[1] 124  33 140 162  25  13

$`3`
[1] 115  79  27  44

$`4`
[1]  58 129  84 198  72 109

> sample_dat <- dat[sort(unlist(ind)), ] # with indices, select data
> sample_dat
    age         y
2     1 0.5612531
13    2 0.7339141
25    2 0.9548750
27    3 0.7419931
33    2 0.6965722
44    3 0.5363812
58    4 0.5464051
72    4 0.2785669
79    3 0.6453164
84    4 0.1203811
109   4 0.9154706
115   3 0.2118767
124   2 0.3056171
129   4 0.7635097
133   1 0.6474702
135   1 0.2466226
140   2 0.6292326
162   2 0.5338671
188   1 0.9882631
198   4 0.1983350
>

Bill Venables
CSIRO Laboratories
PO Box 120, Cleveland, 4163
AUSTRALIA
Office Phone (email preferred): +61 7 3826 7251
Fax (if absolutely necessary): +61 7 3826 7304
Mobile: +61 4 8819 4402
Home Phone: +61 7 3286 7700
mailto:Bill.Venables at csiro.au
http://www.cmis.csiro.au/bill.venables/
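The index trick in the reply above can be wrapped in a small function so that the grouping column and the fraction are not hard-coded. This is a sketch and not part of the original reply: the name `sample_by_group` and its arguments are hypothetical. It indexes with sample.int() instead of calling sample() on the index vector, because sample(x, n) treats a length-one numeric x as 1:x, which would matter for a single-row group whenever the requested size is at least 1.

```r
# Hypothetical helper: draw a random fraction of rows within each group.
# dat   : a data frame
# group : name of the grouping column (character string)
# frac  : fraction of each group to keep, e.g. 0.1 for 10%
sample_by_group <- function(dat, group, frac = 0.1) {
  idx <- split(seq_len(nrow(dat)), dat[[group]])
  picked <- lapply(idx, function(i) {
    n <- round(length(i) * frac)
    i[sample.int(length(i), n)]  # positions within the group, then row indices
  })
  dat[sort(unlist(picked, use.names = FALSE)), , drop = FALSE]
}

set.seed(23)  # illustrative seed
dat <- data.frame(age = factor(sample(1:4, 200, replace = TRUE)),
                  y = runif(200))
sample_dat <- sample_by_group(dat, "age", frac = 0.1)

# Compare group sizes in the full data and in the sample
rbind(full = table(dat$age), sampled = table(sample_dat$age))
```

Sorting the selected indices keeps the sampled rows in their original order, as in the reply; drop the sort() if the order does not matter.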