Marna Wagley
2016-Dec-07 11:58 UTC
[R] how to randomly select the samples with different probabilities for different classes?
Hi R users,

I have samples with covariates for different classes, and I want to select samples from the different groups with different probabilities. For example, I have 22 samples in 3 classes:

groupA has 8 samples
groupB has 8 samples
groupC has 6 samples

I want to select a total of 14 samples from the 22, such that 40% of the 14 samples are in groups A and B and 60% of the 14 samples are in group C.

Would you mind helping me select the samples under those conditions? Sample data are given below.

dat <- structure(list(sampleID = c(17L, 21L, 36L, 45L, 67L, 82L, 90L,
    31L, 70L, 45L, 24L, 80L, 82L, 45L, 85L, 14L, 81L, 96L, 61L, 12L,
    65L, 88L), group = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
    2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("A",
    "B", "C"), class = "factor")), .Names = c("sampleID", "group"
    ), class = "data.frame", row.names = c(NA, -22L))

thanks,
MW
Rui Barradas
2016-Dec-07 13:23 UTC
[R] how to randomly select the samples with different probabilities for different classes?
Hello,

If 60% of the 14 samples come from group C, then 8.4 samples should come from a group with only 6 elements. Do you want sampling with replacement? If so, maybe the following will do.

perc <- c(0.4, 0.6)
# row numbers split into non-C (FALSE) and C (TRUE)
tmp <- split(seq_len(nrow(dat)), dat$group == "C")
# draw positions within each part and map them back to row numbers,
# so no hard-coded offset is needed for the C rows
idx <- lapply(seq_along(tmp), function(i)
    tmp[[i]][sample.int(length(tmp[[i]]), round(perc[i]*14), replace = TRUE)])
idx <- unlist(idx)
dat[idx, ]

Hope this helps,

Rui Barradas
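The arithmetic behind the question about replacement can be checked directly. The following is only a sketch; the names `n_total`, `group_sizes`, `share`, `quota` and `needs_replacement` are made up for illustration, and rounding to whole samples is one possible choice, not the only one.

```r
n_total     <- 14
group_sizes <- c(AB = 16, C = 6)    # from the question: 8 + 8 in A and B, 6 in C
share       <- c(AB = 0.4, C = 0.6)

# 0.4 * 14 = 5.6 and 0.6 * 14 = 8.4, so the targets must be rounded
quota <- round(share * n_total)

# TRUE where the quota exceeds the group size, i.e. where sampling
# without replacement is impossible
needs_replacement <- quota > group_sizes

quota
needs_replacement
```

With these numbers only group C asks for more samples than it contains, which is why replacement comes up at all.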
Jim Lemon
2016-Dec-07 21:11 UTC
[R] how to randomly select the samples with different probabilities for different classes?
Hi Marna,

If we assume a sample size of 1, something like this:

dat[sample(which(dat$group!="C"),ceiling(14*0.4),TRUE),]
dat[sample(which(dat$group=="C"),floor(14*0.6),TRUE),]

Then just step through the two subsets to access your samples. One problem is that you will not get exactly 40% or 60%, which is why I had to put the "ceiling" and "floor" functions to work. Also, you will have to sample with replacement, as you would otherwise exhaust the "C" group.

Jim
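As a rough sketch of how the two subsets could be drawn and combined into a single data frame: the helper `sample_stratum` below is hypothetical (it is not from any package), the seed is arbitrary, and replacing only when a group is too small is an assumption about what is wanted rather than something stated in the thread.

```r
# the data from the original question, rebuilt with data.frame()
dat <- data.frame(
  sampleID = c(17, 21, 36, 45, 67, 82, 90, 31, 70, 45, 24,
               80, 82, 45, 85, 14, 81, 96, 61, 12, 65, 88),
  group = factor(rep(c("A", "B", "C"), times = c(8, 8, 6)))
)

# hypothetical helper: draw n row numbers from 'rows', with replacement
# only when n exceeds the number of rows available
sample_stratum <- function(rows, n) {
  # sampling positions avoids sample()'s special case for a length-1 vector
  rows[sample.int(length(rows), n, replace = n > length(rows))]
}

set.seed(1)                   # arbitrary seed, only for reproducibility
n_ab <- ceiling(14 * 0.4)     # 5.6 rounded up to 6
n_c  <- floor(14 * 0.6)       # 8.4 rounded down to 8

idx <- c(sample_stratum(which(dat$group != "C"), n_ab),
         sample_stratum(which(dat$group == "C"), n_c))
picked <- dat[idx, ]

table(picked$group == "C")    # counts of non-C and C rows
```

Here the rows from A and B are drawn without replacement because 6 of 16 is feasible, while the 8 rows from group C necessarily contain repeats.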