Frank S.
2015-Dec-14 20:01 UTC
[R] Random selection of a fixed number of values by interval
Dear R users,

I'm writing to this list because I need to draw a random sample (without replacement) from a given vector, but the catch is that I need to extract a fixed number of values from each prespecified 1-unit interval. As an example, I have a data frame that looks like this (my real data frame is bigger):

    data <- data.frame(id = 1:70, value = c(0.68, 2.96, 1.93, 5.63, 3.08, 3.10, 2.99, 1.79, 2.96, 0.85, 11.79, 0.06, 4.31, 0.64, 1.43, 0.88, 2.79, 4.67,
        1.23, 1.43, 3.05, 2.44, 2.55, 3.82, 3.55, 1.56, 7.25, 2.75, 9.64, 5.14, 3.54, 3.12, 0.17, 1.07, 4.08, 4.47, 5.58, 7.41, 0.85, 4.30, 7.58,
        0.58, 1.40, 4.74, 5.04, 0.14, 1.14, 3.28, 7.84, 0.07, 3.97, 1.02, 3.47, 0.66, 2.38, 0.06, 0.67, 0.48, 4.48, 0.12, 3.82, 2.27, 0.93, 0.30,
        0.73, 0.33, 2.91, 0.81, 0.18, 0.42))

And I would like to select, at random:

    10 ids whose value belongs to the [0,1) interval
     7 ids whose value belongs to [1,2)
     5 ids whose value belongs to [2,3)
     5 ids whose value belongs to [3,4)
     3 ids whose value belongs to [4,5)

I can count the values in each 1-unit interval with:

    table(cut(data$value, include.lowest = T, right = FALSE, breaks = 0:ceiling(max(data$value))))

and I have the size vector:

    size <- c(10, 7, 5, 5, 3)

But I'm not able to get the sample I want using the sample function. Does anyone have an idea?

Thank you very much for any suggestions!

Frank S.
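Because the sampling is without replacement, each interval has to contain at least as many values as are requested from it. A minimal sketch of that check, reusing the data, the cut() call and the size vector from the post; the names `counts`, `available` and `feasible` are illustrative and not part of the original message:

```r
# Data from the post above
data <- data.frame(id = 1:70, value = c(
  0.68, 2.96, 1.93, 5.63, 3.08, 3.10, 2.99, 1.79, 2.96, 0.85, 11.79, 0.06,
  4.31, 0.64, 1.43, 0.88, 2.79, 4.67, 1.23, 1.43, 3.05, 2.44, 2.55, 3.82,
  3.55, 1.56, 7.25, 2.75, 9.64, 5.14, 3.54, 3.12, 0.17, 1.07, 4.08, 4.47,
  5.58, 7.41, 0.85, 4.30, 7.58, 0.58, 1.40, 4.74, 5.04, 0.14, 1.14, 3.28,
  7.84, 0.07, 3.97, 1.02, 3.47, 0.66, 2.38, 0.06, 0.67, 0.48, 4.48, 0.12,
  3.82, 2.27, 0.93, 0.30, 0.73, 0.33, 2.91, 0.81, 0.18, 0.42))

# Requested sample sizes for [0,1), [1,2), [2,3), [3,4), [4,5)
size <- c(10, 7, 5, 5, 3)

# Number of values in each 1-unit interval (same cut() call as in the post)
counts <- table(cut(data$value, include.lowest = TRUE, right = FALSE,
                    breaks = 0:ceiling(max(data$value))))

# Compare the first five intervals with the requested sizes
available <- as.vector(counts)[seq_along(size)]
feasible  <- available >= size
print(data.frame(interval = names(counts)[seq_along(size)],
                 available = available, requested = size, feasible = feasible))
```

If any entry of `feasible` were FALSE, sample() would stop with an error for that interval unless replace = TRUE were used.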
David L Carlson
2015-Dec-14 20:46 UTC
[R] Random selection of a fixed number of values by interval
There are lots of ways to do this. For example,

    > groups <- cut(data$value, include.lowest = T, right = FALSE,
    +     breaks = 0:ceiling(max(data$value)))
    > grp <- c("[0,1)", "[1,2)", "[2,3)", "[3,4)", "[4,5)")
    > size <- c(10, 7, 5, 5, 3)
    > set.seed(42)
    > samples <- lapply(1:5, function(x) sample(data$id[groups==grp[x]],
    +     size[x]))
    > names(samples) <- grp
    > samples
    $`[0,1)`
    [1] 69 68 33 63 56 46 65 12 50 58

    $`[1,2)`
    [1] 20 34 43  8 15 52 19

    $`[2,3)`
    [1]  7 22 62 28  2

    $`[3,4)`
    [1] 61 53  5 25 21

    $`[4,5)`
    [1] 59 35 40

-------------------------------------
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352

______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
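One caveat with the lapply() approach above: when sample() is given a single number x >= 1 as its first argument, it samples from 1:x rather than returning x, so an interval that happens to contain exactly one id would give surprising results. The sketch below is one possible way around that, sampling positions with sample.int() and then indexing. The helper name `sample_ids` and the use of breaks 0:5 (so that values of 5 or more become NA and are dropped by split()) are assumptions made for this illustration, not part of the original reply.

```r
# Data from the original post
data <- data.frame(id = 1:70, value = c(
  0.68, 2.96, 1.93, 5.63, 3.08, 3.10, 2.99, 1.79, 2.96, 0.85, 11.79, 0.06,
  4.31, 0.64, 1.43, 0.88, 2.79, 4.67, 1.23, 1.43, 3.05, 2.44, 2.55, 3.82,
  3.55, 1.56, 7.25, 2.75, 9.64, 5.14, 3.54, 3.12, 0.17, 1.07, 4.08, 4.47,
  5.58, 7.41, 0.85, 4.30, 7.58, 0.58, 1.40, 4.74, 5.04, 0.14, 1.14, 3.28,
  7.84, 0.07, 3.97, 1.02, 3.47, 0.66, 2.38, 0.06, 0.67, 0.48, 4.48, 0.12,
  3.82, 2.27, 0.93, 0.30, 0.73, 0.33, 2.91, 0.81, 0.18, 0.42))
size <- c(10, 7, 5, 5, 3)

# Intervals [0,1), ..., [4,5); values >= 5 fall outside the breaks and become NA
groups <- cut(data$value, breaks = 0:5, right = FALSE)

# split() drops the NA group, leaving one vector of ids per interval
ids_by_group <- split(data$id, groups)

# Hypothetical helper: sample positions, then index, so a length-1 'ids'
# is never misread as the range 1:ids
sample_ids <- function(ids, n) {
  stopifnot(n <= length(ids))
  ids[sample.int(length(ids), n)]
}

set.seed(42)
samples <- Map(sample_ids, ids_by_group, size)
print(samples)
```

Map() pairs each interval with its requested size and keeps the interval labels as list names, so no separate names(samples) <- grp step is needed.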
Bert Gunter
2015-Dec-14 21:22 UTC
[R] Random selection of a fixed number of values by interval
Yes. May I suggest:

    grp <- c("[0,1)", "[1,2)", "[2,3)", "[3,4)", "[4,5)")

can be obtained more simply as

    grp <- levels(groups)[1:5]

and one slight aesthetic change in the indexing, from:

    samples <- lapply(1:5, function(x) sample(data$id[groups==grp[x]], size[x]))

to:

    samples <- lapply(1:5, function(x) sample(data[groups==grp[x],"id"], size[x]))

(rows and columns in a data frame can be indexed simultaneously)

Cheers,
Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)
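Since rows and columns can be indexed together, the same idea can return the sampled rows (id and value) rather than only the ids. The following is a rough sketch under that assumption; the result name `picked` and the added `interval` column are illustrative choices, not something proposed in the thread.

```r
# Data from the original post
data <- data.frame(id = 1:70, value = c(
  0.68, 2.96, 1.93, 5.63, 3.08, 3.10, 2.99, 1.79, 2.96, 0.85, 11.79, 0.06,
  4.31, 0.64, 1.43, 0.88, 2.79, 4.67, 1.23, 1.43, 3.05, 2.44, 2.55, 3.82,
  3.55, 1.56, 7.25, 2.75, 9.64, 5.14, 3.54, 3.12, 0.17, 1.07, 4.08, 4.47,
  5.58, 7.41, 0.85, 4.30, 7.58, 0.58, 1.40, 4.74, 5.04, 0.14, 1.14, 3.28,
  7.84, 0.07, 3.97, 1.02, 3.47, 0.66, 2.38, 0.06, 0.67, 0.48, 4.48, 0.12,
  3.82, 2.27, 0.93, 0.30, 0.73, 0.33, 2.91, 0.81, 0.18, 0.42))

groups <- cut(data$value, breaks = 0:ceiling(max(data$value)),
              right = FALSE, include.lowest = TRUE)
grp  <- levels(groups)[1:5]   # "[0,1)" ... "[4,5)"
size <- c(10, 7, 5, 5, 3)

set.seed(42)
picked <- do.call(rbind, lapply(seq_along(grp), function(i) {
  rows <- which(groups == grp[i])                  # row numbers in this interval
  keep <- rows[sample.int(length(rows), size[i])]  # sample positions, not values
  # index rows and columns in one step, then record the interval label
  cbind(data[keep, c("id", "value")], interval = grp[i])
}))
print(picked)
```

Note that levels(groups)[1:5] relies on the first five levels being the intervals of interest, which holds here because the breaks start at 0 and increase in steps of 1.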
David Winsemius
2015-Dec-14 21:31 UTC
[R] Random selection of a fixed number of values by interval
> On Dec 14, 2015, at 12:46 PM, David L Carlson <dcarlson at tamu.edu> wrote:
>
> There are lots of ways to do this. For example,

Another method with mapply:

    mapply(function(n, vals) {sample(vals$id, n)},  # no replacement is the default for sample
           vals = split(data, findInterval(data$value, 0:5))[1:5],  # drops the values at 5 or above
           n = c(10, 7, 5, 5, 3))

    $`1`
    [1] 12 64 10 60 70 58 33 50 57 68

    $`2`
    [1] 43  8 15 26 19  3 20

    $`3`
    [1] 55  9 62 17 67

    $`4`
    [1] 61 21 31 24 48

    $`5`
    [1] 44 13 36

David Winsemius
Alameda, CA, USA
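In the mapply() call above, findInterval(data$value, 0:5) returns 1 to 5 for the intervals [0,1) to [4,5) and 6 for values of 5 or more, and `[1:5]` then keeps the first five groups by position. A possible variant, sketched below, fixes the bins by value instead of by position and labels the result with interval names; wrapping the bin numbers in factor(..., levels = 1:5) and the paste0() labels are assumptions added for this illustration.

```r
# Data from the original post
data <- data.frame(id = 1:70, value = c(
  0.68, 2.96, 1.93, 5.63, 3.08, 3.10, 2.99, 1.79, 2.96, 0.85, 11.79, 0.06,
  4.31, 0.64, 1.43, 0.88, 2.79, 4.67, 1.23, 1.43, 3.05, 2.44, 2.55, 3.82,
  3.55, 1.56, 7.25, 2.75, 9.64, 5.14, 3.54, 3.12, 0.17, 1.07, 4.08, 4.47,
  5.58, 7.41, 0.85, 4.30, 7.58, 0.58, 1.40, 4.74, 5.04, 0.14, 1.14, 3.28,
  7.84, 0.07, 3.97, 1.02, 3.47, 0.66, 2.38, 0.06, 0.67, 0.48, 4.48, 0.12,
  3.82, 2.27, 0.93, 0.30, 0.73, 0.33, 2.91, 0.81, 0.18, 0.42))
size <- c(10, 7, 5, 5, 3)

# Bin numbers: 1..5 for [0,1)..[4,5), 6 for values of 5 or more
bins <- findInterval(data$value, 0:5)

# Restricting the factor levels to 1:5 turns bin 6 into NA, which split() drops;
# an empty bin would still appear as a zero-row data frame instead of shifting positions
by_bin <- split(data, factor(bins, levels = 1:5))
names(by_bin) <- paste0("[", 0:4, ",", 1:5, ")")

set.seed(42)
# 'vals' comes first so that mapply() takes the result names from by_bin
samples <- mapply(function(vals, n) vals$id[sample.int(nrow(vals), n)],
                  vals = by_bin, n = size, SIMPLIFY = FALSE)
print(samples)
```

SIMPLIFY = FALSE keeps the result a list even if all requested sizes happened to be equal, in which case mapply() would otherwise return a matrix.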