Cesar Hincapié
2011-Mar-07 19:17 UTC
[R] generate 3 distinct random samples without replacement
Hello:

I wonder if I could get a little help with random sampling in R.

I have a vector of length 7375. I would like to draw 3 distinct random samples, each of length 100 without replacement. I have tried the following:

d1 <- 1:7375

set.seed(7)
i <- sample(d1, 100, replace=F)
s1 <- sort(d1[i])
s1

d2 <- d1[-i]
set.seed(77)
j <- sample(d2, 100, replace=F)
s2 <- sort(d2[j])
s2

d3 <- d2[-j]
set.seed(777)
k <- sample(d3, 100, replace=F)
s3 <- sort(d3[k])
s3

D <- data.frame(a=s1,b=s2,c=s3)

However, s2 is only 97 elements long, and s3, only 96 long.

I would appreciate any suggestions on a better approach.
I'm also curious to know why my second and third samples are less than 100 elements in length.

Thanks for your time and consideration,

Cesar A. Hincapié, DC, MHSc

Research Fellow, Division of Health Care and Outcomes Research, Toronto Western Research Institute
PhD Candidate in Epidemiology, Dalla Lana School of Public Health, University of Toronto
e. cesar.hincapie@utoronto.ca
Jonathan P Daily
2011-Mar-07 20:18 UTC
[R] generate 3 distinct random samples without replacement
Would this work?

s <- sample(d1, 300, F)
D <- data.frame(a = s[1:100], b = s[101:200], c = s[201:300])

--------------------------------------
Jonathan P. Daily
Technician - USGS Leetown Science Center
11649 Leetown Road
Kearneysville WV, 25430
(304) 724-4480
"Is the room still a room when its empty? Does the room,
the thing itself have purpose? Or do we, what's the word... imbue it."
- Jubal Early, Firefly
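The single-draw idea above can be written out as a self-contained sketch. The seed value and the per-column sort() are assumptions added here for reproducibility and to match the sorted output in the original question; they are not part of the reply.

```r
d1 <- 1:7375

set.seed(7)  # assumed seed, only so the draw can be repeated
s <- sample(d1, 300, replace = FALSE)  # 300 distinct values in one draw

# Split the single draw into three groups of 100; because the values are
# distinct, the three groups cannot overlap.
D <- data.frame(a = sort(s[1:100]),
                b = sort(s[101:200]),
                c = sort(s[201:300]))

dim(D)
length(unique(unlist(D)))
```

Because all 300 values come from one draw without replacement, no value can appear in more than one column, and only one seed is needed.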
Sarah Goslee
2011-Mar-07 20:18 UTC
[R] generate 3 distinct random samples without replacement
Cesar, your indexing is wrong:

On Mon, Mar 7, 2011 at 2:17 PM, Cesar Hincapié <cesar.hincapie at utoronto.ca> wrote:
> d1 <- 1:7375
>
> set.seed(7)
> i <- sample(d1, 100, replace=F)
> s1 <- sort(d1[i])
> s1

d1 is a contiguous vector of integers, 1 through 7375, of length 7375

> d2 <- d1[-i]

but you've taken out 100 of those numbers, so d2 is now of length 7275 and has gaps in the sequence.

> set.seed(77)
> j <- sample(d2, 100, replace=F)
> s2 <- sort(d2[j])
> s2

j is a sample *of the values*, and those values are no longer the indices of the vector d2.

You need instead

j <- sample(1:length(d2), 100, replace=FALSE)
s2 <- sort(d2[j])

Some of the values in j no longer exist in d2 as indices. 7375 could be selected, but since d2 only has 7275 elements, d2[7375] doesn't return a value (it returns NA).

Same for your third sample, only the indices are even less like the elements of the vector because you've removed another random set of values.

Sarah

--
Sarah Goslee
http://www.functionaldiversity.org
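The difference between sampling values and sampling positions can be checked directly. This is a minimal sketch; the names j_values, j_pos, bad and good are made up for illustration, and seq_along(d2) is used as an equivalent of 1:length(d2). The exact number of NA entries depends on the R version's sampling algorithm, so no count is claimed here.

```r
d1 <- 1:7375

set.seed(7)
i  <- sample(d1, 100, replace = FALSE)
d2 <- d1[-i]                      # 7275 values, with gaps in the sequence

set.seed(77)
# Original approach: draw VALUES from d2, then misuse them as positions.
j_values <- sample(d2, 100, replace = FALSE)
bad <- d2[j_values]               # any value above length(d2) indexes past the end -> NA

# Corrected approach: draw POSITIONS in d2, then look the values up.
j_pos <- sample(seq_along(d2), 100, replace = FALSE)
good  <- d2[j_pos]

sum(is.na(bad))                   # how many lookups fell past the end of d2
anyNA(good)                       # positions are always in range
length(intersect(d1[i], good))    # overlap between the first and second sample
```

With positions, every lookup is in range, and since d2 no longer contains the first sample's values, the second sample cannot overlap the first.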
Duncan Murdoch
2011-Mar-07 20:52 UTC
[R] generate 3 distinct random samples without replacement
On 07/03/2011 2:17 PM, Cesar Hincapié wrote:
> However, s2 is only 97 elements long, and s3, only 96 long.
>
> I would appreciate any suggestions on a better approach.
> I'm also curious to know why my second and third samples are less than 100 elements in length.

If you want 3 non-overlapping, non-repeating samples of 100, why not draw one sample of 300, and take 3 subsets of it?

The reason you were finding shorter samples is that you were using j and k as indices into vectors d2 and d3 that didn't have enough elements, and then you sorted the result, losing the NAs. For example,

d2 <- 1:10
d2[10:12]
sort(d2[10:12])

See ?sort for an explanation of how to keep NA values when you sort.

Duncan Murdoch
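The small example above, extended with the na.last argument that ?sort documents, shows where the missing elements went. The variable names x, dropped and kept are illustrative only.

```r
d2 <- 1:10
x  <- d2[10:12]                      # positions 11 and 12 do not exist, so they come back as NA

dropped <- sort(x)                   # default na.last = NA silently removes the NAs
kept    <- sort(x, na.last = TRUE)   # keeps the NAs and places them at the end

length(x)        # 3
length(dropped)  # 1
length(kept)     # 3
```

This is why the sorted samples came out shorter than 100: each out-of-range index produced an NA, and sort() dropped it.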