Hey,

I am hoping someone can help me with a sampling question.

I have a data frame of 8 variables (the first column is the subjects' ID):

    SubID CSE1 CSE2 CSE3 CSE4 WSE1 WSE2 WSE3 WSE4
        1    6    5    6    2    6    2    2    4
        2    6    4    7    2    6    6    2    3
        3    5    5    5    5    5    5    4    5
        4    5    4    3    4    4    4    5    2
        5    5    6    7    5    6    4    4    1
        6    5    4    3    6    4    3    7    3
        7    3    6    6    3    6    5    2    1
        8    3    6    6    3    6    5    4    7

The 8 variables are categorized into two groups, with CSE1, CSE2, CSE3, and CSE4 in one group and the rest in the other. I first sample 2 columns from the first group:

    > sample(data[,2:5],2,replace=FALSE)
      CSE1 CSE2
    1    6    5
    2    6    4
    3    5    5
    4    5    4
    5    5    6
    6    5    4
    7    3    6
    8    3    6

Now I want to sample 1 column from the other group of variables (i.e., WSE1, WSE2, WSE3, WSE4), but I want to restrict the columns I sample from to those that do not correspond to the GROUP 1 variables I have already sampled. That is, I want to sample a column from WSE3 and WSE4 only. The columns corresponding to CSE1 and CSE2 (i.e., WSE1 and WSE2) need to be dropped.

How can I do this? And what if I want to repeat this whole process (drawing 2 random columns from CSE1, CSE2, CSE3, and CSE4 first, AND then another random column from WSE1, WSE2, WSE3, and WSE4) 1000 times? Any ideas?

Many thanks in advance!!

--
View this message in context: http://r.789695.n4.nabble.com/Sampling-problem-tp3043804p3043804.html
Sent from the R help mailing list archive at Nabble.com.

Hello,

Is this what you want?

    sampleX <- function(X, nGrp1, nsamples) {
      # X is a matrix or data.frame with cols for two groups of variables,
      # with grp1 in cols 2:5 and grp2 in cols 6:9
      #
      # nGrp1 <- number of variables to sample from group 1
      #
      # nsamples <- number of rows in output matrix

      if (nGrp1 >= 4) stop("can't sample all group 1 variables")

      out <- matrix(0, nsamples, nGrp1 + 1)
      for (i in 1:nsamples) {
        # choose grp1 vars to sample
        grp1 <- sample(4, nGrp1)
        # choose complementary grp2 var to sample; index into the leftovers
        # so that a single remaining variable k is not treated as 1:k by sample()
        rest <- setdiff(1:4, grp1)
        grp2 <- rest[sample.int(length(rest), 1)]
        # sample 1 value from each var
        out[i, ] <- apply(X[, c(grp1 + 1, grp2 + 5)], 2, sample, 1)
      }

      out
    }

Michael

On 16 November 2010 07:59, wangwallace <talenttree at gmail.com> wrote:
> [original message quoted in full; snipped]
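
A note on the "complementary variable" step, since it is easy to trip over: base R's sample() treats a single number n as 1:n, so sampling from a leftover vector that happens to have length one can return a variable that was meant to be excluded. What follows is only a minimal sketch of that pitfall and of an index-based workaround. The helper name pick_from() is hypothetical and does not come from the thread.

```r
# Hypothetical helper (not from the thread): draw `size` elements of x by
# position, so that a length-one x is never expanded to 1:x.
pick_from <- function(x, size = 1) {
  x[sample.int(length(x), size)]
}

grp1 <- c(1, 2, 3)           # suppose three of the four group 1 variables were drawn
rest <- setdiff(1:4, grp1)   # only variable 4 is left

set.seed(1)                  # arbitrary seed, for reproducibility only
naive <- replicate(200, sample(rest, 1))   # sample(4, 1) draws from 1:4
safe  <- replicate(200, pick_from(rest))   # always the single remaining value

table(naive)   # spread over 1..4
table(safe)    # everything in the cell for 4
```

With two variables drawn from group 1, as in the original question, two candidates are always left and the issue does not arise. It only matters when the function is called with three.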

Michael,

I really appreciate your help, but I got the following error message when I was trying to run the function you wrote:

    Error in out[i, ] <- apply(help[, c(grp1 + 1, grp2 + 5)], 2, sample, 1) :
      number of items to replace is not a multiple of replacement length

I am not quite sure why this would happen. As a novice at R, I find these functions kind of complex. I am wondering whether this is doable without using loops like that.

Again, thank you so much!!!
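
On the question of avoiding an explicit loop: one possible approach is to wrap a single draw in a small function and let replicate() repeat it. The following is a sketch only. The function name draw_once() is made up for illustration, and the offsets +1 and +5 assume the column layout posted at the top of the thread (SubID in column 1, CSE1..CSE4 in columns 2:5, WSE1..WSE4 in columns 6:9).

```r
# Data as posted in the thread
dat <- as.data.frame(matrix(
  c(1, 6, 5, 6, 2, 6, 2, 2, 4,
    2, 6, 4, 7, 2, 6, 6, 2, 3,
    3, 5, 5, 5, 5, 5, 5, 4, 5,
    4, 5, 4, 3, 4, 4, 4, 5, 2,
    5, 5, 6, 7, 5, 6, 4, 4, 1,
    6, 5, 4, 3, 6, 4, 3, 7, 3,
    7, 3, 6, 6, 3, 6, 5, 2, 1,
    8, 3, 6, 6, 3, 6, 5, 4, 7),
  ncol = 9, byrow = TRUE,
  dimnames = list(NULL, c("SubID", paste0("CSE", 1:4), paste0("WSE", 1:4)))
))

# Hypothetical helper: one draw of n_cse CSE columns plus one WSE column
# whose number differs from every CSE column drawn.
draw_once <- function(d, n_cse = 2) {
  cse  <- sample.int(4, n_cse)               # which CSE variables (1..4)
  rest <- setdiff(1:4, cse)                  # WSE variables still allowed
  wse  <- rest[sample.int(length(rest), 1)]  # pick one of them by position
  d[, c(cse + 1, wse + 5)]                   # assumed layout: CSE in 2:5, WSE in 6:9
}

set.seed(42)                                 # arbitrary seed
draws <- replicate(1000, draw_once(dat), simplify = FALSE)

length(draws)        # 1000 data frames
names(draws[[1]])    # two CSE names followed by one WSE name
```

replicate() still iterates internally, so this is mostly a matter of readability rather than speed.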

Hi

Here is one way (if I understood what you asked).

    > test <- read.table("clipboard", header = TRUE)
    > test
      SubID CSE1 CSE2 CSE3 CSE4 WSE1 WSE2 WSE3 WSE4
    1     1    6    5    6    2    6    2    2    4
    2     2    6    4    7    2    6    6    2    3
    3     3    5    5    5    5    5    5    4    5
    4     4    5    4    3    4    4    4    5    2
    5     5    5    6    7    5    6    4    4    1
    6     6    5    4    3    6    4    3    7    3
    7     7    3    6    6    3    6    5    2    1
    8     8    3    6    6    3    6    5    4    7

    fff <- function(dat, col1 = 2, col2 = 1) {
      # col1 is the number of columns from the first set and col2 from the second set
      sel1 <- sample(1:4, col1)
      rest <- setdiff(1:4, sel1)
      # index into rest so that a single leftover value k is not sampled as 1:k
      sel2 <- rest[sample.int(length(rest), col2)]
      # I presume that your data are the same as you posted;
      # if not, you have to change the offsets below
      dat[, c(sel1 + 1, sel2 + 5)]
    }

    > fff(test)
      CSE2 CSE1 WSE3
    1    5    6    2
    <snip>
    8    6    3    4
    > fff(test)
      CSE1 CSE2 WSE3
    1    6    5    2
    <snip>
    8    3    6    4
    > fff(test)
      CSE1 CSE3 WSE4
    1    6    6    4
    <snip>
    8    3    6    7

If you want to do it 1000 times, just use a simple loop:

    result <- vector("list", 1000)
    for (i in 1:1000) result[[i]] <- fff(test)

Regards
Petr

r-help-bounces at r-project.org wrote on 15.11.2010 21:59:21:

> wangwallace <talenttree at gmail.com>
> Sent by: r-help-bounces at r-project.org
> 15.11.2010 21:59
> To: r-help at r-project.org
> Subject: [R] Sampling problem
>
> [original message quoted in full; snipped]
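
If the columns are not in exactly the positions assumed above, the fixed offsets would need editing by hand. One possible alternative, sketched here under the assumption that the two groups can be recognised by the name prefixes "CSE" and "WSE" and that matching variables share the same trailing number, is to select columns by name. The function name draw_by_name() is hypothetical.

```r
# Data as posted in the thread
dat <- as.data.frame(matrix(
  c(1, 6, 5, 6, 2, 6, 2, 2, 4,
    2, 6, 4, 7, 2, 6, 6, 2, 3,
    3, 5, 5, 5, 5, 5, 5, 4, 5,
    4, 5, 4, 3, 4, 4, 4, 5, 2,
    5, 5, 6, 7, 5, 6, 4, 4, 1,
    6, 5, 4, 3, 6, 4, 3, 7, 3,
    7, 3, 6, 6, 3, 6, 5, 2, 1,
    8, 3, 6, 6, 3, 6, 5, 4, 7),
  ncol = 9, byrow = TRUE,
  dimnames = list(NULL, c("SubID", paste0("CSE", 1:4), paste0("WSE", 1:4)))
))

# Hypothetical helper: pick columns by name instead of by position.
# Assumes CSEk and WSEk are the corresponding pair for each k.
draw_by_name <- function(d, n_cse = 2, n_wse = 1) {
  cse_cols <- grep("^CSE", names(d), value = TRUE)
  picked   <- cse_cols[sample.int(length(cse_cols), n_cse)]
  blocked  <- sub("^CSE", "WSE", picked)           # WSE partners to exclude
  wse_cols <- setdiff(grep("^WSE", names(d), value = TRUE), blocked)
  chosen   <- wse_cols[sample.int(length(wse_cols), n_wse)]
  d[, c(picked, chosen)]
}

set.seed(7)                 # arbitrary seed
one <- draw_by_name(dat)
names(one)
```

Because nothing depends on column positions, the same function should keep working if SubID is dropped or the columns are reordered, as long as the naming assumption holds.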

Fabulicious!!!!!!!!!!!!!!!!! It worked!!!

One more question. In the following data frame, as posted above:

    SubID CSE1 CSE2 CSE3 CSE4 WSE1 WSE2 WSE3 WSE4
        1    6    5    6    2    6    2    2    4
        2    6    4    7    2    6    6    2    3
        3    5    5    5    5    5    5    4    5
        4    5    4    3    4    4    4    5    2
        5    5    6    7    5    6    4    4    1
        6    5    4    3    6    4    3    7    3
        7    3    6    6    3    6    5    2    1
        8    3    6    6    3    6    5    4    7

I want to draw a first random sample consisting of one row of integers under the first group of variables (CSE1, CSE2, CSE3, CSE4). For example, assume the first draw yielded the first row (6, 5, 6, 2). Now I want to draw another random sample consisting of two rows of integers under the second group of variables (WSE1, WSE2, WSE3, WSE4).

For the second draw, I want to restrict the rows I sample from to those that do not correspond to the SubID I have already sampled. That is, I want to sample two rows of integers under the second group of variables (WSE1, WSE2, WSE3, WSE4) from rows 2, 3, 4, 5, 6, 7, and 8.

Also, I want to repeat this whole process (drawing 1 random row of integers under the first group of variables first, AND then another two random rows under the second group of variables) 1000 times.

Any ideas? Would it be possible to do this by just revising the syntax you wrote above?

Many thanks!!!
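
For readers with the same question, the row-wise version can be sketched along the same lines as the column-wise one: draw one row index, remove it from the candidates, then draw two more. This is only one possible approach, not necessarily what the poster ended up using. The function name draw_rows() is hypothetical, and the group columns are again assumed to be identifiable by the prefixes "CSE" and "WSE".

```r
# Data as posted in the thread
dat <- as.data.frame(matrix(
  c(1, 6, 5, 6, 2, 6, 2, 2, 4,
    2, 6, 4, 7, 2, 6, 6, 2, 3,
    3, 5, 5, 5, 5, 5, 5, 4, 5,
    4, 5, 4, 3, 4, 4, 4, 5, 2,
    5, 5, 6, 7, 5, 6, 4, 4, 1,
    6, 5, 4, 3, 6, 4, 3, 7, 3,
    7, 3, 6, 6, 3, 6, 5, 2, 1,
    8, 3, 6, 6, 3, 6, 5, 4, 7),
  ncol = 9, byrow = TRUE,
  dimnames = list(NULL, c("SubID", paste0("CSE", 1:4), paste0("WSE", 1:4)))
))

# Hypothetical helper: one row of CSE values, then n_wse_rows rows of WSE
# values taken from the remaining subjects only.
draw_rows <- function(d, n_wse_rows = 2) {
  cse_cols <- grep("^CSE", names(d))
  wse_cols <- grep("^WSE", names(d))
  i      <- sample.int(nrow(d), 1)              # row for the first group
  others <- setdiff(seq_len(nrow(d)), i)        # rows still allowed
  j      <- others[sample.int(length(others), n_wse_rows)]
  # SubID (column 1) is kept in both pieces so the draws can be checked
  list(cse = d[i, c(1, cse_cols)], wse = d[j, c(1, wse_cols)])
}

set.seed(2010)                                  # arbitrary seed
res <- replicate(1000, draw_rows(dat), simplify = FALSE)

res[[1]]$cse    # one row: SubID plus CSE1..CSE4
res[[1]]$wse    # two rows from other subjects: SubID plus WSE1..WSE4
```

Each element of res is a list with the single CSE row and the two WSE rows. Keeping SubID in both makes it easy to confirm that the subject drawn first never reappears in the second draw.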

I figured it out myself. Again, Michael and Petr, many thanks to both of you!!! :)