Hey,
I am hoping someone can help me with a sampling question.
I have a data frame of 8 variables (the first column is the subjects' id):
SubID CSE1 CSE2 CSE3 CSE4 WSE1 WSE2 WSE3 WSE4
    1    6    5    6    2    6    2    2    4
    2    6    4    7    2    6    6    2    3
    3    5    5    5    5    5    5    4    5
    4    5    4    3    4    4    4    5    2
    5    5    6    7    5    6    4    4    1
    6    5    4    3    6    4    3    7    3
    7    3    6    6    3    6    5    2    1
    8    3    6    6    3    6    5    4    7
The 8 variables are categorized into two groups, with CSE1, CSE2, CSE3, and
CSE4 in one group and the rest in the other group.
First I sample two columns from the first group:
> sample(data[,2:4],2,replace=FALSE)
  CSE1 CSE2
1    6    5
2    6    4
3    5    5
4    5    4
5    5    6
6    5    4
7    3    6
8    3    6
Now I want to sample 1 column from the other group of variables (i.e., WSE1,
WSE2, WSE3, WSE4), but I want to restrict the set I sample from to only those
columns that do not correspond to the group 1 variables I have already
sampled. That is, I want to sample a column from WSE3 and WSE4 only; the
columns corresponding to CSE1 and CSE2 (i.e., WSE1 and WSE2) need to be dropped.
How can I do this? And what if I want to repeat this whole process (drawing 2
random columns from CSE1, CSE2, CSE3, and CSE4 first, and then another
random column from WSE1, WSE2, WSE3, and WSE4) 1000 times? Any ideas?
Many thanks in advance!!
--
View this message in context:
http://r.789695.n4.nabble.com/Sampling-problem-tp3043804p3043804.html
Sent from the R help mailing list archive at Nabble.com.
Hello,
Is this what you want?
sampleX <- function(X, nGrp1, nsamples) {
  # X is a matrix or data.frame with cols for two groups of variables,
  # with grp1 in cols 2:5 and grp2 in cols 6:9
  #
  # nGrp1 <- number of variables to sample from group 1
  #
  # nsamples <- number of rows in output matrix
  if (nGrp1 >= 4) stop("can't sample all group 1 variables")
  out <- matrix(0, nsamples, nGrp1+1)
  for (i in 1:nsamples) {
    # choose grp1 vars to sample
    grp1 <- sample(4, nGrp1)
    # choose complementary grp2 var to sample; index by position so that a
    # single remaining var k is not treated as 1:k by sample()
    rest <- (1:4)[-grp1]
    grp2 <- rest[sample(length(rest), 1)]
    # sample 1 value from each var
    out[i, ] <- apply(X[,c(grp1+1, grp2+5)], 2, sample, 1)
  }
  out
}
Michael
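One pitfall to keep in mind with complementary draws of this kind: when only one candidate is left (for example, after drawing three of the four group 1 variables), sample(k, 1) with a single number k draws from 1:k rather than returning k, as described in ?sample. Below is a minimal sketch of that behaviour and of a position-based guard. The helper name pick is hypothetical and not part of the reply above.

```r
# Hypothetical helper, not part of the original reply: draw `size` elements
# of x by position, so a length-one x is never expanded to 1:x.
pick <- function(x, size = 1) x[sample.int(length(x), size)]

set.seed(1)                 # only to make the demonstration repeatable
grp1 <- c(1, 2, 3)          # suppose three group 1 variables were drawn
remaining <- (1:4)[-grp1]   # a single number: 4

naive <- replicate(200, sample(remaining, 1))  # draws from 1:4, not just 4
safe  <- replicate(200, pick(remaining))       # can only ever return 4

range(naive)
unique(safe)
```

The guard only matters when a single candidate can remain; with two group 1 variables drawn out of four, two candidates are always left and plain sample() behaves as expected.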
On 16 November 2010 07:59, wangwallace <talenttree at gmail.com> wrote:
> <snip>
Michael, I really appreciate your help, but I got the following error message when I was trying to run the function you wrote:
Error in out[i, ] <- apply(help[, c(grp1 + 1, grp2 + 5)], 2, sample, 1) :
  number of items to replace is not a multiple of replacement length
I am not quite sure why this would happen. As a novice in R, I find these functions kind of complex. I am wondering if it is doable without using loops like that. Again, thank you so much!!!
Hi
Here is one way (if I understood what you asked).
test<-read.table("clipboard", header=T)
> test
  SubID CSE1 CSE2 CSE3 CSE4 WSE1 WSE2 WSE3 WSE4
1     1    6    5    6    2    6    2    2    4
2     2    6    4    7    2    6    6    2    3
3     3    5    5    5    5    5    5    4    5
4     4    5    4    3    4    4    4    5    2
5     5    5    6    7    5    6    4    4    1
6     6    5    4    3    6    4    3    7    3
7     7    3    6    6    3    6    5    2    1
8     8    3    6    6    3    6    5    4    7
fff<-function(dat, col1=2, col2=1) {
  # col1 is the number of columns from the first set and col2 from the second set
  # I presume that your data are the same as you posted; if not, you have to
  # change the column offsets below
  sel1<-sample(1:4, col1)
  # index by position so a single remaining column k is not sampled as 1:k
  rest<-(1:4)[-sel1]
  sel2<-rest[sample(length(rest), col2)]
  dat[,c(sel1+1,sel2+5)]
}
> fff(test)
  CSE2 CSE1 WSE3
1    5    6    2
<snip>
8    6    3    4
> fff(test)
  CSE1 CSE2 WSE3
1    6    5    2
<snip>
8    3    6    4
> fff(test)
  CSE1 CSE3 WSE4
1    6    6    4
<snip>
8    3    6    7
If you want to do it 1000 times, just use a simple loop:
result <- vector("list", 1000)
for (i in 1:1000) result[[i]] <- fff(test)
Regards
Petr
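For readers who would rather avoid an explicit for loop, the same repetition can be sketched with replicate(). This is only an illustration under the column layout posted in the thread (SubID in column 1, CSE1 to CSE4 in columns 2:5, WSE1 to WSE4 in columns 6:9); the function name draw_cols and the seed are assumptions, not part of the reply above.

```r
# Data as posted in the thread
test <- data.frame(
  SubID = 1:8,
  CSE1 = c(6, 6, 5, 5, 5, 5, 3, 3),
  CSE2 = c(5, 4, 5, 4, 6, 4, 6, 6),
  CSE3 = c(6, 7, 5, 3, 7, 3, 6, 6),
  CSE4 = c(2, 2, 5, 4, 5, 6, 3, 3),
  WSE1 = c(6, 6, 5, 4, 6, 4, 6, 6),
  WSE2 = c(2, 6, 5, 4, 4, 3, 5, 5),
  WSE3 = c(2, 2, 4, 5, 4, 7, 2, 4),
  WSE4 = c(4, 3, 5, 2, 1, 3, 1, 7)
)

# Hypothetical function: draw n1 columns from group 1 (cols 2:5), then n2
# columns from group 2 (cols 6:9), skipping the partners of the group 1 picks.
draw_cols <- function(dat, n1 = 2, n2 = 1) {
  sel1 <- sample.int(4, n1)
  rest <- setdiff(1:4, sel1)                  # group 2 positions still allowed
  sel2 <- rest[sample.int(length(rest), n2)]  # index by position
  dat[, c(sel1 + 1, sel2 + 5), drop = FALSE]
}

set.seed(42)  # arbitrary seed, only for repeatability
# simplify = FALSE keeps each draw as its own data frame in a list
result <- replicate(1000, draw_cols(test), simplify = FALSE)

length(result)
names(result[[1]])
```

replicate() still loops internally, so the gain is brevity rather than speed; each element of result is one draw, just as with the loop.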
r-help-bounces at r-project.org wrote on 15.11.2010 21:59:21:
> <snip>
Fabulicious!!!!!!!!!!!!!!!!! It worked!!!
One more question about the data frame posted above:
SubID CSE1 CSE2 CSE3 CSE4 WSE1 WSE2 WSE3 WSE4
    1    6    5    6    2    6    2    2    4
    2    6    4    7    2    6    6    2    3
    3    5    5    5    5    5    5    4    5
    4    5    4    3    4    4    4    5    2
    5    5    6    7    5    6    4    4    1
    6    5    4    3    6    4    3    7    3
    7    3    6    6    3    6    5    2    1
    8    3    6    6    3    6    5    4    7
I want to draw a first random sample consisting of one row of integers under
the first group of variables (CSE1, CSE2, CSE3, CSE4). For example, assuming
the first draw yielded the first row (6, 5, 6, 2), I now want to draw another
random sample consisting of two rows of integers under the second group of
variables (WSE1, WSE2, WSE3, WSE4). For the second draw, I want to restrict
the rows I sample from to only those that do not correspond to the SubID I
have already sampled. That is, I want to sample two rows of integers under
the second group of variables (WSE1, WSE2, WSE3, WSE4) from rows 2, 3, 4, 5,
6, 7, and 8.
Also, I want to repeat this whole process (drawing 1 random row of integers
under the first group of variables first, and then another two random rows
under the second group of variables) 1000 times. Any ideas? Would it be
possible to do this by just revising the code you wrote above? Many
thanks!!!
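The solution the poster eventually used was not shared in the thread, so the following is only one possible sketch of the row-wise version described above: one row under the CSE columns, then two rows under the WSE columns taken from the remaining subjects. The function name draw_rows, the decision to keep SubID in the output, and the seed are assumptions made for illustration.

```r
# Data as posted in the thread
test <- data.frame(
  SubID = 1:8,
  CSE1 = c(6, 6, 5, 5, 5, 5, 3, 3),
  CSE2 = c(5, 4, 5, 4, 6, 4, 6, 6),
  CSE3 = c(6, 7, 5, 3, 7, 3, 6, 6),
  CSE4 = c(2, 2, 5, 4, 5, 6, 3, 3),
  WSE1 = c(6, 6, 5, 4, 6, 4, 6, 6),
  WSE2 = c(2, 6, 5, 4, 4, 3, 5, 5),
  WSE3 = c(2, 2, 4, 5, 4, 7, 2, 4),
  WSE4 = c(4, 3, 5, 2, 1, 3, 1, 7)
)

# Hypothetical function: draw n1 rows under group 1, then n2 rows under
# group 2 from the subjects that were not drawn in the first step.
draw_rows <- function(dat, n1 = 1, n2 = 2) {
  cse <- c("CSE1", "CSE2", "CSE3", "CSE4")
  wse <- c("WSE1", "WSE2", "WSE3", "WSE4")
  rows1 <- sample.int(nrow(dat), n1)
  rest  <- setdiff(seq_len(nrow(dat)), rows1)   # rows still allowed
  rows2 <- rest[sample.int(length(rest), n2)]   # index by position
  # SubID is kept so each draw can be traced back to its subject
  list(group1 = dat[rows1, c("SubID", cse)],
       group2 = dat[rows2, c("SubID", wse)])
}

set.seed(42)  # arbitrary seed, only for repeatability
result <- replicate(1000, draw_rows(test), simplify = FALSE)

result[[1]]$group1
result[[1]]$group2
```

Each element of result is a list holding the two draws, so the group 1 row and the two group 2 rows of any repetition stay together.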
I figured it out myself. Again, Michael and Petr, many thanks to both of you!!! :)