thr3ads.net - R help - [R] Sampling problem [Nov 2010]

wangwallace

2010-Nov-15 20:59 UTC

[R] Sampling problem

Hey,

I am hoping someone can help me with a sampling question.

I have a data frame of 8 variables (the first column is the subjects' id):

    SubID    CSE1 CSE2 CSE3 CSE4 WSE1 WSE2 WSE3 WSE4 
      1          6      5       6       2      6      2        2       4
      2          6      4       7       2      6      6        2       3
      3          5      5       5       5      5      5        4       5
      4          5      4       3       4      4      4        5       2
      5          5      6       7       5      6      4        4       1
      6          5      4       3       6      4      3        7       3
      7          3      6       6       3      6      5        2       1
      8          3      6       6       3      6      5        4       7 

The 8 variables fall into two groups: CSE1, CSE2, CSE3, and CSE4 in one
group, and the rest in another group. First I sample 2 columns from group 1:
> sample(data[,2:5],2,replace=FALSE)
   CSE1 CSE2 
1      6    5    
2      6    4   
3      5    5   
4      5    4    
5      5    6   
6      5    4    
7      3    6    
8      3    6    

Now I want to sample 1 column from the other group of variables (i.e., WSE1,
WSE2, WSE3, WSE4), but I want to restrict the columns I sample from to only
those that do not correspond to the GROUP 1 variables I have already
sampled. That is, I want to sample a column from WSE3 and WSE4 only; the
columns corresponding to CSE1 and CSE2 (i.e., WSE1, WSE2) need to be dropped.

How can I do this? And what if I want to repeat this whole process (drawing 2
random columns from CSE1, CSE2, CSE3, and CSE4 first, AND then another
random column from WSE1, WSE2, WSE3, and WSE4) 1000 times? Any ideas?

Many thanks in advance!!

-- 
View this message in context:
http://r.789695.n4.nabble.com/Sampling-problem-tp3043804p3043804.html
Sent from the R help mailing list archive at Nabble.com.

Michael Bedward

2010-Nov-16 02:18 UTC

[R] Sampling problem

Hello,

Is this what you want?

sampleX <- function(X, nGrp1, nsamples) {
  # X is a matrix or data.frame with cols for two groups of variables,
  # with grp1 in cols 2:5 and grp2 in cols 6:9
  #
  # nGrp1 <- number of variables to sample from group 1
  #
  # nsamples <- number of rows in output matrix

  if (nGrp1 >= 4) stop("can't sample all group 1 variables")

  out <- matrix(0, nsamples, nGrp1 + 1)
  for (i in seq_len(nsamples)) {
    # choose grp1 vars to sample
    grp1 <- sample(4, nGrp1)

    # choose complementary grp2 var to sample; index by position because
    # sample(k, 1) on a single number k would draw from 1:k
    rest <- (1:4)[-grp1]
    grp2 <- rest[sample.int(length(rest), 1)]

    # sample 1 value from each var
    out[i, ] <- apply(X[, c(grp1 + 1, grp2 + 5)], 2, sample, 1)
  }

  out
}

Michael
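
A note on the step that picks the group 2 variable from whatever is left: as documented in ?sample, when sample() is given a single number n (of at least 1) as its first argument, it draws from 1:n rather than from that one value. That matters here whenever only one candidate remains, i.e. when nGrp1 is 3. The sketch below illustrates the pitfall; the helper name pick_from is made up for illustration and follows the "resample" idea from the ?sample examples.

```r
# Hypothetical helper (not from the thread): draw `size` elements of x by
# position, so a length-1 x is never expanded to 1:x.
pick_from <- function(x, size = 1) {
  x[sample.int(length(x), size)]
}

grp1 <- c(1, 2, 3)      # suppose three group 1 variables were chosen
rest <- (1:4)[-grp1]    # only index 4 is left

set.seed(1)
unsafe <- replicate(200, sample(rest, 1))     # sample(4, 1): draws from 1:4
safe   <- replicate(200, pick_from(rest, 1))  # always returns 4

table(unsafe)
table(safe)
```

With two or more candidates left, both forms behave the same; the difference only shows up in the single-candidate case.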


wangwallace

2010-Nov-16 05:10 UTC

[R] Sampling problem

Michael, I really appreciate your help.

but I got the following error message when I was trying to run the function
you wrote:

Error in out[i, ] <- apply(help[, c(grp1 + 1, grp2 + 5)], 2, sample, 1) : 
  number of items to replace is not a multiple of replacement length

I am not quite sure why this would happen.

As a novice at R, I find these functions kinda complex. I am wondering
if this is doable without using loops like that.

Again, thank you so much!!!  
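
A note on that error message: it is R's generic complaint when the values on the right-hand side of a matrix assignment do not fit the slots being filled. The sketch below only shows how the message arises and how a length check can expose the mismatch; the objects out, vals, and vals_ok are illustrative, and it does not establish what differed in the run reported above.

```r
out <- matrix(0, nrow = 2, ncol = 3)   # each row has 3 slots

vals <- c(10, 20)                      # only 2 values: lengths do not match
res <- tryCatch(
  {
    out[1, ] <- vals
    "assigned"
  },
  error = function(e) conditionMessage(e)
)
print(res)   # the captured error message

# Comparing lengths before assigning makes a mismatch obvious
vals_ok <- c(10, 20, 30)
stopifnot(length(vals_ok) == ncol(out))
out[1, ] <- vals_ok
out
```

In the function above, the equivalent check would compare the number of selected columns with ncol(out) just before the assignment.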

Petr PIKAL

2010-Nov-16 16:09 UTC

[R] Odp: Sampling problem

Hi

Here is one way (if I understood what you asked).

> test<-read.table("clipboard", header=T)
> test
  SubID CSE1 CSE2 CSE3 CSE4 WSE1 WSE2 WSE3 WSE4
1     1    6    5    6    2    6    2    2    4
2     2    6    4    7    2    6    6    2    3
3     3    5    5    5    5    5    5    4    5
4     4    5    4    3    4    4    4    5    2
5     5    5    6    7    5    6    4    4    1
6     6    5    4    3    6    4    3    7    3
7     7    3    6    6    3    6    5    2    1
8     8    3    6    6    3    6    5    4    7

fff<-function(dat, col1=2, col2=1) {
# col1 is the number of columns from the first set and col2 from the second set
sel1<-sample(1:4, col1)
# index by position, because sample() on a single number k draws from 1:k
rest<-(1:4)[-sel1]
sel2<-rest[sample.int(length(rest), col2)]
# I presume that your data are the same as you posted; if not, you have to
# change the offsets (+1 and +5) below
dat[,c(sel1+1,sel2+5)]
}

> fff(test)
  CSE2 CSE1 WSE3
1    5    6    2
<snip>
8    6    3    4
> fff(test)
  CSE1 CSE2 WSE3
1    6    5    2
<snip>
8    3    6    4
> fff(test)
  CSE1 CSE3 WSE4
1    6    6    4
<snip>
8    3    6    7

If you want to do it 1000 times, just use a simple loop:

result <- vector("list", 1000)
for (i in 1:1000) result[[i]] <- fff(test)
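
The same idea can also be written without an explicit for loop and without hard-coded column offsets. This is only a sketch under assumptions: it relies on the column names following the CSEk/WSEk pattern of the posted data, it rebuilds that data inline so the block runs on its own, and the function name draw_cols is made up here.

```r
# Data as posted in the thread
test <- read.table(header = TRUE, text = "SubID CSE1 CSE2 CSE3 CSE4 WSE1 WSE2 WSE3 WSE4
1 6 5 6 2 6 2 2 4
2 6 4 7 2 6 6 2 3
3 5 5 5 5 5 5 4 5
4 5 4 3 4 4 4 5 2
5 5 6 7 5 6 4 4 1
6 5 4 3 6 4 3 7 3
7 3 6 6 3 6 5 2 1
8 3 6 6 3 6 5 4 7")

# Hypothetical helper: pick n1 of the CSE columns, then n2 of the WSE columns
# whose numeric suffix was not picked in the first step.
draw_cols <- function(dat, n1 = 2, n2 = 1) {
  k <- sample.int(4, n1)                    # suffixes chosen for group 1
  rest <- setdiff(1:4, k)                   # suffixes still allowed for group 2
  k2 <- rest[sample.int(length(rest), n2)]  # safe even if one suffix is left
  dat[, c(paste0("CSE", k), paste0("WSE", k2)), drop = FALSE]
}

set.seed(2010)
result <- replicate(1000, draw_cols(test), simplify = FALSE)
names(result[[1]])   # which columns the first draw selected
```

replicate() with simplify = FALSE returns a list of 1000 data frames, the same shape as the loop result; selecting by name keeps the function working if the column order changes.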

Regards
Petr

wangwallace

2010-Nov-16 17:53 UTC

[R] Odp: Sampling problem

Fabulicious!!!!!!!!!!!!!!!!! It worked!!! 

One more question, in the following data frame as posted above:

    SubID    CSE1 CSE2 CSE3 CSE4 WSE1 WSE2 WSE3 WSE4
      1          6      5       6       2      6      2        2       4
      2          6      4       7       2      6      6        2       3
      3          5      5       5       5      5      5        4       5
      4          5      4       3       4      4      4        5       2
      5          5      6       7       5      6      4        4       1
      6          5      4       3       6      4      3        7       3
      7          3      6       6       3      6      5        2       1
      8          3      6       6       3      6      5        4       7 

I want to draw a first random sample consisting of one row of integers under
the first group of variables (CSE1, CSE2, CSE3, CSE4). For example, assuming
the first draw yielded the first row (6, 5, 6, 2), I now want to draw
another random sample consisting of two rows of integers under the
second group of variables (WSE1, WSE2, WSE3, WSE4). Also, for the second
draw, I want to restrict the rows I sample from to only those that do not
correspond to the SubID I have already sampled. That is, I want to
sample two rows of integers under the second group of variables (WSE1, WSE2,
WSE3, WSE4) from rows 2, 3, 4, 5, 6, 7, and 8.

Also, I want to repeat this whole process (drawing 1 random row of integers
under the first group of variables first, AND then another two random rows
under the second group of variables) 1000 times. Any ideas? Would it
be possible to do it by just revising the syntax you wrote above? Many
thanks!!!
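
One way to read this request, sketched under assumptions: the single row comes from the CSE columns, the two rows come from the WSE columns of the remaining subjects, SubID is the first column, and each repetition is independent. The function name draw_rows and the list layout of the result are made up here, and the data are rebuilt inline from the posted table so the block runs on its own.

```r
# Data as posted in the thread
test <- read.table(header = TRUE, text = "SubID CSE1 CSE2 CSE3 CSE4 WSE1 WSE2 WSE3 WSE4
1 6 5 6 2 6 2 2 4
2 6 4 7 2 6 6 2 3
3 5 5 5 5 5 5 4 5
4 5 4 3 4 4 4 5 2
5 5 6 7 5 6 4 4 1
6 5 4 3 6 4 3 7 3
7 3 6 6 3 6 5 2 1
8 3 6 6 3 6 5 4 7")

# Hypothetical helper: n1 row(s) of the CSE columns, then n2 row(s) of the
# WSE columns taken from subjects not drawn in the first step.
draw_rows <- function(dat, n1 = 1, n2 = 2) {
  cse <- grep("^CSE", names(dat))
  wse <- grep("^WSE", names(dat))
  r1 <- sample.int(nrow(dat), n1)            # row(s) for the first group
  rest <- setdiff(seq_len(nrow(dat)), r1)    # rows of the other subjects
  r2 <- rest[sample.int(length(rest), n2)]   # row(s) for the second group
  list(first = dat[r1, c(1, cse)], second = dat[r2, c(1, wse)])
}

set.seed(2010)
draws <- replicate(1000, draw_rows(test), simplify = FALSE)
draws[[1]]   # SubID is kept so the two draws can be told apart
```

This is the column-sampling function with rows and columns swapped: the exclusion is applied to row indices instead of column suffixes.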

wangwallace

2010-Nov-17 05:30 UTC

[R] Odp: Sampling problem

I figured it out myself.

Again, Michael and Petr, many thanks to both of you!!! :) 
