thr3ads.net - R help - [R] Question about sampling [Jun 2012]


Guido Leoni

2012-Jun-14 11:02 UTC

[R] Question about sampling

Dear list, I wish to extract from a population genotyped for 10 SNPs a
subsample of the same population of size n with similar allele frequencies.
Essentially I have a matrix of 200 rows (df) like this:
Name,Condition,rs1385699_X,rs6625163_X,rs962458_X,Rs4658627_1,
sample01,Case,1,1,1,-1
sample02,Control,1,1,1,1
sample06,Control,1,-1,1,0
sample10,Case,1,1,1,0
sample11,Control,1,1,1,1
sample24,Control,-1,-1,1,0
sample29,Control,1,-1,1,0
sample42,Case,-1,-1,1,0
sample64,Case,-1,1,1,0
....
I'm interested in maintaining in my subsample the same frequencies as those
observed for the value 1 in each column.
I approached the problem with the sample() function:

mysample <- df[sample(nrow(df), 100, replace = FALSE), ]
Then I tested that the frequencies of each allele in mysample are not
statistically different from those in the initial dataset by means of prop.test.
This seems to work, but do you know if there is a package that can do the
same thing while allowing, for example, stricter control?
Thank you very much
Guido
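
The workflow described above (draw a random subsample, then compare the frequency of the value 1 per SNP column with prop.test) can be sketched roughly as follows. The data frame here is simulated and purely hypothetical, since the real 200-row table is not shown in full; the column names are taken from the header line of the post, and the seed is arbitrary.

```r
set.seed(1)  # arbitrary seed, only for reproducibility of this sketch

# Hypothetical stand-in for the 200-row genotype table (codes -1, 0, 1)
snps <- c("rs1385699_X", "rs6625163_X", "rs962458_X", "Rs4658627_1")
df <- data.frame(
  Name = sprintf("sample%03d", 1:200),
  Condition = sample(c("Case", "Control"), 200, replace = TRUE),
  stringsAsFactors = FALSE
)
for (s in snps) df[[s]] <- sample(c(-1, 0, 1), 200, replace = TRUE)

# Simple random subsample of 100 rows, as in the post
idx <- sample(nrow(df), 100, replace = FALSE)
mysample <- df[idx, ]

# Proportion of rows coded 1, per SNP, in the full data and in the subsample
freq_full <- colMeans(df[snps] == 1)
freq_sub  <- colMeans(mysample[snps] == 1)

# Two-sample test of equal proportions, one test per SNP column
pvals <- sapply(snps, function(s) {
  prop.test(
    x = c(sum(df[[s]] == 1), sum(mysample[[s]] == 1)),
    n = c(nrow(df), nrow(mysample))
  )$p.value
})

print(round(cbind(freq_full, freq_sub, pvals), 3))
```

Note that the subsample is drawn from the full data set, so the two groups overlap and are not independent; the p-values are best read as a rough screening check rather than a formal test.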


R. Michael Weylandt

2012-Jun-14 13:14 UTC


[R] Question about sampling

sample() takes a prob = argument which lets you supply weights, and the
weights need not sum to one. So, if I understand you, you could just pass
TRUEs and FALSEs to mark the rows you want. If I'm wrong about that last
bit, I'm still pretty confident sample(prob = ) is the way to go.

Best,
Michael
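
A minimal sketch of the prob = suggestion, under stated assumptions: the data frame is simulated, and the eligibility rule (rows coded 1 at one SNP) is a hypothetical example, not something specified in the thread. The logical flags are converted with as.numeric() to make the 0/1 weights explicit.

```r
set.seed(42)  # arbitrary seed, only for reproducibility of this sketch

# Hypothetical stand-in data: one SNP column coded -1, 0, 1
df <- data.frame(
  Name = sprintf("sample%03d", 1:200),
  rs1385699_X = sample(c(-1, 0, 1), 200, replace = TRUE)
)

# Logical flag for rows that may be drawn (hypothetical rule: rows coded 1)
eligible <- df$rs1385699_X == 1

# Without replacement, size cannot exceed the number of non-zero weights
n_draw <- min(20, sum(eligible))

# 0/1 weights; they do not need to sum to one
idx <- sample(nrow(df), size = n_draw, replace = FALSE,
              prob = as.numeric(eligible))
mysample <- df[idx, ]

# Rows with weight zero are never selected
all(mysample$rs1385699_X == 1)
```

This only shows the mechanics of weighted sampling. Weights alone do not guarantee that the subsample reproduces the allele frequencies of the full data, so a check such as prop.test on the result would still be needed.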

