Philip Rhoades
2003-Sep-04 14:21 UTC
[R] Allelic Differentiation, sampling, unique(), duplicated()
Hi people,

I have made some progress trying to work out how to solve this problem but I have got a bit stuck. Sorry if this turns out to be a simple exercise.

Allelic Differentiation (AD) in genetics measures the number of different alleles between (say) two populations, e.g.:

Organisms in Pop 1 have alleles: a, b, c, d, e

Organisms in Pop 2 have alleles: b, b, c, d, e

Different (unique) alleles (n) are: a

[unique() does not do what I want here for comparing these two vectors and I can't get combinations of unique() and duplicated() to work either.]

Total alleles = 10

Therefore AD = (2 * n) / 10 = 0.2

What I want to do is compare two populations of 200 organisms each but sampling for only 20 at a time.

So there are 200!/((200-20)! * 20!) possible combinations of samples in each population.

For all possible combinations of sample pop1 and sample pop2 I want to measure AD, i.e. (200!/((200-20)! * 20!) * 200!/((200-20)! * 20!)) calculations.

As well as the unique allele problem, can someone suggest how I can do the sampling loops?

Thanks,

Phil.
--
Philip Rhoades
Pricom Pty Limited (ACN 003 252 275 ABN 91 003 252 275)
GPO Box 3411 Sydney NSW 2001 Australia
Mobile: +61:0411-185-652
Fax: +61:2:8923-5363
E-mail: pri at chu.com.au
Thomas Lumley
2003-Sep-04 15:09 UTC
[R] Allelic Differentiation, sampling, unique(), duplicated()
On Fri, 5 Sep 2003, Philip Rhoades wrote:

> Hi people,
>
> I have made some progress trying to work out how to solve this problem
> but I have got a bit stuck - sorry if this turns out to be a simple
> exercise . .
>
> Allelic Differentiation (AD) in genetics measures the number of
> different alleles between (say) two populations eg:
>
> Organisms in Pop 1 have alleles: a, b, c, d, e
>
> Organisms in Pop 2 have alleles: b, b, c, d, e
>
> Different (unique) alleles (n) are: a
>
> [unique() does not do what I want here for comparing these two vectors
> and I can't get combinations of unique() and duplicated() to work
> either.]

You could do it with

   union(setdiff(one,two), setdiff(two,one))

and there's probably a direct way to do it with match(). We should probably have a setsymdiff() function to add to the others.

> Total alleles = 10
>
> Therefore AD = (2 * n) / 10 = 0.2
>
> What I want to do is compare two populations of 200 organisms each but
> sampling for only 20 at a time.
>
> So there are 200!/((200-20)! * 20!) possible combinations of samples in
> each population.
>
> For all possible combinations of sample pop1 and sample pop2 I want to
> measure AD ie (200!/((200-20)! * 20!) * 200!/((200-20)! * 20!) )
> calculations.

This is far too many calculations:

R> choose(200,20)
[1] 1.613588e+27

> As well as the unique allele problem, can someone suggest how I can do
> the sampling loops?
>

You can't. 10^27 is a very large number, and that is for one population alone; pairing every sample from pop1 with every sample from pop2 squares it. I would suggest choosing the samples from pop1 and pop2 at random, a few thousand or hundred thousand times (depending on the accuracy you need).

	-thomas
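For anyone finding this thread later, the two suggestions above can be put together roughly as follows. This is only a sketch under stated assumptions: `sym_diff()` and `allelic_diff()` are hypothetical helper names (not existing R functions), n is taken to be the number of distinct alleles found in exactly one of the two samples, as in the worked example in the question, and `pop1` / `pop2` are made-up populations standing in for the real data.

```r
# Hypothetical helper: alleles present in exactly one of the two vectors,
# built from the union(setdiff(), setdiff()) idiom suggested above.
sym_diff <- function(one, two) {
  union(setdiff(one, two), setdiff(two, one))
}

# AD as defined in the question: (2 * n) / total number of alleles,
# where n is assumed to be the count of alleles unique to one sample.
allelic_diff <- function(one, two) {
  n <- length(sym_diff(one, two))
  (2 * n) / (length(one) + length(two))
}

# The worked example from the question: only "a" differs, so n = 1.
pop1_example <- c("a", "b", "c", "d", "e")
pop2_example <- c("b", "b", "c", "d", "e")
allelic_diff(pop1_example, pop2_example)  # 0.2

# Made-up populations of 200 organisms each (assumption: one allele
# per organism, stored as a character vector).
set.seed(1)
pop1 <- sample(letters[1:10], 200, replace = TRUE)
pop2 <- sample(letters[3:12], 200, replace = TRUE)

# Instead of enumerating all choose(200, 20)^2 pairs of samples,
# draw 20 organisms from each population at random, many times over.
n_draws <- 5000
ad <- replicate(n_draws, allelic_diff(sample(pop1, 20), sample(pop2, 20)))

summary(ad)  # distribution of AD over the random sample pairs
```

Increasing `n_draws` narrows the Monte Carlo error of any summary of `ad` (its mean, quantiles and so on) at a proportional cost in run time; `sample()` draws without replacement by default, which matches choosing 20 distinct organisms out of 200.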