Hello All,

Once again, thanks for all of the help to date. I am climbing my R learning curve, and I've got a few more questions that I hope I can get some guidance on. I am not sure whether the etiquette is to break up multiple questions or not, but I'll keep them together here for now, as it may help put the questions in context even though the post may get a little long.

Question 1:

My first goal is to calculate the proportion of shared 1) behaviours and 2) alleles between numerous individuals. Pasted below (the 'propshared' function) is what I have now, and it works very well for calculating the proportion of shared behaviours, where the data are formatted with each column as a behaviour and each row as an individual. Microsatellite genotypes are formatted differently. An example is below. Each row is an individual and each column is one allele from a single locus. In the values below, L1 and L1.1 each give one copy of an allele for the same locus. Occasionally values from different loci will have the same value, although these are not actually the same allele.

I would like the calculation of the proportion of shared values for alleles to be restricted to the proportion of shared alleles within loci for all individuals (pairs of columns L1 and L1.1, L2 and L2.1, ...). What I have now calculates the proportion of shared values for alleles across loci. A specific example is that I would like the value *2* for individual *w* at *L1* to be considered the same as the value *2* for individual *y* at *L1.1*, but not the same as the value *2* for any other individual within any other pair of columns.

genos <- data.frame(
  L1 = c(2, NA, 1, 3),
  L1 = c(1, NA, 2, 3),
  L2 = c(5, 2, 5, 3),
  L2 = c(3, 4, 2, 4),
  L3 = c(4, 5, 7, 2),
  L3 = c(4, 6, 6, 6)
)
rownames(genos) <- c("w", "x", "y", "z")

> genos
   L1 L1.1 L2 L2.1 L3 L3.1
w   2    1  5    3  4    4
x  NA   NA  2    4  5    6
y   1    2  5    2  7    6
z   3    3  3    4  2    6

propshared <- function(genos) {
  # For every pair of individuals, count the columns holding equal values
  # and divide by the number of columns.
  x <- sapply(rownames(genos), function(ind1)
    sapply(rownames(genos), function(ind2)
      sum(genos[ind1, ] == genos[ind2, ], na.rm = TRUE)) / length(genos[1, ]))
  # An individual compared with itself is not meaningful.
  is.na(diag(x)) <- TRUE
  x
}

> propshared(genos)
          w         x         y         z
w        NA 0.0000000 0.1666667 0.1666667
x 0.0000000        NA 0.1666667 0.3333333
y 0.1666667 0.1666667        NA 0.3333333
z 0.1666667 0.3333333 0.3333333        NA

The matrix I would like to have would look like this:

            w           x           y           z
w          NA 0           0.333333333 0.166666667
x 0                    NA 0.166666667 0.166666667
y 0.333333333 0.166666667          NA 0.166666667
z 0.166666667 0.166666667 0.166666667          NA

Question 2:

Thanks if you have made it this far. Next, I would like to calculate a randomized value of the mean proportion of shared alleles. To do this I thought I would randomize the original data (genos above) say 1000 times, recalculate the proportion of shared alleles at each step, and then take the mean (my attempt is below). When I do this I get the same mean proportion of shared alleles (or behaviours) as the original for every randomization. I assume that this is due to some property of permuting this type of data that I do not know. Does anyone have a recommendation as to how I might get a value of the proportion of shared alleles if alleles were distributed (again within loci) at random?

randomize <- function(genos) {
  # Shuffle each column independently across individuals.
  x <- apply(genos, 2, sample)
  rownames(x) <- rownames(genos)
  x
}

allele.permute <- function(genos, n) {
  # n randomized data sets, then one propshared matrix per data set.
  perms <- replicate(n, randomize(genos), simplify = FALSE)
  sapply(perms, propshared, simplify = FALSE)
}

I hope this is clear. I appreciate all insights and input.

Thanks,
Grant
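
For Question 1, one possible way to restrict the comparison to alleles within a locus is sketched below. It is only a sketch under stated assumptions, not a confirmed solution from the thread: it assumes the two allele columns of each locus are adjacent (L1, L1.1, then L2, L2.1, ...), counts shared alleles per locus as a multiset intersection (so each allele copy can be matched once), treats NA as never matching, and divides by the total number of allele columns as 'propshared' does. The names count_shared and prop_shared_loci are hypothetical helpers introduced here.

```r
# Example data from the post (duplicate names become L1, L1.1, ... by default).
genos <- data.frame(
  L1 = c(2, NA, 1, 3), L1 = c(1, NA, 2, 3),
  L2 = c(5, 2, 5, 3),  L2 = c(3, 4, 2, 4),
  L3 = c(4, 5, 7, 2),  L3 = c(4, 6, 6, 6)
)
rownames(genos) <- c("w", "x", "y", "z")

# Hypothetical helper: number of alleles two individuals share at ONE locus,
# counted as a multiset intersection; NA values never match.
count_shared <- function(a, b) {
  a <- a[!is.na(a)]
  b <- b[!is.na(b)]
  n <- 0
  for (v in a) {
    hit <- match(v, b)
    if (!is.na(hit)) {
      n <- n + 1
      b <- b[-hit]  # each allele copy in b can be matched only once
    }
  }
  n
}

# Hypothetical helper: pairwise proportion of alleles shared within loci.
# Assumes the `ploidy` columns of each locus sit next to each other.
prop_shared_loci <- function(genos, ploidy = 2) {
  g <- as.matrix(genos)
  ids <- rownames(g)
  # Column indices grouped by locus: (1, 2), (3, 4), ...
  loci <- split(seq_len(ncol(g)), ceiling(seq_len(ncol(g)) / ploidy))
  out <- matrix(NA_real_, nrow(g), nrow(g), dimnames = list(ids, ids))
  for (i in seq_len(nrow(g))) {
    for (j in seq_len(nrow(g))) {
      if (i == j) next  # leave the diagonal as NA
      shared <- vapply(loci, function(cols) count_shared(g[i, cols], g[j, cols]),
                       numeric(1))
      out[i, j] <- sum(shared) / ncol(g)
    }
  }
  out
}

round(prop_shared_loci(genos), 3)
```

Under this counting rule, w and y share both alleles at L1 and one allele at L2, so their value is 3/6 = 0.5 rather than the 0.333 in the hand-written target matrix. If a different rule is intended (for example, counting at most one shared allele per locus), count_shared is the place to change it.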
I am sorry for the incorrect subject. My subject autofilled without my noticing in time. I suppose a better subject would be "Calculating proportion of shared occurrences and randomizations".

Grant

2008/4/19 Grant Gillis <grant.j.gillis@gmail.com>:
> Hello All,
>
> Once again thanks for all of the help to date. [...]
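
On Question 2, a likely reason the mean never changes: 'propshared' compares individuals column by column, and shuffling a column among individuals does not change how many pairs of individuals hold equal values in that column. The values for individual pairs move around, but the mean over all pairs stays fixed. The sketch below illustrates this and contrasts it with one hypothetical alternative, which pools both allele columns of a locus and redistributes them at random. The function names, the seed, and the 200 replicates are assumptions made for illustration, and the alternative also redistributes NA values, which may not be what is wanted.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

genos <- data.frame(
  L1 = c(2, NA, 1, 3), L1 = c(1, NA, 2, 3),
  L2 = c(5, 2, 5, 3),  L2 = c(3, 4, 2, 4),
  L3 = c(4, 5, 7, 2),  L3 = c(4, 6, 6, 6)
)
rownames(genos) <- c("w", "x", "y", "z")

# Mean over all pairs of individuals of the column-by-column sharing
# (the same comparison rule as propshared in the post).
mean_positional <- function(g) {
  g <- as.matrix(g)
  pairs <- combn(nrow(g), 2)
  mean(apply(pairs, 2, function(p)
    sum(g[p[1], ] == g[p[2], ], na.rm = TRUE) / ncol(g)))
}

# Shuffle each column independently, as randomize() in the post does.
shuffle_columns <- function(g) apply(as.matrix(g), 2, sample)

# Hypothetical alternative: pool all allele copies of a locus (both columns)
# and deal them back out at random. NA values are shuffled too.
shuffle_within_locus <- function(g, ploidy = 2) {
  g <- as.matrix(g)
  for (start in seq(1, ncol(g), by = ploidy)) {
    cols <- start:(start + ploidy - 1)
    g[, cols] <- sample(g[, cols])
  }
  g
}

obs        <- mean_positional(genos)
perm_cols  <- replicate(200, mean_positional(shuffle_columns(genos)))
perm_locus <- replicate(200, mean_positional(shuffle_within_locus(genos)))

range(perm_cols - obs)  # differences are zero up to rounding error
summary(perm_locus)     # varies from one replicate to the next
```

If a within-locus statistic is used instead of the column-by-column one, shuffling columns independently can change the mean, because it changes which two alleles are paired in each individual.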