Hello All,
Once again thanks for all of the help to date. I am climbing my R learning
curve. I've got a few more questions that I hope I can get some guidance on
though. I am not sure whether the etiquette is to break up multiple
questions or not but I'll keep them together here for now as it may help put
the questions in context despite the fact that the post may get a little
long.
Question 1:
My first goal is to calculate the proportion of shared 1) behaviours and 2)
alleles between numerous individuals. Pasted below (the 'propshared' function)
is what I have now, and it works very well for calculating the proportion of
shared behaviours where the data is formatted with each column as a
behaviour and each row an individual. Microsatellite genotypes are
formatted differently. An example is below. Each row is an individual and
each column is one allele from a single locus. In the values below, L1
and L1.1 each give one copy of an allele for the same locus. Occasionally
values from different loci will be the same, although these are not actually
the same allele.
I would like the calculation of the proportion of shared values for alleles
to be restricted to the proportion of shared alleles within loci for all
individuals (pairs of columns L1 and L1.1, L2 and L2.1, ...). What I have now
calculates the proportion of shared values for alleles across loci. A
specific example is that I would like the value *2* for individual *w* at
*L1* to be considered the same as the value *2* for individual *y* at
*L1.1*, but not the same as the value *2* for any other individual within
any other pair of columns.
genos<- data.frame(
L1 = c(2,NA,1,3),
L1 = c(1,NA,2,3),
L2 = c(5,2,5,3),
L2 = c(3,4,2,4),
L3 = c(4,5,7,2),
L3 = c(4,6,6,6) )
rownames(genos) = c("w","x","y","z")
> genos
L1 L1.1 L2 L2.1 L3 L3.1
w 2 1 5 3 4 4
x NA NA 2 4 5 6
y 1 2 5 2 7 6
z 3 3 3 4 2 6
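propshared <- function(genos) {
  # proportion of columns holding equal values, for every pair of rows
  x <- sapply(rownames(genos), function(ind1)
    sapply(rownames(genos), function(ind2)
      sum(genos[ind1, ] == genos[ind2, ], na.rm = TRUE)) / length(genos[1, ]))
  is.na(diag(x)) <- TRUE  # blank out self-comparisons
  x
}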
> propshared(genos)
w x y z
w NA 0.0000000 0.1666667 0.1666667
x 0.0000000 NA 0.1666667 0.3333333
y 0.1666667 0.1666667 NA 0.3333333
z 0.1666667 0.3333333 0.3333333 NA
The matrix I would like to have would look like this.
  w           x           y           z
w NA          0           0.333333333 0.166666667
x 0           NA          0.166666667 0.166666667
y 0.333333333 0.166666667 NA          0.166666667
z 0.166666667 0.166666667 0.166666667 NA
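One possible way to restrict the comparison to loci is sketched below. It assumes that columns come in adjacent pairs (two allele copies per locus) and it counts, per locus, the number of allele copies two individuals have in common regardless of which of the two columns they sit in. The names shared_at_locus and propshared_loci are made up for this sketch, and the counting rule (a multiset intersection, with a missing genotype contributing nothing) is an assumption rather than a standard definition.

```r
# Example data from the post; data.frame() renames the duplicated
# column names to L1, L1.1, L2, L2.1, L3, L3.1.
genos <- data.frame(
  L1 = c(2, NA, 1, 3), L1 = c(1, NA, 2, 3),
  L2 = c(5, 2, 5, 3),  L2 = c(3, 4, 2, 4),
  L3 = c(4, 5, 7, 2),  L3 = c(4, 6, 6, 6)
)
rownames(genos) <- c("w", "x", "y", "z")

# Hypothetical helper: number of allele copies two individuals share at
# one locus (0, 1 or 2), ignoring NAs and column order.
shared_at_locus <- function(a, b) {
  a <- a[!is.na(a)]
  b <- b[!is.na(b)]
  if (length(a) == 0 || length(b) == 0) return(0)
  vals <- union(a, b)
  sum(pmin(tabulate(match(a, vals), length(vals)),
           tabulate(match(b, vals), length(vals))))
}

# Hypothetical replacement for propshared(): shared copies summed over
# loci, divided by the total number of allele columns.
propshared_loci <- function(genos) {
  g <- as.matrix(genos)
  stopifnot(ncol(g) %% 2 == 0)          # assumes two columns per locus
  first <- seq(1, ncol(g), by = 2)      # first column of each locus
  ids <- rownames(g)
  out <- matrix(NA_real_, nrow(g), nrow(g), dimnames = list(ids, ids))
  for (i in seq_len(nrow(g))) {
    for (j in seq_len(nrow(g))) {
      if (i == j) next                  # leave the diagonal as NA
      shared <- vapply(first, function(k)
        shared_at_locus(g[i, k:(k + 1)], g[j, k:(k + 1)]), numeric(1))
      out[i, j] <- sum(shared) / ncol(g)
    }
  }
  out
}

res <- propshared_loci(genos)
```

Under this rule w and y share three of six allele copies (both copies at L1, one at L2), so the result does not reproduce every entry of the target matrix above. If homozygotes or double matches are meant to be counted differently, shared_at_locus is the place to change it.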
Question 2: Thanks if you have made it this far. Next I would like
to calculate a randomized value of the mean proportion of shared alleles.
To do this I thought I would randomize the original data (genos above), say
1000 times, recalculate the proportion of shared alleles at each step, and
then take the mean (my attempt below). When I do this I get the same mean
proportion of shared alleles (or behaviours) as the original for every
randomization. I assume that this is due to some property of permuting this
type of data that I do not know. Does anyone have a recommendation as to
how I might get a value of the proportion of shared alleles if alleles were
distributed (again within loci) at random?
randomize <- function(genos) {
  # shuffle each column independently across individuals
  x <- apply(genos, 2, sample)
  rownames(x) <- rownames(genos)
  x
}
allele.permute <- function(genos, n) {
  perms <- replicate(n, randomize(genos), simplify = FALSE)
  lapply(perms, propshared)
}
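A likely reason the mean never changes: with a column-by-column comparison, the number of pairs of individuals that match in a column depends only on how often each value occurs in that column, and shuffling a column leaves those counts untouched. The sketch below checks this on the example data and then shows one alternative shuffle that pools both allele copies of a locus before redealing them. The names total_matches and shuffle_within_loci are made up here, and whether the pooled shuffle is an appropriate null model for the data is an assumption to check, not a recommendation.

```r
set.seed(1)  # arbitrary seed, only so the sketch is repeatable

genos <- data.frame(
  L1 = c(2, NA, 1, 3), L1 = c(1, NA, 2, 3),
  L2 = c(5, 2, 5, 3),  L2 = c(3, 4, 2, 4),
  L3 = c(4, 5, 7, 2),  L3 = c(4, 6, 6, 6)
)
rownames(genos) <- c("w", "x", "y", "z")

# Column-wise matches summed over all pairs of individuals.
# Only the value counts in each column enter the result.
total_matches <- function(g) {
  g <- as.matrix(g)
  sum(apply(g, 2, function(col) {
    n <- table(col)          # table() drops NAs by default
    sum(n * (n - 1) / 2)     # pairs of individuals sharing a value
  }))
}

# The shuffle from the post: each column permuted on its own.
shuffled <- apply(as.matrix(genos), 2, sample)
same_total <- total_matches(genos) == total_matches(shuffled)

# Hypothetical alternative: pool the two allele copies of each locus
# (assumed to be adjacent columns) and redeal them across individuals.
# Note that NAs are redealt too, so missing genotypes may get split up.
shuffle_within_loci <- function(g) {
  g <- as.matrix(g)
  for (k in seq(1, ncol(g), by = 2)) {
    g[, k:(k + 1)] <- sample(g[, k:(k + 1)])
  }
  g
}

redealt <- shuffle_within_loci(genos)
```

Here same_total is TRUE for any seed, which is why the overall mean of propshared() stays fixed. Individual entries of the matrix do change under the column shuffle, so a statistic over a subset of pairs (for example, pairs within one group) would still vary between randomizations.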
I hope this is clear. I appreciate all insights and input.
Thanks
Grant
I am sorry for the incorrect subject. My subject autofilled without my noticing in time. I suppose a better subject would be "Calculating proportion of shared occurrences and randomizations".
Grant
2008/4/19 Grant Gillis <grant.j.gillis@gmail.com>:
> Hello All,
> [...]