thr3ads.net - R help - [R] resampling from distributions [Apr 2008]


Grant Gillis

2008-Apr-19 20:29 UTC

[R] resampling from distributions

Hello All,

Once again, thanks for all of the help to date. I am climbing my R learning
curve, and I have a few more questions that I hope I can get some guidance on.
I am not sure whether the etiquette is to break up multiple questions, but I'll
keep them together here for now, as that may help put the questions in context,
even though the post may get a little long.


Question 1:


My first goal is to calculate the proportion of shared 1) behaviours and 2)
alleles between numerous individuals. The 'propshared' function pasted below
is what I have now, and it works very well for calculating the proportion of
shared behaviours, where the data are formatted with each column as a
behaviour and each row as an individual. Microsatellite genotypes are
formatted differently. An example is below: each row is an individual and
each column is one allele from a single locus. In the example, L1 and L1.1
each give one copy of an allele for the same locus. Occasionally values from
different loci will be equal, although they are not actually the same allele.

I would like the calculation of the proportion of shared alleles to be
restricted to alleles shared within loci for all individuals (pairs of columns
L1 and L1.1, L2 and L2.1, ...). What I have now calculates the proportion of
shared allele values across loci. As a specific example, I would like the
value 2 for individual w at L1 to be considered the same as the value 2 for
individual y at L1.1, but not the same as the value 2 for any other individual
within any other pair of columns.


genos <- data.frame(
    L1 = c(2, NA, 1, 3), L1.1 = c(1, NA, 2, 3),   # two allele columns per locus
    L2 = c(5, 2, 5, 3),  L2.1 = c(3, 4, 2, 4),
    L3 = c(4, 5, 7, 2),  L3.1 = c(4, 6, 6, 6))

rownames(genos) <- c("w", "x", "y", "z")
> genos
  L1 L1.1 L2 L2.1 L3 L3.1
w  2    1  5    3  4    4
x NA   NA  2    4  5    6
y  1    2  5    2  7    6
z  3    3  3    4  2    6



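propshared <- function(genos){
    # for every pair of individuals, the fraction of columns in which
    # both carry the same (non-missing) value
    x <- sapply(rownames(genos), function(ind1)
        sapply(rownames(genos), function(ind2)
            sum(genos[ind1, ] == genos[ind2, ], na.rm = TRUE)) / ncol(genos))
    is.na(diag(x)) <- TRUE    # self-comparisons are not informative
    x
}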
> propshared(genos)
          w         x         y         z
w        NA 0.0000000 0.1666667 0.1666667
x 0.0000000        NA 0.1666667 0.3333333
y 0.1666667 0.1666667        NA 0.3333333
z 0.1666667 0.3333333 0.3333333        NA


The matrix I would like to have would look like this:
          w         x         y         z
w        NA 0.0000000 0.3333333 0.1666667
x 0.0000000        NA 0.1666667 0.1666667
y 0.3333333 0.1666667        NA 0.1666667
z 0.1666667 0.1666667 0.1666667        NA
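A minimal sketch of a within-locus comparison, under two assumptions that are not in the post: the second allele column of each locus is named "<locus>.1" (L1.1, L2.1, ... as in the printed genos), and "shared" means the number of allele copies two individuals have in common at a locus, each copy matched at most once and missing values never matching, divided by the total number of allele columns. shared_copies and propshared_loci are hypothetical names, not existing functions.

```r
# Example data from the post, with the allele columns named explicitly.
genos <- data.frame(
  L1 = c(2, NA, 1, 3), L1.1 = c(1, NA, 2, 3),
  L2 = c(5, 2, 5, 3),  L2.1 = c(3, 4, 2, 4),
  L3 = c(4, 5, 7, 2),  L3.1 = c(4, 6, 6, 6),
  row.names = c("w", "x", "y", "z"))

# Hypothetical helper: number of allele copies two individuals share at one
# locus, matching each copy at most once and ignoring missing values.
shared_copies <- function(a, b) {
  a <- a[!is.na(a)]
  b <- b[!is.na(b)]
  n <- 0
  for (v in a) {
    k <- match(v, b)
    if (!is.na(k)) {
      n <- n + 1
      b <- b[-k]   # drop the matched copy so it cannot be reused
    }
  }
  n
}

# Hypothetical: proportion of shared allele copies, compared only within loci.
# Assumes the second allele column of a locus is named "<locus>.1".
propshared_loci <- function(genos) {
  g <- as.matrix(genos)
  loci <- split(seq_len(ncol(g)), sub("\\.1$", "", colnames(g)))
  ids <- rownames(g)
  x <- sapply(ids, function(ind1)
    sapply(ids, function(ind2)
      sum(sapply(loci, function(j) shared_copies(g[ind1, j], g[ind2, j])))
    ) / ncol(g))
  is.na(diag(x)) <- TRUE   # self-comparisons are not informative
  x
}

round(propshared_loci(genos), 3)
```

With this counting rule the example gives 0.5 for w versus y (both L1 copies plus the 5 at L2), 1/3 for x versus y and for x versus z, 1/6 for w versus z and for y versus z, and 0 for w versus x. Several of these differ from the hand-written target matrix, so the counting rule or the denominator (for instance, whether x's missing L1 alleles should count) may need adjusting to match the intended definition.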


Question 2: Thanks if you have made it this far. Next, I would like to
calculate a randomized value of the mean proportion of shared alleles.
To do this, I thought I would randomize the original data (genos above, say
1000 times), recalculate the proportion of shared alleles at each step, and
then take the mean (my attempt is below). When I do this, I get the same mean
proportion of shared alleles (or behaviours) as the original for every
randomization. I assume this is due to some property of permuting this type
of data that I do not know. Does anyone have a recommendation as to how I
might get a value of the proportion of shared alleles if alleles were
distributed (again within loci) at random?
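The property can be written down with a short count. Assume, as in propshared, that the statistic for a pair is the fraction of the C columns in which both individuals carry the same non-missing value, that the summary is its mean over all distinct pairs of the n individuals, and let m_{c,v} be the number of individuals carrying value v in column c. Counting the matching pairs column by column instead of pair by pair gives

$$
\bar{p} \;=\; \frac{1}{\binom{n}{2}} \sum_{i<j} \frac{1}{C} \sum_{c=1}^{C} \mathbf{1}\!\left[g_{ic} = g_{jc}\right]
\;=\; \frac{1}{C\binom{n}{2}} \sum_{c=1}^{C} \sum_{v} \binom{m_{c,v}}{2}.
$$

Shuffling the rows of each column separately, which is what apply(genos, 2, sample) does, leaves every m_{c,v} unchanged, so the mean is the same for every permutation and only the individual pairwise entries move around. A null value that can differ from the observed one needs a permutation that changes what is being counted, for example reassigning allele copies between the two columns of a locus as well as between individuals.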


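randomize <- function(genos){
    # shuffle each column independently across individuals;
    # apply() returns a matrix, so restore the row names
    x <- apply(genos, 2, sample)
    rownames(x) <- rownames(genos)
    x
}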


allele.permute <- function(genos, n){
    # n randomized copies of the data, then the pairwise matrix for each
    perms <- replicate(n, randomize(genos), simplify = FALSE)
    lapply(perms, propshared)
}
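One possible within-locus null model, sketched here under stated assumptions rather than as an established recipe: pool the observed allele copies of each locus, deal them back out to individuals (and to the locus's two columns) at random while leaving missing genotypes in place, and recompute the mean sharing each time. The names pair_share, mean_share and permute_within_loci are hypothetical; the statistic counts allele copies shared within each locus (each copy matched at most once) divided by the number of allele columns, and the second allele column of a locus is assumed to be named "<locus>.1".

```r
# Example data from the post (allele columns named explicitly).
genos <- data.frame(
  L1 = c(2, NA, 1, 3), L1.1 = c(1, NA, 2, 3),
  L2 = c(5, 2, 5, 3),  L2.1 = c(3, 4, 2, 4),
  L3 = c(4, 5, 7, 2),  L3.1 = c(4, 6, 6, 6),
  row.names = c("w", "x", "y", "z"))
g <- as.matrix(genos)

# Column indices of each locus; assumes the second allele column is "<locus>.1".
loci <- split(seq_len(ncol(g)), sub("\\.1$", "", colnames(g)))

# Hypothetical: proportion of allele copies two individuals share within loci.
# Each copy is matched at most once; missing values never match.
pair_share <- function(a, b, loci) {
  shared <- sapply(loci, function(j) {
    x <- a[j][!is.na(a[j])]
    y <- b[j][!is.na(b[j])]
    n <- 0
    for (v in x) {
      k <- match(v, y)
      if (!is.na(k)) { n <- n + 1; y <- y[-k] }
    }
    n
  })
  sum(shared) / length(a)
}

# Mean of pair_share over all distinct pairs of individuals.
mean_share <- function(g, loci) {
  pairs <- combn(nrow(g), 2)
  mean(apply(pairs, 2, function(p) pair_share(g[p[1], ], g[p[2], ], loci)))
}

# Shuffle the observed allele copies of each locus among individuals and
# between that locus's two columns; missing genotypes stay where they are.
# sample.int() avoids sample()'s surprise when only one value is present.
permute_within_loci <- function(g, loci) {
  for (j in loci) {
    block <- g[, j]
    ok <- !is.na(block)
    block[ok] <- block[ok][sample.int(sum(ok))]
    g[, j] <- block
  }
  g
}

set.seed(1)
obs <- mean_share(g, loci)
null <- replicate(1000, mean_share(permute_within_loci(g, loci), loci))
mean(null)                                    # randomized mean proportion shared
(sum(null >= obs) + 1) / (length(null) + 1)   # one-sided permutation p-value
```

For the example data obs is 0.25. Because copies are reassigned between columns and individuals, the permuted means vary from run to run, unlike the column-wise shuffle above. Whether this random-union-of-alleles null is the right one for the biological question is a modelling choice.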






I hope this is clear. I appreciate all insights and input.
Thanks

Grant


Grant Gillis

2008-Apr-19 20:37 UTC


[R] resampling from distributions

I am sorry for the incorrect subject. It autofilled without my noticing in
time. A better subject would be "Calculating proportion of shared occurrences
and randomizations".

Grant

