gerard.tromp@sanger.med.wayne.edu
2003-Aug-04 04:44 UTC
[Rd] R package {genetics} function summary.genotype -- H and PIC return incorrect values. (PR#3642)
Greetings! I found a bug in the function summary.genotype (details below). Essentially it appears that the original code from David Duffy used af, allele frequency, as a proportion but in the genetics package af is allele frequency as a count. I tried to submit a bug report together with the fix to CRAN (below). I am unsure of the success since the bug report does not appear in the list. I am therefore sending it by e-mail. Sincerely, Gerard %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% Gerard Tromp, Ph.D. vox: 313-577-8773 Center for Molecular Medicine and Genetics fax: 313-577-7736 Wayne State Univ. Schl. of Medicine 540 E. Canfield, Detroit, MI 48201 gtromp@cmb.biosci.wayne.edu mailto:tromp@sanger.med.wayne.edu http://cmmg.biosci.wayne.edu In function summary.genotype: Heterozygosity (H) and PIC are computed using af (allele frequency), however, af is derived as a frequency count whereas the computation of H and PIC are in terms of frequency as a proportion. Using the example data provided (genetics.R) H and PIC return large negative numbers whereas they are bounded 0,1. Output from buggy version: ************************************************************************* Marker: MBP2:C-104T (9q35:-104) Type: SNP Number persons typed: 100 (100%) Allele Frequency: (2 alleles) Count Proportion C 137 0.69 T 63 0.32 Genotype Frequency: Count Proportion C/C 59 0.59 C/T 19 0.19 T/T 22 0.22 Heterozygosity (Hu) = -22967 <------ Impossible Poly. Inf. Content = -1.5e+08 <------ Impossible *************************************************************************** Output from fixed version: *************************************************************************** Marker: MBP2:C-104T (9q35:-104) Type: SNP Number persons typed: 100 (100%) Allele Frequency: (2 alleles) Count Proportion C 137 0.69 T 63 0.32 Genotype Frequency: Count Proportion C/C 59 0.59 C/T 19 0.19 T/T 22 0.22 Heterozygosity (Hu) = 0.44 <--- Correct Poly. Inf. Content = 0.34 <--- Correct *************************************************************************** Fix is detailed below. ***************** 2539,2554 c 2539,2565 ******************************************** ### from code submitted by David Duffy <davidD@qimr.edu.au> # n.typed<-sum(gf) correction<-n.typed/max(1,n.typed-1) # Gerard Tromp 20030803 # Heterozygosity and PIC reported are incorrect. # Problem due to use of af which is a count rather than prop.table(af) a # proportion (fraction) less than 1. Definition of af changed? # ## Original # ehet<-(1-sum(af*af)) # the following produces desired result ehet<-(1-sum(prop.table(af)*prop.table(af))) ## Original # matings<- (af %*% t(af))^2 # the following produces desired result matings <- (prop.table(af) %*% t(prop.table(af)))^2 uninf.mating.freq <- sum(matings)-sum(diag(matings)) pic<- ehet - uninf.mating.freq retval$Hu <- correction * ehet retval$pic <- pic retval$n.typed <- n.typed retval$n.total <- length(object) retval$nallele <- nallele(object) # ### **************************************************************************** ******