Farrel Buchinsky
2006-May-02 21:37 UTC
[R] Repeating tdt function on thousands of variables
I am using dgc.genetics to perform TDT analysis on SNP data from a cohort of trios.

I now have a file with about 6008 variables. The first few variables relate to the pedigree data, such as the pedigree ID, the person ID, etc. Thereafter each variable is a specific locus or marker. The variables are named by a pattern such as "Genotype.nnnnn", with nnnnn corresponding to a number which is the name or ID of the locus.

I am able to get the tdt to run for each locus:

    > tdt(Genotype.914186, PGWide, famid, pid, fatid, motid, sex, affected)

Clearly I cannot type each locus in one at a time. Instead I want to loop over them, but I am not sure how to do it. I tried lapply but it did not really work.

The example in Dalgaard's book,

    > sapply(thuesen, mean, na.rm = T)

seems to work with basic functions but not with something like tdt. So how does one tell R to calculate the tdt for each variable and output the result to a data frame in which one of the columns is the locus ID?

Since I have another table in which every locus ID is in one column, and in another column I have its chromosome number and exact position on the chromosome, I could always create a vector out of the locus IDs, but I would still need to know how to pass it on to the tdt function in R.

--
Farrel Buchinsky, MD
Pediatric Otolaryngologist
Allegheny General Hospital
Pittsburgh, PA
Farrel Buchinsky wrote:
> I am using dgc.genetics to perform TDT analysis on SNP data from a cohort of
> trios.
>
> I now have a file with about 6008 variables. The first few variables related
> to the pedigree data such as the pedigree ID the person ID etc. Thereafter
> each variable is a specific locus or marker. The variables are named by a
> pattern such as "Genotype.nnnnn" with nnnnn corresponding to a number which
> is the name or id of the locus.
>
> I am able to get the tdt to run by each locus. >tdt(Genotype.914186, PGWide,
> famid, pid, fatid, motid, sex, affected )

Looks like you have to be much more specific:

    R> tdt(Genotype.914186, PGWide, famid, pid, fatid, motid, sex, affected)
    Error: could not find function "tdt"

Uwe Ligges

> Clearly I cannot type each locus in one at a time. Instead I want to loop it
> but am not sure how to do it. I tried lapply but it did not really work.
>
> The example in Dalgaard's book, >sapply( thuesen, mean, na.rm= T) seems to
> work with basic functions but not with something like tdt. So how does one
> tell R to calculate the tdt for each variable and output the result to a
> dataframe in which one of the columns is the locus ID.?
>
> Since I have another table in which every locus ID is in one column and in
> another column I have its chromosome number and exact position on the
> chromosome I could always create a vector out of the locusIDs but still I
> would need to know how to pass it on to the tdt function in R.
Farrel Buchinsky
2006-May-03 11:24 UTC
[R] Repeating tdt function on thousands of variables
On 5/3/06, Uwe Ligges <ligges at statistik.uni-dortmund.de> wrote:
> Looks like you have to be much more specific:

tdt() is a function within dgc.genetics. dgc.genetics is a package written by David Clayton and available at http://www-gene.cimr.cam.ac.uk/clayton/software/
It consists of extensions to the genetics package.

I could always paste the text from the output of help(tdt) here. Would that be acceptable etiquette?

On the one hand my question is highly specific, but on the other it is quite general: "How does one pass a whole batch of variable names to a function that is not one of R's base functions (such as "mean")?"
Farrel Buchinsky wrote:
> On 5/3/06, Uwe Ligges <ligges at statistik.uni-dortmund.de> wrote:
>
>> Looks like you have to be much more specific:
>
> tdt() is a function within dgc.genetics.
> dgc.genetics is a package written by David Clayton and available at
> http://www-gene.cimr.cam.ac.uk/clayton/software/
> It consists of extensions to the genetics package.
>
> I could always drop the text from the output of help(tdt) here. Would
> that be acceptable etiquette?
>
> On the one hand my question is highly specific but on the other it
> quite general..."How does one pass a whole batch of variable names to
> a function that is not one of R base functions such as "mean")?"

The same way. lapply() and sapply() should work with almost any function, unless something strange happens with environments, which is the case here. The problem is tdt() itself: its data argument defaults to sys.frame(sys.parent()), but lapply() and sapply() evaluate the function in a different environment! So it really was necessary to tell us where you got the function from.

Uwe Ligges
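One way to sidestep the environment problem described above is to never rely on the default data argument: loop over the column names and hand both the column and the data frame to the function explicitly. The following is only a sketch under stated assumptions. The data frame toy and the function stand_in_test() are invented for illustration; stand_in_test() is a hypothetical placeholder, not tdt(), and the real tdt() call would need its own pedigree arguments.

```r
# Toy data: pedigree columns first, then marker columns (all values invented).
toy <- data.frame(
  famid = rep(1:4, each = 3),
  pid = rep(1:3, times = 4),
  Genotype.101 = c(0, 1, 1, 2, 0, 1, 1, 1, 2, 0, 2, 1),
  Genotype.202 = c(1, 1, 1, 0, 1, 2, 1, 1, 1, 2, 1, 0),
  Genotype.303 = c(0, 0, 2, 2, 0, 2, 0, 2, 2, 0, 0, 2)
)

# Hypothetical stand-in for tdt(): it receives the genotype vector and the
# data frame explicitly and returns a list with a p.value element.
stand_in_test <- function(genotype, data) {
  obs <- table(factor(genotype, levels = 0:2))
  expected <- rep(nrow(data) / 3, 3)
  stat <- sum((obs - expected)^2 / expected)
  list(p.value = pchisq(stat, df = 2, lower.tail = FALSE))
}

# Pick out the marker columns by their naming pattern.
marker_names <- grep("^Genotype\\.", names(toy), value = TRUE)

# Apply the function to each marker, passing the data frame every time.
p_values <- vapply(
  marker_names,
  function(nm) stand_in_test(toy[[nm]], toy)$p.value,
  numeric(1)
)

# Collect the results in a data frame with the locus ID as one column.
results <- data.frame(
  locus = sub("^Genotype\\.", "", marker_names),
  p.value = unname(p_values),
  stringsAsFactors = FALSE
)
print(results)
```

Looping over names rather than over the columns themselves keeps the locus ID available for the result table, which is what the original question asked for.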
> I am using dgc.genetics to perform TDT analysis on SNP data from a cohort of
> trios.
>
> I now have a file with about 6008 variables. The first few variables related
> to the pedigree data such as the pedigree ID the person ID etc. Thereafter
> each variable is a specific locus or marker. The variables are named by a
> pattern such as "Genotype.nnnnn" with nnnnn corresponding to a number which
> is the name or id of the locus.
>
> I am able to get the tdt to run by each locus. >tdt(Genotype.914186, PGWide,
> famid, pid, fatid, motid, sex, affected )
>
> Clearly I cannot type each locus in one at a time. Instead I want to loop it
> but am not sure how to do it. I tried lapply but it did not really work.
> --
> Farrel Buchinsky, MD
> Pediatric Otolaryngologist
> Allegheny General Hospital
> Pittsburgh, PA

Something like:

    pos.first.marker <- 8
    # markers are columns of the data frame, so count them with ncol()
    Nsnps <- ncol(your.data) - pos.first.marker + 1
    res <- double(Nsnps)
    names(res) <- names(your.data)[-seq(1, pos.first.marker - 1)]
    for (i in seq(1, Nsnps)) {
      # offset i so that the loop starts at the first marker column
      res[i] <- tdt(your.data[, pos.first.marker + i - 1], your.data)$p.value[2]
    }

David Duffy

| David Duffy (MBBS PhD)                                         ,-_|\
| email: davidD at qimr.edu.au  ph: INT+61+7+3362-0217 fax: -0101  /     *
| Epidemiology Unit, Queensland Institute of Medical Research   \_,-._/
| 300 Herston Rd, Brisbane, Queensland 4029, Australia  GPG 4D0B994A v
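Once a named vector of per-marker p-values exists, it can be joined to the separate table of locus IDs, chromosomes and positions mentioned in the original question. The sketch below assumes column names of the form "Genotype.nnnnn"; the annotation table, its column names and all the numbers are invented for illustration.

```r
# Invented annotation table: one row per locus.
annotation <- data.frame(
  locus = c("101", "202", "303"),
  chromosome = c(1, 1, 2),
  position = c(15000, 98000, 4200),
  stringsAsFactors = FALSE
)

# Invented per-marker p-values, named by column as a loop would produce them.
p_values <- c(Genotype.303 = 0.40, Genotype.101 = 0.03, Genotype.202 = 0.75)

# Going from locus IDs to column names is a simple paste.
marker_cols <- paste0("Genotype.", annotation$locus)

# Going from column names back to locus IDs strips the prefix.
results <- data.frame(
  locus = sub("^Genotype\\.", "", names(p_values)),
  p.value = unname(p_values),
  stringsAsFactors = FALSE
)

# Join on the locus ID and sort by genomic position.
annotated <- merge(annotation, results, by = "locus")
annotated <- annotated[order(annotated$chromosome, annotated$position), ]
print(annotated)
```

Matching on the locus ID rather than on row order means the two tables do not need to list the markers in the same sequence.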