Ben Bimber
2010-Mar-18 14:41 UTC
[R] Pedigree / Identifying Immediate Family of Index Animal
I have a data frame containing the Id, Mother, Father and Sex of about 10,000 animals in our colony. I am interested in graphing simple family trees for a given subject or a small number of subjects. The basic idea is: start with the data frame for the entire colony and a list of index animals, identify all immediate relatives of these index animals, and plot the pedigree for them. We're not trying to do any sort of real analysis, just present a visualization of the family structure. I have used the kinship and pedigree packages to plot the pedigree.

My question relates to efficiently identifying the animals to include in the pedigree. Starting with the data frame of ~10,000 records, I want to use a set of index animals to extract their immediate relatives and plot only that small subset. 'Immediate relatives' is a somewhat ambiguous term; I am currently defining it as 3 generations forward and 3 backward. Currently, I have a somewhat ugly approach where I iteratively calculate each generation forward or backward and build a new data frame. Is there a better approach, or a package that does this? I realize my code could be written better to get rid of the loops, so if anyone has suggestions there I would appreciate them as well. Thanks in advance.
Code to calculate generations forward and backward:

# allPed: data frame with Id, Dam, Sire and Sex for every animal in the colony
# ped:    starts as the rows of allPed for the index animals
# gens:   number of generations to search in each direction (e.g. 3)

# build backwards
# queryIds holds the unique Ids of the parents of the current generation
queryIds <- unique(c(ped$Sire, ped$Dam))
for (i in seq_len(gens)) {
  queryIds <- queryIds[!is.na(queryIds)]   # founders have unknown parents
  if (length(queryIds) == 0) break
  newRows  <- subset(allPed, Id %in% queryIds)
  queryIds <- unique(c(newRows$Sire, newRows$Dam))
  ped      <- unique(rbind(newRows, ped))
}

# build forwards
# when calculating children, queryIds holds the Ids of the previous generation
queryIds <- unique(ped$Id)
for (i in seq_len(gens)) {
  if (length(queryIds) == 0) break
  newRows  <- subset(allPed, Sire %in% queryIds | Dam %in% queryIds)
  queryIds <- newRows$Id
  ped      <- unique(rbind(newRows, ped))
}
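For trying the backward-then-forward search on its own, here is a self-contained sketch that wraps the two loops into one function. The function name `getRelatives`, the toy colony, and the use of `NA` for an unknown parent are hypothetical; the logic mirrors the loops in the post, including the forward pass starting from every animal collected on the way back (so siblings and other collateral relatives are picked up as well).

```r
# Toy colony (hypothetical): NA marks an unknown parent
allPed <- data.frame(
  Id   = c("a", "b", "c", "d", "e", "f", "g", "h", "i"),
  Sire = c(NA,  NA,  "a", "a", NA,  "c", "c", NA,  "f"),
  Dam  = c(NA,  NA,  "b", "b", NA,  "e", "e", NA,  "h"),
  Sex  = c("M", "F", "M", "F", "F", "M", "F", "F", "M"),
  stringsAsFactors = FALSE
)

# Hypothetical wrapper: ancestors of the index animals up to `gens`
# generations back, then descendants of everything collected so far
getRelatives <- function(allPed, index, gens = 3) {
  ped <- allPed[allPed$Id %in% index, ]

  # build backwards: queryIds holds the parents of the current generation
  queryIds <- unique(c(ped$Sire, ped$Dam))
  for (i in seq_len(gens)) {
    queryIds <- queryIds[!is.na(queryIds)]
    if (length(queryIds) == 0) break
    newRows  <- allPed[allPed$Id %in% queryIds, ]
    queryIds <- unique(c(newRows$Sire, newRows$Dam))
    ped      <- unique(rbind(newRows, ped))
  }

  # build forwards: queryIds holds the previous generation
  queryIds <- unique(ped$Id)
  for (i in seq_len(gens)) {
    if (length(queryIds) == 0) break
    isChild  <- allPed$Sire %in% queryIds | allPed$Dam %in% queryIds
    newRows  <- allPed[isChild, ]
    queryIds <- newRows$Id
    ped      <- unique(rbind(newRows, ped))
  }
  ped
}

rel <- getRelatives(allPed, index = "c")
sort(rel$Id)
# [1] "a" "b" "c" "d" "f" "g" "i"
```

The loops only run over generations, never over animals; each step is a vectorized `%in%` lookup across the whole colony.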
Charles C. Berry
2010-Mar-18 17:32 UTC
On Thu, 18 Mar 2010, Ben Bimber wrote:
> Is there a better approach or package that does this? I realize my code should be written better to get rid of the loops, so if anyone has suggestions there I would appreciate this as well.

Using an indicator matrix for parent/child relations, you can identify future/past generations using matrix multiplication(s). Since you have 10000 animals, the matrix indicating parents/children will be 10000 x 10000, but because each animal has at most two parents it will have fewer than 20000 non-zero elements. To me, this sounds like a good candidate for a sparse matrix representation. Packages 'Matrix' and 'SparseM' provide these.
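A minimal sketch of the indicator-matrix idea using the 'Matrix' package follows. It assumes `NA` marks an unknown parent; the toy colony and the function name `relativesSparse` are hypothetical. Row i of the sparse matrix `P` has a 1 in the columns of animal i's sire and dam, so `P %*% x` flags the children of the animals flagged in a 0/1 vector `x`, and `crossprod(P, x)` (that is, t(P) %*% x) flags their parents. Repeating each product `gens` times walks that many generations down and up. Unlike the loop in the original post, this sketch collects only the direct ancestors and descendants of the index animals, not siblings or cousins.

```r
library(Matrix)

# Toy colony (hypothetical): NA marks an unknown parent
allPed <- data.frame(
  Id   = c("a", "b", "c", "d", "e", "f", "g", "h", "i"),
  Sire = c(NA,  NA,  "a", "a", NA,  "c", "c", NA,  "f"),
  Dam  = c(NA,  NA,  "b", "b", NA,  "e", "e", NA,  "h"),
  stringsAsFactors = FALSE
)

# Sparse parent indicator: P[i, j] = 1 when animal j is a parent of animal i
n <- nrow(allPed)
parentRow <- c(match(allPed$Sire, allPed$Id), match(allPed$Dam, allPed$Id))
childRow  <- rep(seq_len(n), 2)
known     <- !is.na(parentRow)
P <- sparseMatrix(i = childRow[known], j = parentRow[known], x = 1,
                  dims = c(n, n))

# Hypothetical helper: row numbers of the index animals plus their
# ancestors and descendants up to `gens` generations away
relativesSparse <- function(P, indexRows, gens = 3) {
  start <- numeric(nrow(P))
  start[indexRows] <- 1
  up <- down <- keep <- start
  for (g in seq_len(gens)) {
    up   <- as.numeric(as.vector(crossprod(P, up)) > 0)  # parents of `up`
    down <- as.numeric(as.vector(P %*% down) > 0)        # children of `down`
    keep <- pmax(keep, up, down)
  }
  which(keep > 0)
}

rows <- relativesSparse(P, match("c", allPed$Id))
allPed$Id[rows]
# [1] "a" "b" "c" "f" "g" "i"
```

For the ~10,000-animal colony, `P` would hold fewer than 20,000 non-zero entries, so each product touches only those entries.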
HTH,
Chuck

Charles C. Berry                            (858) 534-2098
Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu
UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/
La Jolla, San Diego 92093-0901
Terry Therneau
2010-Mar-19 15:54 UTC
On Thu, 18 Mar 2010, Ben Bimber wrote:
> I have used the kinship and pedigree packages to plot the pedigree. My question relates to efficiently identifying the animals to include in the pedigree.

Your basic idea is sound: the pedigree drawing programs do the same type of thing, and it is very fast. Your loop runs over only 3 generations, so it costs very little. My version of the function interleaves the up/down steps.

Terry Therneau

findKin <- function(id, dadid, momid, index, generations=3) {
    idrow <- match(index, id)
    if (any(is.na(idrow))) stop("Index subject not found")

    for (i in 1:generations) {
        # add parents (nomatch=0 flags parents not in the colony)
        idrow <- c(idrow, match(momid[idrow], id, nomatch=0),
                          match(dadid[idrow], id, nomatch=0))
        idrow <- unique(idrow[idrow>0])  # toss the zeros

        # add children
        idrow <- c(idrow, which(match(momid, id[idrow], nomatch=0) >0),
                          which(match(dadid, id[idrow], nomatch=0) >0))
        idrow <- unique(idrow[idrow>0])
    }
    idrow  # row numbers of the index subjects and their relatives
}
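Whichever method selects the rows (for example the row numbers that `findKin` returns), the subset can refer to parents that were not selected: the last step adds children, but not necessarily their other parent. Pedigree plotting code generally expects every listed parent to appear as a subject, so one option is to append such parents as founders before plotting. The sketch below assumes this; the helper name `completeParents`, the toy colony, the "M"/"F" coding of `Sex`, and the hard-coded row numbers are hypothetical, and the exact input requirements of the kinship/pedigree plotting functions should be checked in their documentation.

```r
# Toy colony (hypothetical): NA marks an unknown parent
allPed <- data.frame(
  Id   = c("a", "b", "c", "d", "e", "f", "g", "h", "i"),
  Sire = c(NA,  NA,  "a", "a", NA,  "c", "c", NA,  "f"),
  Dam  = c(NA,  NA,  "b", "b", NA,  "e", "e", NA,  "h"),
  Sex  = c("M", "F", "M", "F", "F", "M", "F", "F", "M"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: subset the colony to `rows` and append any parent the
# subset refers to but does not contain, as a founder with unknown parents
completeParents <- function(allPed, rows) {
  sub <- allPed[rows, c("Id", "Sire", "Dam", "Sex")]
  sires <- sub$Sire[!is.na(sub$Sire)]
  dams  <- sub$Dam[!is.na(sub$Dam)]
  newSires <- setdiff(sires, sub$Id)
  newDams  <- setdiff(dams,  sub$Id)
  newIds   <- c(newSires, newDams)
  extra <- data.frame(
    Id   = newIds,
    Sire = rep(NA_character_, length(newIds)),
    Dam  = rep(NA_character_, length(newIds)),
    Sex  = c(rep("M", length(newSires)), rep("F", length(newDams))),
    stringsAsFactors = FALSE
  )
  rbind(sub, extra)
}

# Hypothetical rows: index animal "c" with its ancestors and descendants
plotPed <- completeParents(allPed, rows = c(1, 2, 3, 6, 7, 9))
plotPed$Id
# [1] "a" "b" "c" "f" "g" "i" "e" "h"
```

The appended animals (`e` and `h` here) are drawn as founders, which keeps the plot small while leaving no dangling parent references.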