Hi all,

I am reading a huge data set (12M rows) that contains family information:
Offspring, Parent1 and Parent2.

Parent1 and Parent2 should be in the first column as an offspring
before their offspring information. Their parent information (Parent1
and Parent2) should be set to zero, if unknown. Also the first
column should be unique.

Here is my sample data set and desired output.

fam <- read.table(textConnection(" offspring Parent1 Parent2
Smith Alex1 Alexa
Carla Alex1 0
Jacky Smith Abbot
Jack 0 Jacky
Almo Jack Carla
 "),header = TRUE)

Desired output:

Offspring Parent1 Parent2
Alex1     0       0
Alexa     0       0
Abbot     0       0
Smith     Alex1   Alexa
Carla     Alex1   0
Jacky     Smith   Abbot
Jack      0       Jacky
Almo      Jack    Carla

Thank you.
This question is about algorithm help... or rather, "do my work for me", not about R.

Study up on "directed acyclic graphs" and topological sorting [1]. There are actually some packages related to such data structures on CRAN (e.g. pooh::tsort, and the Task View gR, "gRaphical Models in R"), but you should at least be aware of the possible approaches before we talk about implementing one of them on this list. The implementation is the "R" part that is on topic here.

[1] https://en.wikipedia.org/wiki/Topological_sorting
--
Sent from my phone. Please excuse my brevity.

On November 17, 2017 4:28:09 PM PST, Val <valkremk at gmail.com> wrote:
>Hi all,
>I am reading a huge data set(12M rows) that contains family
>information,
>Offspring, Parent1 and Parent2
>[...]
>______________________________________________
>R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide
>http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.
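To make the topological-sorting suggestion concrete, here is a minimal base-R sketch of one possible approach: a layer-by-layer variant of Kahn's algorithm that repeatedly places every row whose parents are already known. The helper name order_pedigree is hypothetical (it is not from any package), and the sketch assumes the column names from the sample data, "0" as the unknown-parent code, and character columns. It only reorders existing rows; it does not add rows for parents that never appear as offspring, and it has not been tried on anything near 12M rows.

```r
# Sample data from the original post, read as character rather than factor.
fam <- read.table(text = "
offspring Parent1 Parent2
Smith Alex1 Alexa
Carla Alex1 0
Jacky Smith Abbot
Jack 0 Jacky
Almo Jack Carla
", header = TRUE, stringsAsFactors = FALSE)

# Hypothetical helper: reorder rows so that every parent that has its own
# row appears before the rows of its offspring.
order_pedigree <- function(ped, unknown = "0") {
  # Parents with no row of their own are treated as already known.
  no_row <- setdiff(c(ped$Parent1, ped$Parent2), ped$offspring)
  placed <- rep(FALSE, nrow(ped))
  ord <- integer(0)
  repeat {
    known <- c(unknown, no_row, ped$offspring[placed])
    # A row is ready once both of its parents are known.
    ready <- !placed & ped$Parent1 %in% known & ped$Parent2 %in% known
    if (!any(ready)) break
    ord <- c(ord, which(ready))
    placed[ready] <- TRUE
  }
  # Rows that can never be placed indicate a cycle in the pedigree.
  if (!all(placed)) stop("pedigree contains a cycle")
  ped[ord, ]
}

# Reverse the sample so that offspring come before their parents,
# then restore a valid order.
shuffled <- fam[nrow(fam):1, ]
sorted <- order_pedigree(shuffled)
print(sorted)
```

Each pass of the loop is vectorised, so the number of passes equals the number of generations in the pedigree rather than the number of rows. Whether that is fast enough for the full data set is an open question.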
> On Nov 17, 2017, at 4:28 PM, Val <valkremk at gmail.com> wrote:
>
> Hi all,
> I am reading a huge data set(12M rows) that contains family information,
> Offspring, Parent1 and Parent2
> [...]

You might get useful ideas by looking at ?"%in%" and ?union (set operations):

  > fam$Parent1[!fam$Parent1 %in% fam$offspring]
  [1] "Alex1" "Alex1" "0"
  > fam$Parent2[!fam$Parent2 %in% fam$offspring]
  [1] "Alexa" "0"     "Abbot"

David.

David Winsemius
Alameda, CA, USA

'Any technology distinguishable from magic is insufficiently advanced.' -Gehm's Corollary to Clarke's Third Law
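Building on the set-operation hint, the following is a minimal sketch of how the founder rows might be assembled for the sample data. It assumes "0" is the unknown-parent code, that the columns are read as character (stringsAsFactors = FALSE), and that the existing rows are already in a valid parent-before-offspring order, which happens to hold for the sample but may not hold for the real data. The variable names founders, founder_rows and out are made up for illustration.

```r
# Sample data from the original post, read as character rather than factor.
fam <- read.table(text = "
offspring Parent1 Parent2
Smith Alex1 Alexa
Carla Alex1 0
Jacky Smith Abbot
Jack 0 Jacky
Almo Jack Carla
", header = TRUE, stringsAsFactors = FALSE)

# Every individual that appears as a parent, without duplicates.
parents <- union(fam$Parent1, fam$Parent2)

# Founders: parents that have no row of their own, ignoring the "0" code.
founders <- setdiff(parents, c(fam$offspring, "0"))

# Give each founder a row with both parents unknown.
founder_rows <- data.frame(
  offspring = founders,
  Parent1   = rep("0", length(founders)),
  Parent2   = rep("0", length(founders)),
  stringsAsFactors = FALSE
)

# Founders first, then the original rows in their existing order.
out <- rbind(founder_rows, fam)

# The first column is required to be unique.
stopifnot(!anyDuplicated(out$offspring))
print(out)
```

If the original rows are not already ordered, a topological sort would still be needed after the founder rows are added.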