Juliet Hannah
2008-Oct-25 15:08 UTC
[R] create group variable -- family data -- for siblings
For the following data: famdat <- read.table(textConnection("ind momid dadid 1 18 19 2 18 19 3 18 19 4 21 22 5 21 22 6 23 25 7 23 27 8 29 30 9 31 30 10 40 41 11 NA NA 12 50 51"),header=TRUE) closeAllConnections(); I would like to create a label (1,2,3..) for siblings. Siblings will be defined by those who have both the same momid and dadid, but also those who just have the same momid or the same dadid. In addition, there will be those without siblings and those whose parents are missing, and they will get unique ids. For the data above, the result would be: ind momid dadid sibid 1 1 18 19 1 2 2 18 19 1 3 3 18 19 1 4 4 21 22 2 5 5 21 22 2 6 6 23 25 3 7 7 23 27 3 8 8 29 30 4 9 9 31 30 4 10 10 40 41 5 11 11 NA NA 6 12 12 50 51 7 Thanks! Juliet
Gabor Grothendieck
2008-Oct-25 16:52 UTC
[R] create group variable -- family data -- for siblings
Create a distance metric which is 0 if there are common mothers or fathers and 1 otherwise using that to cluster your points: dd <- with(famdat, outer(momid, momid, "!=") * outer(dadid, dadid, "!=")) dd[is.na(dd)] <- 1 hc <- hclust(as.dist(dd)) cutree(hc, h = 0.1) On Sat, Oct 25, 2008 at 11:08 AM, Juliet Hannah <juliet.hannah at gmail.com> wrote:> For the following data: > > famdat <- read.table(textConnection("ind momid dadid > 1 18 19 > 2 18 19 > 3 18 19 > 4 21 22 > 5 21 22 > 6 23 25 > 7 23 27 > 8 29 30 > 9 31 30 > 10 40 41 > 11 NA NA > 12 50 51"),header=TRUE) > closeAllConnections(); > > I would like to create a label (1,2,3..) for siblings. Siblings will > be defined by those who have both the same momid and dadid, but also > those who > just have the same momid or the same dadid. In addition, there will be > those without siblings and those whose parents are missing, and they > will > get unique ids. For the data above, the result would be: > > ind momid dadid sibid > 1 1 18 19 1 > 2 2 18 19 1 > 3 3 18 19 1 > 4 4 21 22 2 > 5 5 21 22 2 > 6 6 23 25 3 > 7 7 23 27 3 > 8 8 29 30 4 > 9 9 31 30 4 > 10 10 40 41 5 > 11 11 NA NA 6 > 12 12 50 51 7 > > Thanks! > > Juliet > > ______________________________________________ > R-help at r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >
Gabor Grothendieck
2008-Oct-25 17:28 UTC
[R] create group variable -- family data -- for siblings
Here is one other solution. For each row it finds the earliest row that has the same momid or popid: f <- function(i) { if (is.na(famdat[i, 1]) || is.na(famdat[i, 2])) { i } else { i1 <- match(famdat[i, 1], famdat[1:i, 1]) i2 <- match(famdat[i, 2], famdat[1:i, 2]) min(i1, i2) } } as.numeric(factor(sapply(1:nrow(famdat), f))) On Sat, Oct 25, 2008 at 12:52 PM, Gabor Grothendieck <ggrothendieck at gmail.com> wrote:> Create a distance metric which is 0 if there are common mothers or > fathers and 1 otherwise using that to cluster your points: > > dd <- with(famdat, outer(momid, momid, "!=") * outer(dadid, dadid, "!=")) > dd[is.na(dd)] <- 1 > hc <- hclust(as.dist(dd)) > cutree(hc, h = 0.1) > > On Sat, Oct 25, 2008 at 11:08 AM, Juliet Hannah <juliet.hannah at gmail.com> wrote: >> For the following data: >> >> famdat <- read.table(textConnection("ind momid dadid >> 1 18 19 >> 2 18 19 >> 3 18 19 >> 4 21 22 >> 5 21 22 >> 6 23 25 >> 7 23 27 >> 8 29 30 >> 9 31 30 >> 10 40 41 >> 11 NA NA >> 12 50 51"),header=TRUE) >> closeAllConnections(); >> >> I would like to create a label (1,2,3..) for siblings. Siblings will >> be defined by those who have both the same momid and dadid, but also >> those who >> just have the same momid or the same dadid. In addition, there will be >> those without siblings and those whose parents are missing, and they >> will >> get unique ids. For the data above, the result would be: >> >> ind momid dadid sibid >> 1 1 18 19 1 >> 2 2 18 19 1 >> 3 3 18 19 1 >> 4 4 21 22 2 >> 5 5 21 22 2 >> 6 6 23 25 3 >> 7 7 23 27 3 >> 8 8 29 30 4 >> 9 9 31 30 4 >> 10 10 40 41 5 >> 11 11 NA NA 6 >> 12 12 50 51 7 >> >> Thanks! >> >> Juliet >> >> ______________________________________________ >> R-help at r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. >> >
Gabor Grothendieck
2008-Oct-25 17:56 UTC
[R] create group variable -- family data -- for siblings
Correction and shortening: f <- function(i) { i1 <- if (is.na(famdat[i, 2])) i else match(famdat[i, 2], famdat[1:i, 2]) i2 <- if (is.na(famdat[i, 3])) i else match(famdat[i, 3], famdat[1:i, 3]) min(i1, i2) } as.numeric(factor(sapply(1:nrow(famdat), f))) On Sat, Oct 25, 2008 at 1:28 PM, Gabor Grothendieck <ggrothendieck at gmail.com> wrote:> Here is one other solution. For each row it finds the > earliest row that has the same momid or popid: > > > f <- function(i) { > if (is.na(famdat[i, 1]) || is.na(famdat[i, 2])) { > i > } else { > i1 <- match(famdat[i, 1], famdat[1:i, 1]) > i2 <- match(famdat[i, 2], famdat[1:i, 2]) > min(i1, i2) > } > } > as.numeric(factor(sapply(1:nrow(famdat), f))) > > > On Sat, Oct 25, 2008 at 12:52 PM, Gabor Grothendieck > <ggrothendieck at gmail.com> wrote: >> Create a distance metric which is 0 if there are common mothers or >> fathers and 1 otherwise using that to cluster your points: >> >> dd <- with(famdat, outer(momid, momid, "!=") * outer(dadid, dadid, "!=")) >> dd[is.na(dd)] <- 1 >> hc <- hclust(as.dist(dd)) >> cutree(hc, h = 0.1) >> >> On Sat, Oct 25, 2008 at 11:08 AM, Juliet Hannah <juliet.hannah at gmail.com> wrote: >>> For the following data: >>> >>> famdat <- read.table(textConnection("ind momid dadid >>> 1 18 19 >>> 2 18 19 >>> 3 18 19 >>> 4 21 22 >>> 5 21 22 >>> 6 23 25 >>> 7 23 27 >>> 8 29 30 >>> 9 31 30 >>> 10 40 41 >>> 11 NA NA >>> 12 50 51"),header=TRUE) >>> closeAllConnections(); >>> >>> I would like to create a label (1,2,3..) for siblings. Siblings will >>> be defined by those who have both the same momid and dadid, but also >>> those who >>> just have the same momid or the same dadid. In addition, there will be >>> those without siblings and those whose parents are missing, and they >>> will >>> get unique ids. For the data above, the result would be: >>> >>> ind momid dadid sibid >>> 1 1 18 19 1 >>> 2 2 18 19 1 >>> 3 3 18 19 1 >>> 4 4 21 22 2 >>> 5 5 21 22 2 >>> 6 6 23 25 3 >>> 7 7 23 27 3 >>> 8 8 29 30 4 >>> 9 9 31 30 4 >>> 10 10 40 41 5 >>> 11 11 NA NA 6 >>> 12 12 50 51 7 >>> >>> Thanks! >>> >>> Juliet >>> >>> ______________________________________________ >>> R-help at r-project.org mailing list >>> https://stat.ethz.ch/mailman/listinfo/r-help >>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >>> and provide commented, minimal, self-contained, reproducible code. >>> >> >