thr3ads.net - R help - [R] create group variable -- family data -- for siblings [Oct 2008]

If this information is useful, please help other people find it:
Share via:

Juliet Hannah

2008-Oct-25 15:08 UTC

[R] create group variable -- family data -- for siblings

For the following data:

famdat <- read.table(textConnection("ind momid dadid
1   18    19
2   18    19
3   18    19
4   21    22
5   21    22
6   23    25
7   23    27
8   29    30
9   31    30
10  40    41
11  NA    NA
12  50    51"),header=TRUE)
closeAllConnections();

I would like to create a label (1,2,3..) for siblings. Siblings will
be defined by those who have both the same momid and dadid, but also
those who
just have the same momid or the same dadid. In addition, there will be
those without siblings and those whose parents are missing, and they
will
get unique ids. For the data above, the result would be:

   ind momid dadid sibid
1    1    18    19      1
2    2    18    19      1
3    3    18    19      1
4    4    21    22      2
5    5    21    22      2
6    6    23    25      3
7    7    23    27      3
8    8    29    30      4
9    9    31    30      4
10  10    40    41     5
11  11    NA    NA   6
12  12    50    51     7

Thanks!

Juliet

Gabor Grothendieck

2008-Oct-25 16:52 UTC

head link

[R] create group variable -- family data -- for siblings

Create a distance metric which is 0 if there are common mothers or
fathers and 1 otherwise using that to cluster your points:

dd <- with(famdat, outer(momid, momid, "!=") * outer(dadid, dadid,
"!="))
dd[is.na(dd)] <- 1
hc <- hclust(as.dist(dd))
cutree(hc, h = 0.1)

On Sat, Oct 25, 2008 at 11:08 AM, Juliet Hannah <juliet.hannah at
gmail.com> wrote:> For the following data:
>
> famdat <- read.table(textConnection("ind momid dadid
> 1   18    19
> 2   18    19
> 3   18    19
> 4   21    22
> 5   21    22
> 6   23    25
> 7   23    27
> 8   29    30
> 9   31    30
> 10  40    41
> 11  NA    NA
> 12  50    51"),header=TRUE)
> closeAllConnections();
>
> I would like to create a label (1,2,3..) for siblings. Siblings will
> be defined by those who have both the same momid and dadid, but also
> those who
> just have the same momid or the same dadid. In addition, there will be
> those without siblings and those whose parents are missing, and they
> will
> get unique ids. For the data above, the result would be:
>
>   ind momid dadid sibid
> 1    1    18    19      1
> 2    2    18    19      1
> 3    3    18    19      1
> 4    4    21    22      2
> 5    5    21    22      2
> 6    6    23    25      3
> 7    7    23    27      3
> 8    8    29    30      4
> 9    9    31    30      4
> 10  10    40    41     5
> 11  11    NA    NA   6
> 12  12    50    51     7
>
> Thanks!
>
> Juliet
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

Gabor Grothendieck

2008-Oct-25 17:28 UTC

head link

[R] create group variable -- family data -- for siblings

Here is one other solution. For each row it finds the
earliest row that has the same momid or popid:


f <- function(i) {
  if (is.na(famdat[i, 1]) || is.na(famdat[i, 2])) {
	  i
  } else {
	  i1 <- match(famdat[i, 1], famdat[1:i, 1])
	  i2 <- match(famdat[i, 2], famdat[1:i, 2])
	  min(i1, i2)
  }
}
as.numeric(factor(sapply(1:nrow(famdat), f)))


On Sat, Oct 25, 2008 at 12:52 PM, Gabor Grothendieck
<ggrothendieck at gmail.com> wrote:> Create a distance metric which is 0 if there are common mothers or
> fathers and 1 otherwise using that to cluster your points:
>
> dd <- with(famdat, outer(momid, momid, "!=") * outer(dadid,
dadid, "!="))
> dd[is.na(dd)] <- 1
> hc <- hclust(as.dist(dd))
> cutree(hc, h = 0.1)
>
> On Sat, Oct 25, 2008 at 11:08 AM, Juliet Hannah <juliet.hannah at
gmail.com> wrote:
>> For the following data:
>>
>> famdat <- read.table(textConnection("ind momid dadid
>> 1   18    19
>> 2   18    19
>> 3   18    19
>> 4   21    22
>> 5   21    22
>> 6   23    25
>> 7   23    27
>> 8   29    30
>> 9   31    30
>> 10  40    41
>> 11  NA    NA
>> 12  50    51"),header=TRUE)
>> closeAllConnections();
>>
>> I would like to create a label (1,2,3..) for siblings. Siblings will
>> be defined by those who have both the same momid and dadid, but also
>> those who
>> just have the same momid or the same dadid. In addition, there will be
>> those without siblings and those whose parents are missing, and they
>> will
>> get unique ids. For the data above, the result would be:
>>
>>   ind momid dadid sibid
>> 1    1    18    19      1
>> 2    2    18    19      1
>> 3    3    18    19      1
>> 4    4    21    22      2
>> 5    5    21    22      2
>> 6    6    23    25      3
>> 7    7    23    27      3
>> 8    8    29    30      4
>> 9    9    31    30      4
>> 10  10    40    41     5
>> 11  11    NA    NA   6
>> 12  12    50    51     7
>>
>> Thanks!
>>
>> Juliet
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>

Gabor Grothendieck

2008-Oct-25 17:56 UTC

head link

[R] create group variable -- family data -- for siblings

Correction and shortening:

f <- function(i) {
  i1 <- if (is.na(famdat[i, 2])) i else match(famdat[i, 2], famdat[1:i, 2])
  i2 <- if (is.na(famdat[i, 3])) i else match(famdat[i, 3], famdat[1:i, 3])
  min(i1, i2)
}
as.numeric(factor(sapply(1:nrow(famdat), f)))


On Sat, Oct 25, 2008 at 1:28 PM, Gabor Grothendieck
<ggrothendieck at gmail.com> wrote:> Here is one other solution. For each row it finds the
> earliest row that has the same momid or popid:
>
>
> f <- function(i) {
>  if (is.na(famdat[i, 1]) || is.na(famdat[i, 2])) {
>          i
>  } else {
>          i1 <- match(famdat[i, 1], famdat[1:i, 1])
>          i2 <- match(famdat[i, 2], famdat[1:i, 2])
>          min(i1, i2)
>  }
> }
> as.numeric(factor(sapply(1:nrow(famdat), f)))
>
>
> On Sat, Oct 25, 2008 at 12:52 PM, Gabor Grothendieck
> <ggrothendieck at gmail.com> wrote:
>> Create a distance metric which is 0 if there are common mothers or
>> fathers and 1 otherwise using that to cluster your points:
>>
>> dd <- with(famdat, outer(momid, momid, "!=") *
outer(dadid, dadid, "!="))
>> dd[is.na(dd)] <- 1
>> hc <- hclust(as.dist(dd))
>> cutree(hc, h = 0.1)
>>
>> On Sat, Oct 25, 2008 at 11:08 AM, Juliet Hannah <juliet.hannah at
gmail.com> wrote:
>>> For the following data:
>>>
>>> famdat <- read.table(textConnection("ind momid dadid
>>> 1   18    19
>>> 2   18    19
>>> 3   18    19
>>> 4   21    22
>>> 5   21    22
>>> 6   23    25
>>> 7   23    27
>>> 8   29    30
>>> 9   31    30
>>> 10  40    41
>>> 11  NA    NA
>>> 12  50    51"),header=TRUE)
>>> closeAllConnections();
>>>
>>> I would like to create a label (1,2,3..) for siblings. Siblings
will
>>> be defined by those who have both the same momid and dadid, but
also
>>> those who
>>> just have the same momid or the same dadid. In addition, there will
be
>>> those without siblings and those whose parents are missing, and
they
>>> will
>>> get unique ids. For the data above, the result would be:
>>>
>>>   ind momid dadid sibid
>>> 1    1    18    19      1
>>> 2    2    18    19      1
>>> 3    3    18    19      1
>>> 4    4    21    22      2
>>> 5    5    21    22      2
>>> 6    6    23    25      3
>>> 7    7    23    27      3
>>> 8    8    29    30      4
>>> 9    9    31    30      4
>>> 10  10    40    41     5
>>> 11  11    NA    NA   6
>>> 12  12    50    51     7
>>>
>>> Thanks!
>>>
>>> Juliet
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>

R help - Oct 2008 - create group variable -- family data -- for siblings

[R] create group variable -- family data -- for siblings

[R] create group variable -- family data -- for siblings

[R] create group variable -- family data -- for siblings

[R] create group variable -- family data -- for siblings