Hello,

I need some help creating a new variable. I need to create a 'couple identifier' that gives a unique code to every couple/triple/... in a household, so that I can identify couples. To do this, I should use 4 variables:

* SERIAL = a unique numeric code for each household
* PERNUM = a unique numeric code for each person
* SPLOC = the numeric code of the spouse in the household; it is equal to the PERNUM code of the spouse
* SPRULE = rules for linking a spouse, numeric code from 00 to 06

To create the couple identifier, I need these conditions:

* SERIAL needs to be equal for the persons in a couple
* SPLOC > 0
* SPLOC of one person = PERNUM of the other
* SPRULE = 01 or 02

What I already did is this:

  attach(ipumsi_00008_dta)
  library(tinytex)
  library(dplyr)
  library(ggplot2)
  library(tidyr)
  library(knitr)
  library(forcats)
  library(mice)
  library(pander)
  library(ggcorrplot)
  library(lubridate)
  # TRUE/FALSE: sploc is greater than zero
  ipumsi_00008_dta <- mutate(ipumsi_00008_dta,
    sploc_greater_than_zero = sploc > 0)
  # TRUE/FALSE: sploc is greater than zero and sprule is equal to 1 or 2
  # (parentheses needed, because & binds more tightly than |)
  ipumsi_00008_dta <- mutate(ipumsi_00008_dta,
    rule_union = sploc > 0 & (sprule == 1 | sprule == 2))

=> Now I want to create a numeric code for the TRUE values of rule_union when the serials are equal, i.e. when the persons belong to the same household. What method should I use to do this?

Thank you very much!!
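A note on the rule_union condition: in R, & binds more tightly than |, so a condition written as a & b | c is evaluated as (a & b) | c. The sketch below uses a made-up four-row data frame (the values are illustrative, not taken from the IPUMS extract) to show how the grouping changes the result.

```r
# Toy data (made up for illustration): sploc = 0 means no spouse in the household
toy <- data.frame(sploc = c(0, 2, 0, 1), sprule = c(2, 1, 0, 2))

# Without parentheses R reads this as (sploc > 0 & sprule == 1) | sprule == 2
loose <- toy$sploc > 0 & toy$sprule == 1 | toy$sprule == 2

# Intended grouping: a spouse is present AND the linking rule is 1 or 2
strict <- toy$sploc > 0 & toy$sprule %in% c(1, 2)

# Side-by-side comparison; the first row differs between the two versions
data.frame(toy, loose, strict)
```

In the first row there is no spouse (sploc = 0) but sprule is 2, so the unparenthesised version flags it while the grouped version does not.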
Hi Hannah,

Without knowing how the data are organized and what each numeric code means, it is a bit difficult. If each row in the data frame(?) ipumsi_00008_dta is a case (individual), and an individual may have zero or more spouses, there would have to be more than one "sploc" field for those who had more than one "spouse". I would approach it by creating a variable named "relcode" that is unique for each "union", so that all individuals with the same non-zero "relcode" are in the same "relationship". That still only covers exclusive relationships; people who are in several different relationships within the same household would need multiple "relcode" fields. I know that this is being pedantic, but it looks like a set intersection problem of the Bob and Carol and Ted and Alice variety.

Jim

On Wed, Oct 28, 2020 at 6:39 AM Hannah Van Impe <hannahvanimpe at outlook.com> wrote:
> Hello,
>
> I need some help in creating a new variable. I need to create a 'couple
> identifier', which gives a unique code for every couple/triple/... in a
> household. So, I can identify couples.
> [...]
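A minimal sketch of the "relcode" idea from the reply above, for the simple case of exclusive couples only. It assumes that partners point at each other through sploc and that pernum stays below 100; the toy household and the way the code is built (serial * 100 plus the smaller person number) are illustrative assumptions, not something given in the thread.

```r
# Toy household (values made up): persons 1-2 and 4-5 are couples, 3 is single
hh <- data.frame(serial = 7000, pernum = 1:5, sploc = c(2, 1, 0, 5, 4))

# Both partners share the smaller of their two person numbers, so combining
# it with the household serial gives one code per couple; 0 means "no union".
# Assumes pernum < 100 and that partners point at each other.
hh$relcode <- ifelse(hh$sploc > 0,
                     hh$serial * 100 + pmin(hh$pernum, hh$sploc),
                     0)
hh
```

Individuals with the same non-zero relcode are then in the same relationship. Unions with more than two members are not handled by this version.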
Hi Hannah,

Yes, that does give me more insight. The restriction to polygyny doesn't matter, for even if there were polyandry it could be coded in the same way. If I understand this correctly, you have one variable (SPRULE) that must contain information about the household of the individual and the identities of his (or her) relationship partner(s). The household number can accommodate up to three relationship partners (each partner's PERNUM takes one digit) as long as there are a maximum of 9 people per household. Given these constraints, I think this may do what you want:

  ipumsi_00008_dta<-read.table(
   text="country year sample serial hhwt pernum perwt resident sploc sprule
  204 2013 204201301 4000 10 1 10 1 5 2
  204 2013 204201301 4000 10 2 10 1 0 2
  204 2013 204201301 4000 10 3 10 1 0 2
  204 2013 204201301 4000 10 4 10 1 0 2
  204 2013 204201301 4000 10 5 10 1 1 2
  204 2013 204201301 4000 10 6 10 1 1 2
  204 2013 204201301 4000 10 7 10 1 0 2
  204 2013 204201301 4000 10 8 10 1 0 2
  204 2013 204201301 4000 10 9 10 1 0 2
  204 2013 204201301 7000 10 1 10 1 2 1
  204 2013 204201301 7000 10 2 10 1 1 1
  204 2013 204201301 7000 10 3 10 1 0 0
  204 2013 204201301 7000 10 4 10 1 5 1
  204 2013 204201301 7000 10 5 10 1 4 1",
   header=TRUE,stringsAsFactors=FALSE)

  for(hh in unique(ipumsi_00008_dta$serial)) {
   cat("hh",hh," ")
   for(ind in ipumsi_00008_dta$pernum[ipumsi_00008_dta$serial == hh]) {
    cat("ind",ind,"\n")
    # row of this individual within this household
    thisrow<-ipumsi_00008_dta$serial == hh & ipumsi_00008_dta$pernum == ind
    if(ipumsi_00008_dta$sploc[thisrow] > 0) {
     cat("sploc > 0\n")
     # everyone in the household whose sploc points at this individual
     relationships<-
      ipumsi_00008_dta$pernum[ipumsi_00008_dta$serial == hh &
       ipumsi_00008_dta$sploc == ind]
     cat(relationships,"\n")
     # note: length(relationships) > 1, not length(relationships > 1)
     if(length(relationships) > 1) {
      # several partners: add their pernums, pasted as digits, to the household
      ipumsi_00008_dta$sprule[thisrow]<-
       hh+as.numeric(paste0(relationships,collapse=""))
     } else {
      # otherwise: household number plus this individual's own sploc
      ipumsi_00008_dta$sprule[thisrow]<-hh+ipumsi_00008_dta$sploc[thisrow]
     }
    } else {
     # no spouse in the household: household number only
     ipumsi_00008_dta$sprule[thisrow]<-hh
    }
   }
  }
  ipumsi_00008_dta

Note that this is a bad way to create a field in a database, and it would be better to create the information on the fly with a query. It won't work with between-household relationships.

Jim

On Wed, Oct 28, 2020 at 10:08 PM Hannah Van Impe <hannahvanimpe at outlook.com> wrote:
> Hello
> Again, thank you very much for the help!!
> ...
> In this photo, you can see that there is a polygamous union in household 6.
> A man always shows only one sploc value, even when he has multiple
> women. So, only men have multiple women; women don't have multiple men.
> Here, the man is observation 49, and he is linked with woman 53, but he
> can also be linked with other women. We see that woman 53 is linked with
> this man, but woman 54 is also linked with this man. So the man in
> observation 49 is linked with two women, 53 and 54, and I would like to
> give these 3 observations the same numerical code, so I can identify this
> union.
> Can you explain further now that you have this information?
> Hannah
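For the request in the quoted message (one shared code for the man and all women linked to him), a possible sketch is to follow the sploc links within each household and label every linked group with its smallest pernum. The helper name label_unions, the column union_id, the toy data and the serial * 100 coding are all hypothetical choices made for this illustration; the sketch assumes pernum < 100 and, like the loop above, only covers links within a household.

```r
# Toy data (made up): in household 4000 man 1 lists wife 5, while wives 5 and 6
# both list man 1; household 7000 holds two ordinary couples and one single.
d <- data.frame(
  serial = c(rep(4000, 6), rep(7000, 5)),
  pernum = c(1:6, 1:5),
  sploc  = c(5, 0, 0, 0, 1, 1,  2, 1, 0, 5, 4)
)

# Hypothetical helper: within ONE household, give every member of a union the
# smallest pernum found in that union (0 = not in any union).
label_unions <- function(pernum, sploc) {
  label <- pernum
  linked <- which(sploc > 0)
  partner <- match(sploc[linked], pernum)  # row of each person's listed spouse
  repeat {
    old <- label
    for (k in seq_along(linked)) {
      i <- linked[k]
      j <- partner[k]
      if (is.na(j)) next                   # listed spouse not in this household
      # both ends of a link take the smaller of their two labels
      label[c(i, j)] <- min(label[i], label[j])
    }
    if (identical(label, old)) break       # stop once labels no longer change
  }
  # people who neither list a spouse nor are listed by anyone get 0
  label[!(sploc > 0 | pernum %in% sploc)] <- 0
  label
}

# Apply household by household; assumes pernum < 100 so codes cannot collide
d$union_id <- 0
for (hh in unique(d$serial)) {
  rows <- d$serial == hh
  lab <- label_unions(d$pernum[rows], d$sploc[rows])
  d$union_id[rows] <- ifelse(lab > 0, hh * 100 + lab, 0)
}
d
```

With the toy data, persons 1, 5 and 6 of household 4000 end up with the same union_id, and each couple in household 7000 gets its own.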