Hi all,

I have a dataset of individuals in which the variable ID identifies the household where the individual lives. rel.head gives the relationship to the household head: rel.head=1 is the household head, rel.head=2 is the spouse, and rel.head=3 is a child.

Here is an example to show what it looks like:

df <- data.frame(ID=c("17100", "17100", "17101", "17102", "17103", "17103",
                      "17104", "17104", "17104", "17105", "17105"),
                 rel.head=c("1", "3", "1", "1", "1", "2", "1", "2", "3", "1", "3"))

I want to add a dummy variable that is equal to 1 when both of these conditions hold simultaneously:

a) the number of rows with the same ID is equal to 2
b) those rows have rel.head=1 and rel.head=3

So my ideal output is:

      ID rel.head added.dummy
1  17100        1           1
2  17100        3           1
3  17101        1           0
4  17102        1           0
5  17103        1           0
6  17103        2           0
7  17104        1           0
8  17104        2           0
9  17104        3           0
10 17105        1           1
11 17105        3           1

Is there a simple way to do that?
Can somebody help?

Thanks in advance,
Grazia
Hi,

I am sure there are better / more efficient ways of doing this, but the following seems to work ...

# One TRUE/FALSE per household: exactly two rows, one head (1) and one child (3)
ids <- sapply(split(df, df$ID), function(x) {
  length(x$rel.head) == 2 & any(x$rel.head == 1) & any(x$rel.head == 3)
})
# Keep the IDs of the households that passed the test
ids <- as.numeric(names(ids)[ids])
# Flag every row that belongs to one of those households
added.dummy <- as.numeric(df$ID %in% ids)
cbind(df, added.dummy)

Martyn

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of grazia at stat.columbia.edu
Sent: 04 October 2011 16:45
To: r-help at r-project.org
Subject: [R] adding a dummy variable...

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
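The same per-household test can also be written without building a separate vector of IDs. The following is only a sketch in base R, assuming the two columns are stored as character (stringsAsFactors = FALSE); the names n.rows, has.head and has.child are made up for illustration. ave() returns one value per row, so each condition is already aligned with df.

```r
# Example data from the original post, kept as character columns (assumption)
df <- data.frame(
  ID = c("17100", "17100", "17101", "17102", "17103", "17103",
         "17104", "17104", "17104", "17105", "17105"),
  rel.head = c("1", "3", "1", "1", "1", "2", "1", "2", "3", "1", "3"),
  stringsAsFactors = FALSE
)

# Household size, repeated on every row of that household
n.rows <- ave(seq_along(df$ID), df$ID, FUN = length)

# Does the household contain a head ("1")? A child ("3")?
has.head  <- ave(df$rel.head == "1", df$ID, FUN = any)
has.child <- ave(df$rel.head == "3", df$ID, FUN = any)

# 1 only when all three conditions hold for the row's household
df$added.dummy <- as.integer(n.rows == 2 & has.head & has.child)
df
```

Comparing against the strings "1" and "3" should behave the same way if rel.head is a factor, since a factor is compared by its labels.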
Hi:

Here's another way to do it with the plyr package, also not terribly elegant. It assumes that rel.head is a factor in your original data frame:

> str(df)
'data.frame':   11 obs. of  2 variables:
 $ ID      : Factor w/ 6 levels "17100","17101",..: 1 1 2 3 4 4 5 5 5 6 ...
 $ rel.head: Factor w/ 3 levels "1","2","3": 1 3 1 1 1 2 1 2 3 1 ...

If this is not the case in your data, then you need to modify the function f below accordingly. (This is why use of dput() is preferred when sending example data to R-help, BTW.)

library('plyr')
f <- function(d) {
    tvec <- factor(c(1, 3), levels = 1:3)   # target vector
    if(nrow(d) != 2L) {d$dummy <- rep(0, nrow(d)); return(d)}
    # If the first if statement is FALSE, then the following code is run:
    d$dummy <- ifelse(!identical(d[, 2], tvec), 0, 1)
    d
}

ddply(df, .(ID), f)
      ID rel.head dummy
1  17100        1     1
2  17100        3     1
3  17101        1     0
4  17102        1     0
5  17103        1     0
6  17103        2     0
7  17104        1     0
8  17104        2     0
9  17104        3     0
10 17105        1     1
11 17105        3     1

HTH,
Dennis

On Tue, Oct 4, 2011 at 8:44 AM, <grazia at stat.columbia.edu> wrote:
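The identical() comparison in f requires rel.head to be a factor with levels 1 to 3 and the two rows to appear in the order 1, 3. If either assumption is in doubt, the check can be made on the values as a set. This is a rough sketch in base R rather than plyr; is.head.child is a hypothetical helper name, and the data are assumed to be character here.

```r
# Example data from the original post, kept as character columns (assumption)
df <- data.frame(
  ID = c("17100", "17100", "17101", "17102", "17103", "17103",
         "17104", "17104", "17104", "17105", "17105"),
  rel.head = c("1", "3", "1", "1", "1", "2", "1", "2", "3", "1", "3"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE when a household has exactly two rows,
# one head ("1") and one child ("3"), in either order
is.head.child <- function(rel) {
  rel <- as.character(rel)          # works for factor or character input
  length(rel) == 2L && setequal(rel, c("1", "3"))
}

# One logical per household, named by ID
flag <- tapply(df$rel.head, df$ID, is.head.child)

# Look up each row's household flag by name
df$dummy <- as.integer(flag[as.character(df$ID)])
df
```

Because the lookup is by household ID, the rows of df keep their original order.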