R 3.1.1
OS X

Colleagues,

I have a dataset containing multiple columns indicating race for subjects in a clinical trial. A subset of the data (obtained with dput) is shown here:

structure(list(PLTID = c(7157, 8138, 8150, 9112, 9114, 9115,
9124, 9133, 9141, 9144, 9148, 12110, 12111, 12116, 12134, 12136,
12137, 12142, 12143, 12146, 12147, 13159), Indian..RACE1. = c(NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA), Asian..RACE2. = c("", "Yes", "", "", "",
"", "", "", "", "", "", "", "", "", "", "", "", "", "", "", "",
""), Black..RACE3. = c("Yes", "", "", "Yes", "Yes", "Yes", "Yes",
"Yes", "", "Yes", "", "", "", "", "", "", "", "Yes", "Yes", "",
"", ""), Native.Hawaiian.or.other.Pacif..RACE4. = c(NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA), White..RACE5. = c("", "", "Yes", "", "", "", "",
"", "Yes", "", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes",
"", "", "Yes", "Yes", "Yes"), Other.Race..RACE6. = c(NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA), Specify.Other.Race..RACEOTH. = c(NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA)), .Names = c("PLTID", "Indian..RACE1.", "Asian..RACE2.",
"Black..RACE3.", "Native.Hawaiian.or.other.Pacif..RACE4.", "White..RACE5.",
"Other.Race..RACE6.", "Specify.Other.Race..RACEOTH."), class = "data.frame", row.names = 43:64)

I would like to add a column that indicates which of the other columns contains "Yes". In other words, that column would contain:
        Black..RACE3.
        Asian..RACE2.
        White..RACE5.
        Black..RACE3.
        ...

Even better would be
        Black
        Asian
        White
        Black
        ...
(which I can accomplish with strsplit)

None of the rows contains more than one "Yes", although it is possible that none of the entries in a row would be "Yes" (in which case, the entry in the new column should be NA).

I could do this by looping through each of the columns with something like this:
        DATA$RACE <- NA
        for (COL in 2:8) DATA$RACE[which(DATA[,COL] == "Yes")] <- names(DATA)[COL]
But I suspect that there is some more elegant way to accomplish this.

Thanks in advance.

Dennis

Dennis Fisher MD
P < (The "P Less Than" Company)
Phone: 1-866-PLessThan (1-866-753-7784)
Fax: 1-866-PLessThan (1-866-753-7784)
www.PLessThan.com

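For anyone who wants to try the loop described above without pasting the full dput, here is a minimal sketch on a cut-down data frame. The first four rows copy values from the posted data; the fifth row (PLTID 99999) is a hypothetical addition with no "Yes" so that the NA case is exercised. The sub() call is one possible way to get the short label and stands in for the strsplit step mentioned in the post; RACE_SHORT is an illustrative column name.

```r
# Cut-down version of the posted data; the last row is hypothetical
# and has no "Yes" in any race column.
DATA <- data.frame(
  PLTID          = c(7157, 8138, 8150, 9112, 99999),
  Indian..RACE1. = NA,
  Asian..RACE2.  = c("", "Yes", "", "", ""),
  Black..RACE3.  = c("Yes", "", "", "Yes", ""),
  White..RACE5.  = c("", "", "Yes", "", ""),
  stringsAsFactors = FALSE
)

# The loop from the post, restricted to the four race columns kept here
DATA$RACE <- NA
for (COL in 2:5) DATA$RACE[which(DATA[, COL] == "Yes")] <- names(DATA)[COL]

# Drop everything from the first ".." onward; NA stays NA
DATA$RACE_SHORT <- sub("\\.\\..*$", "", DATA$RACE)

print(DATA[, c("PLTID", "RACE", "RACE_SHORT")])
```

The which() wrapper matters here: comparing an all-NA column with "Yes" yields NA, and which() silently drops those entries.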
Dear Dennis,

Assuming that your data.frame() is called dd, the following should get you started:

        colnames(dd[,-1])[apply(dd[,-1], 1, function(x) which(x == 'Yes'))]

HTH,
Jorge.-

On Sat, Nov 1, 2014 at 12:32 PM, Fisher Dennis <fisher at plessthan.com> wrote:

> [...]
> I would like to add a column that indicates which of the other columns
> contains "Yes".
> [...]
> None of the rows contains more than one 'Yes' although it is possible that
> none of the entries in a row would be 'Yes' (in which case, the entry in
> the new column should be NA)
> [...]
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

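One caveat with the apply() one-liner: in the posted subset every row has exactly one "Yes", but if a row has none, which() returns a zero-length result, apply() then returns a list, and the indexing step fails. The following is a sketch of one possible variation (not Jorge's code) that returns NA for such rows. The data frame is a cut-down copy of the posted data, and the last row (PLTID 99999) is hypothetical.

```r
# Cut-down copy of the posted data; the last row is hypothetical
# and has no "Yes" in any race column.
dd <- data.frame(
  PLTID          = c(7157, 8138, 8150, 9112, 99999),
  Indian..RACE1. = NA,
  Asian..RACE2.  = c("", "Yes", "", "", ""),
  Black..RACE3.  = c("Yes", "", "", "Yes", ""),
  White..RACE5.  = c("", "", "Yes", "", ""),
  stringsAsFactors = FALSE
)

race_cols <- dd[, -1]

# For each row, find the position of the single "Yes";
# return NA when there is not exactly one match.
hit <- apply(race_cols == "Yes", 1, function(x) {
  i <- which(x)                      # which() ignores NA and FALSE
  if (length(i) == 1) i else NA_integer_
})

# Indexing with NA yields NA, which is the requested result
dd$RACE <- colnames(race_cols)[hit]
print(dd[, c("PLTID", "RACE")])
```

Because the function always returns a single value, apply() simplifies to a plain integer vector and the column-name lookup works for every row.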
This method handles cases where multiple columns are "Yes".

        library(reshape2)
        ddl <- melt( dd, id.vars = "PLTID" )
        ddl[ is.na( ddl$value ), "value" ] <- ""
        ddl <- ddl[ "Yes" == ddl$value, ]
        result <- merge( dd[ , "PLTID", drop=FALSE ]
                       , ddl[ , c( "PLTID", "variable", "value" ) ]
                       , all.x=TRUE )

On Fri, 31 Oct 2014, Fisher Dennis wrote:

> [...]
> I would like to add a column that indicates which of the other columns contains "Yes".
> [...]
> None of the rows contains more than one "Yes" although it is possible that none of the entries in a row would be "Yes" (in which case, the entry in the new column should be NA)
> [...]

---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>      Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#..  Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
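The same long-format idea can be sketched without reshape2, for readers who prefer base R only. This is an assumption-laden illustration rather than Jeff's code: it swaps melt() for which(..., arr.ind = TRUE), and the three-row data frame is invented to show one subject with two "Yes" entries and one with none.

```r
# Invented example: subject 1 has two "Yes" entries, subject 3 has none
dd <- data.frame(
  PLTID          = c(1, 2, 3),
  Indian..RACE1. = NA,
  Asian..RACE2.  = c("Yes", "", ""),
  Black..RACE3.  = c("Yes", "Yes", ""),
  stringsAsFactors = FALSE
)

# Logical matrix of "Yes" cells; NA cells stay NA and are skipped by which()
m    <- as.matrix(dd[, -1]) == "Yes"
hits <- which(m, arr.ind = TRUE)

# One row per (subject, matching column) pair
long <- data.frame(
  PLTID    = dd$PLTID[hits[, "row"]],
  variable = colnames(m)[hits[, "col"]],
  stringsAsFactors = FALSE
)

# Left join so that subjects without any "Yes" are kept with NA
result <- merge(dd[, "PLTID", drop = FALSE], long, all.x = TRUE)
print(result)
```

As in the melt() version, a subject with several "Yes" columns appears on several rows of result, and a subject with none appears once with NA.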