thr3ads.net - R help - [R] Creating a new column from a series of columns [Nov 2014]

Fisher Dennis

2014-Nov-01 01:32 UTC

[R] Creating a new column from a series of columns

R 3.1.1
OS X

Colleagues,
I have a dataset containing multiple columns indicating race for subjects in a
clinical trial.  A subset of the data (obtained with dput) is shown here:

structure(list(PLTID = c(7157, 8138, 8150, 9112, 9114, 9115, 
9124, 9133, 9141, 9144, 9148, 12110, 12111, 12116, 12134, 12136, 
12137, 12142, 12143, 12146, 12147, 13159), Indian..RACE1. = c(NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA), Asian..RACE2. = c("", "Yes",
"", "", "",
"", "", "", "", "",
"", "", "", "", "",
"", "", "", "", "",
"",
""), Black..RACE3. = c("Yes", "", "",
"Yes", "Yes", "Yes", "Yes",
"Yes", "", "Yes", "", "",
"", "", "", "", "",
"Yes", "Yes", "",
"", ""), Native.Hawaiian.or.other.Pacif..RACE4. = c(NA, NA,
NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA), White..RACE5. = c("", "", "Yes",
"", "", "", "",
"", "Yes", "", "Yes", "Yes",
"Yes", "Yes", "Yes", "Yes",
"Yes",
"", "", "Yes", "Yes", "Yes"),
Other.Race..RACE6. = c(NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA), Specify.Other.Race..RACEOTH. = c(NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA)), .Names = c("PLTID", "Indian..RACE1.",
"Asian..RACE2.",
"Black..RACE3.", "Native.Hawaiian.or.other.Pacif..RACE4.",
"White..RACE5.",
"Other.Race..RACE6.", "Specify.Other.Race..RACEOTH."), class
= "data.frame", row.names = 43:64)

I would like to add a column that indicates which of the other columns contains
"Yes".  In other words, that column would contain:
	Black..RACE3.
	Asian..RACE2.
	White..RACE5.
	Black..RACE3.
	...

Even better would be
	Black
	Asian
	White
	Black
	...
(which I can accomplish with strsplit)
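
A minimal sketch of that clean-up step, assuming the short label is everything before the first literal ".." in the column name. The vector `cols` is an illustrative copy of three names from the dput() output above, not part of the original data:

```r
# Three of the column names from the dput() output above (illustrative vector).
cols <- c("Black..RACE3.", "Asian..RACE2.",
          "Native.Hawaiian.or.other.Pacif..RACE4.")

# strsplit() with fixed = TRUE treats ".." as a literal string, not a regex;
# keep only the first piece of each split.
labels <- vapply(strsplit(cols, "..", fixed = TRUE), `[`, character(1), 1)

# Alternative with a regular expression: drop everything from the first ".." on.
labels2 <- sub("\\.\\..*$", "", cols)

print(labels)
```

Both forms leave the single dots inside a longer name (such as "Native.Hawaiian.or.other.Pacif") untouched; turning those into spaces would be a separate gsub() step.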

None of the rows contains more than one "Yes", although it is possible that none
of the entries in a row would be "Yes" (in which case, the entry in the new
column should be NA).

I could do this by looping through each of the columns with something like this:
	DATA$RACE <- NA
	for (COL in 2:8) DATA$RACE[which(DATA[, COL] == "Yes")] <- names(DATA)[COL]
But, I suspect that there is some more elegant way to accomplish this.

Thanks in advance.

Dennis

Dennis Fisher MD
P < (The "P Less Than" Company)
Phone: 1-866-PLessThan (1-866-753-7784)
Fax: 1-866-PLessThan (1-866-753-7784)
www.PLessThan.com

Jorge I Velez

2014-Nov-01 02:27 UTC

Dear Dennis,

Assuming that your data.frame() is called dd, the following should get you
started:

colnames(dd[,-1])[apply(dd[,-1], 1, function(x) which(x == 'Yes'))]
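
One caveat with the one-liner: it relies on every row having exactly one "Yes". If some row has none, which() returns integer(0) for that row, apply() then returns a list, and the indexing step fails. A hedged sketch of a variant that yields NA for such rows is below; `dd_demo` is a small made-up data frame in the same shape as the posted data, and its third row is hypothetical:

```r
# Small made-up data frame in the same shape as the poster's data
# (hypothetical values; row 3 deliberately has no "Yes").
dd_demo <- data.frame(
  PLTID          = c(7157, 8138, 9999),
  Indian..RACE1. = NA,
  Asian..RACE2.  = c("", "Yes", ""),
  Black..RACE3.  = c("Yes", "", ""),
  stringsAsFactors = FALSE
)

race_cols <- names(dd_demo)[-1]

# For each row, take the position of the first column equal to "Yes".
# which() ignores NA comparisons, and [1] turns an empty result into NA.
first_yes <- apply(dd_demo[, race_cols], 1, function(x) which(x == "Yes")[1])

# Indexing with NA gives NA, so rows without a "Yes" get NA in the new column.
dd_demo$RACE <- race_cols[first_yes]

print(dd_demo[, c("PLTID", "RACE")])
```

If a row ever had more than one "Yes", this sketch would silently keep only the first matching column.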

HTH,
Jorge.-



Jeff Newmiller

2014-Nov-01 03:13 UTC

This method handles cases where multiple columns are "Yes".

library(reshape2)
ddl <- melt( dd, id.vars = "PLTID" )
ddl[ is.na( ddl$value ), "value" ] <- ""
ddl <- ddl[ "Yes" == ddl$value, ]
result <- merge( dd[ , "PLTID", drop=FALSE ]
                , ddl[ , c( "PLTID", "variable", "value" ) ]
                , all.x=TRUE
                )
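
The merged result above is in long form, so a subject with several "Yes" entries appears on several rows. If one row per subject is wanted instead, the following base-R sketch (no reshape2) pastes all matching column names into a single string. It is an illustration under stated assumptions rather than part of the method above: `dd_demo` and its values are made up, and the "; " separator is an arbitrary choice:

```r
# Made-up data (hypothetical): subject 2 has two "Yes" entries, subject 3 none.
dd_demo <- data.frame(
  PLTID         = c(1, 2, 3),
  Asian..RACE2. = c("", "Yes", NA),
  White..RACE5. = c("Yes", "Yes", ""),
  stringsAsFactors = FALSE
)

race_cols <- setdiff(names(dd_demo), "PLTID")

# Logical matrix: TRUE where a cell is exactly "Yes" (NA and "" become FALSE).
is_yes <- !is.na(dd_demo[race_cols]) & dd_demo[race_cols] == "Yes"

# Per row, paste the names of all "Yes" columns; use NA when there are none.
dd_demo$RACE <- unname(apply(is_yes, 1, function(r) {
  if (any(r)) paste(race_cols[r], collapse = "; ") else NA_character_
}))

print(dd_demo[, c("PLTID", "RACE")])
```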

---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
