Jason Stout, M.D.
2013-Apr-25 01:22 UTC
[R] Assigning a variable value based on multiple columns
Hi All,

I'm hoping someone can help me with a relatively simple problem. Take the following dataset:

ID Diabetes ESRD HIV Contact
 1        0    0  NA       0
 2        1    0  NA       0
 3       NA    1   0       0
 4        0   NA   0       1
 5        1    1   1       0

I want to generate a column called TSTcutoff based on the values in each row. TSTcutoff should be the lowest applicable value of 15 (if Diabetes = ESRD = HIV = Contact = 0), 10 (if Diabetes or ESRD = 1 AND HIV = Contact = 0), or 5 (if HIV OR Contact = 1). I was thinking this could be done with a series of ifelse() statements, but the NA values make this more challenging: I want to ignore NA values when calculating TSTcutoff. So the final dataset should look like this:

ID Diabetes ESRD HIV Contact TSTcutoff
 1        0    0  NA       0        15
 2        1    0  NA       0        10
 3       NA    1   0       0        10
 4        0   NA   0       1         5
 5        1    1   1       0         5

Thanks for any suggestions.

Jason Stout, MD, MHS
Box 102359-DUMC
Durham, NC 27710
FAX 919-681-7494
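A minimal base-R sketch of the rule above, assuming that "ignore NA" means a missing flag simply counts as "not 1". The example data are typed in from the table in the post; the helper vectors `high_risk` and `mid_risk` are hypothetical names, not part of the original question.

```r
# Example data typed in from the table in the question
dataset <- data.frame(
  ID       = 1:5,
  Diabetes = c(0, 1, NA, 0, 1),
  ESRD     = c(0, 0, 1, NA, 1),
  HIV      = c(NA, NA, 0, 0, 1),
  Contact  = c(0, 0, 0, 1, 0)
)

# x %in% 1 is TRUE only where x equals 1; NA %in% 1 is FALSE, so a
# missing flag counts as "criterion not met" instead of turning the
# whole condition into NA inside ifelse().
high_risk <- dataset$HIV %in% 1 | dataset$Contact %in% 1
mid_risk  <- dataset$Diabetes %in% 1 | dataset$ESRD %in% 1

# Test the lowest cutoff first so it takes priority, then fall through
# to 10 and finally to the default of 15.
dataset$TSTcutoff <- ifelse(high_risk, 5, ifelse(mid_risk, 10, 15))

print(dataset)
```

Checking the 5-point criteria first and falling through to 10 and then 15 is what makes the nested ifelse() return the lowest applicable cutoff; for this data it gives 15, 10, 10, 5, 5, matching the desired table.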
Patrick Coulombe
2013-Apr-25 05:53 UTC
Hi Jason,

I think the easiest approach would be to keep your ifelse() statements as they are, but first change your NAs into some other value (e.g., -999, or any value that cannot occur in your data). The package "gdata" can do this in one line. In this code, I assume that your data are stored in a data frame called "dataset":

###########
# install gdata only if it is not already installed
if (!requireNamespace("gdata", quietly = TRUE)) install.packages("gdata")
# load gdata
library(gdata)
# change NA into -999
dataset <- NAToUnknown(dataset, -999)
# do your ifs/ifelses here...
# ...
# ...
# change -999 back into NA
dataset <- unknownToNA(dataset, -999)
###########

And that should do it.

Hope this helps,
Patrick
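The sentinel-value idea in the reply above can also be sketched in base R without gdata. This is a minimal sketch, assuming the flag columns are numeric and that -999 (the hypothetical `sentinel` below) never occurs as a real value; the logical-matrix assignments stand in for NAToUnknown() and unknownToNA() and are not gdata's own code. After the substitution the conditions should test for == 1: row 1 has HIV = NA, now -999, so a test requiring all four flags to equal 0 would be FALSE for that row even though its intended cutoff is 15.

```r
# Example data typed in from the original question
dataset <- data.frame(
  ID       = 1:5,
  Diabetes = c(0, 1, NA, 0, 1),
  ESRD     = c(0, 0, 1, NA, 1),
  HIV      = c(NA, NA, 0, 0, 1),
  Contact  = c(0, 0, 0, 1, 0)
)

sentinel  <- -999  # hypothetical placeholder; must never be a real value
flag_cols <- c("Diabetes", "ESRD", "HIV", "Contact")

# Replace NA with the sentinel in the flag columns only
dataset[flag_cols][is.na(dataset[flag_cols])] <- sentinel

# Plain == comparisons no longer yield NA, so nested ifelse() works;
# the lowest cutoff is tested first so it takes priority.
dataset$TSTcutoff <- with(dataset,
  ifelse(HIV == 1 | Contact == 1, 5,
         ifelse(Diabetes == 1 | ESRD == 1, 10, 15)))

# Turn the sentinel back into NA
dataset[flag_cols][dataset[flag_cols] == sentinel] <- NA

print(dataset)
```

Restricting the replacement to `flag_cols` keeps the sentinel out of columns such as ID, which the cutoff rule does not use.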