I'm trying to do the following: search each patient's list of diagnoses for a specific code, then create a new column based upon the presence of that code. Simplified data follows:

con <- textConnection("
ID DX1 DX2 DX3
1 4109 4280 7102
2 734 311 490
3 4011 42822 4101
")
df <- read.table(con, header = TRUE, strip.white = TRUE, colClasses="character")
#
# I would like to add a column such that searching for 410 would give the result below.
# The search string will always be at the start of a word and doesn't need regex.
#
# ID DX1 DX2 DX3 htn
# 1 4109 4280 7102 1
# 2 734 311 490 0
# 3 4011 42822 4101 1
#
# The following works but is slow and returns NA if the search string is not found:

for (i in 1:nrow(df)) {
    df[i,"htn"] <- any(sapply('410', function(x) which( grepl(x, df[i, 2:4], fixed = TRUE) )))
}

Thanks in advance. I never fail to learn new things from this list.

--
Who is wise? One who learns from every person.
Who is strong? One who overpowers his evil inclinations.
Who is rich? One who is satisfied with his lot.
Who is honorable? One who honors his fellows.
- Pirkei Avot [excerpt]

This is faster than your version, and doesn't return NA:

df$htn <- apply(df[,2:4], 1, function(x)any(grepl("^410", x)))

> df
  ID  DX1   DX2  DX3   htn
1  1 4109  4280 7102  TRUE
2  2  734   311  490 FALSE
3  3 4011 42822 4101  TRUE

> system.time({
+   for(j in 1:10000) {
+     for (i in 1:nrow(df)) {
+       df[i,"htn"] <- any(sapply('410', function(x) which( grepl(x, df[i, 2:4], fixed = TRUE) )))
+     }
+   }
+ })
   user  system elapsed
  6.648   0.008   6.657
There were 50 or more warnings (use warnings() to see the first 50)

> system.time({
+   for(j in 1:10000) {
+     df$htn <- apply(df[,2:4], 1, function(x)any(grepl("^410", x)))
+   }
+ })
   user  system elapsed
  1.826   0.000   1.826

On Mon, Jun 15, 2015 at 4:12 PM, Federman, Douglas <Douglas.Federman at utoledo.edu> wrote:
> I'm trying to do the following: search each patient's list of diagnoses for a specific code then create a new column based upon the presence of the specific code.
> [...]

--
Sarah Goslee
http://www.functionaldiversity.org
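
The reply above replaces the fixed substring match with the anchored pattern "^410". The two are not equivalent in general, and a minimal sketch may make the difference concrete. The vector `codes` and the value "34100" are made up for illustration and are not part of the original data.

```r
# Hypothetical codes: "34100" contains "410" but does not start with it.
codes <- c("4109", "34100", "734")

# Fixed substring match: TRUE wherever "410" occurs anywhere in the string.
anywhere <- grepl("410", codes, fixed = TRUE)

# Anchored regular expression: TRUE only when the string starts with "410".
at_start <- grepl("^410", codes)

print(anywhere)  # TRUE TRUE FALSE
print(at_start)  # TRUE FALSE FALSE
```

If the data can contain codes such as the hypothetical "34100", the anchored form is the one that matches the stated requirement that the search string is at the start of a word.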

On Jun 15, 2015, at 1:12 PM, Federman, Douglas wrote:
> I'm trying to do the following: search each patient's list of diagnoses for a specific code then create a new column based upon the presence of the specific code.
> [...]
> # The following works but is slow and returns NA if the search string is not found:
>
> for (i in 1:nrow(df)) {
>     df[i,"htn"] <- any(sapply('410', function(x) which( grepl(x, df[i, 2:4], fixed = TRUE) )))
> }

Is this any better?

> df$htn <- apply(df[-1], 1, function(r) max( substr(r, 1,3) == "410" ))
> df
  ID  DX1   DX2  DX3 htn
1  1 4109  4280 7102   1
2  2  734   311  490   0
3  3 4011 42822 4101   1

You can add na.rm=TRUE to the max call if warranted. `max` coerces logicals to integer, which is why the column holds 1/0 rather than TRUE/FALSE.

--
David Winsemius
Alameda, CA, USA
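
Both replies above still call a function once per row through apply. As a possible alternative, here is a minimal sketch that tests each diagnosis column as a whole vector and then combines the columns. It is not from the thread: it assumes base R's `startsWith()` is available (R 3.3.0 or later), and `dx_cols` and `hits` are names introduced here for illustration.

```r
# Rebuild the example data from the original post.
df <- read.table(text = "
ID DX1 DX2 DX3
1 4109 4280 7102
2 734 311 490
3 4011 42822 4101
", header = TRUE, strip.white = TRUE, colClasses = "character")

# Hypothetical helper name: the columns that hold diagnosis codes.
dx_cols <- c("DX1", "DX2", "DX3")

# One vectorized prefix test per column; each element of hits is a
# logical vector with one entry per patient.
hits <- lapply(df[dx_cols], startsWith, "410")

# OR the columns together element-wise, then convert TRUE/FALSE to 1/0.
df$htn <- as.integer(Reduce(`|`, hits))

print(df$htn)  # 1 0 1
```

Note that `startsWith()` returns NA for missing codes, so a row with no match and at least one NA would come out as NA; whether that is acceptable depends on the real data.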