Hello All,

Can anyone help me understand why the function below doesn't work and how I can fix it? Below are some sample data, some code that works on individual rows of the data, and my attempt to translate that code into a function. My hope is to get the function working and then to apply it to the larger data frame using ddply() from the plyr package or possibly some other approach.

As yet, I don't have much experience writing anonymous functions. I imagine I'm doing something that is obviously wrong, but I don't know what it is.

Thanks,

Paul

#### Read in test data ####

testData <-
structure(list(profile_key = structure(c(1L, 1L, 2L, 2L, 2L,
3L, 3L, 4L, 4L, 5L, 5L, 5L, 6L, 6L, 7L, 7L), .Label = c("001-001 ",
"001-002 ", "001-003 ", "001-004 ", "001-005 ", "001-006 ", "001-007 "
), class = "factor"), encounter_date = structure(c(9L, 10L, 11L,
12L, 13L, 5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L, 4L, 7L, 7L), .Label = c(" 2009-03-01 ",
" 2009-03-22 ", " 2009-04-01 ", " 2010-03-01 ", " 2010-10-15 ",
" 2010-11-15 ", " 2011-03-01 ", " 2011-03-14 ", " 2011-10-10 ",
" 2011-10-24 ", " 2012-09-15 ", " 2012-10-05 ", " 2012-10-17 "
), class = "factor"), raw = c(" ordered kras testing on 10102010 results not yet available if patient has a mutation will start erbitux ",
" received kras results on 10202010 test results indicate tumor is wild type ua protein positve erpr positive her2neu positve ",
" will conduct kras mutation testing prior to initiation of therapy with erbitux ",
" still need to order kras mutation testing ", " ordered kras testing waiting for results ",
" kras test results pending note that patient was negative for lynch mutation ",
" kras results still pending note that patient was negative for lynch mutation ",
" kras mutated will not prescribe erbitux due to mutation ",
" kras mutated therefore did not prescribe erbitux ", " kras wild ",
" tumor is negative for mutation ", " tumor is wild type patient is eligible to receive eribtux ",
" if patient kras result is wild type they will start erbitux several lines of material ordered kras mutation test 11112011 results are still not available ",
" kras results are in patient has the mutation ", " ordered kras mutation testing on 02152011 results came back negative several lines of material patient kras mutation test is negative will start erbitux ",
" patient is kras negative started erbitux on 03012011 ")), .Names = c("profile_key",
"encounter_date", "raw"), row.names = c(NA, -16L), class = "data.frame")

#### Convert text record to lowercase ####

testData$raw <- tolower(testData$raw)

#### Remove punctuation and any multiple spaces ####

testData$raw <- gsub("[[:punct:]]", "", testData$raw)
testData$raw <- gsub(" +", " ", testData$raw)

#### Select test row ####

testRow <- testData[13,]
testRow

#### Select terms +/- a specified number of words from "kras" ####

Text <- unlist(strsplit(testRow$raw, " "))
Target <- grep("kras", Text)

if (length(Target) == 0) {testRow$reduced <- ""} else{

  Length <- length(Text)
  Keep <- rep(NA, Length)
  Lower <- ifelse(Target - 6 > 0, Target - 6, 1)
  Upper <- ifelse(Target + 6 < Length, Target + 6, Length)

  for(i in 1:length(Keep)){
    for(j in 1:length(Lower)){
      Keep[i][i %in% seq(Lower[j], Upper[j])] <- i
    }}

  testRow$reduced <- paste(Text[!is.na(Keep)], collapse=" ")

}

testRow

length(Text)
length(Text[!is.na(Keep)])

#### Function for selecting words within specified range of a target term ####

nearTerms <- function(df, text, target, before, after, outvar){

  Text <- with(df, strsplit(text, " "))
  Target <- grep(target, Text)

  if (length(Target) == 0) {df$reduced <- ""} else{

    Length <- length(Text)
    Keep <- rep(NA, Length)
    Lower <- ifelse(Target - before > 0, Target - before, 1)
    Upper <- ifelse(Target + after < Length, Target + after, Length)

    for(i in 1:length(Keep)){
      for(j in 1:length(Lower)){
        Keep[i][i %in% seq(Lower[j], Upper[j])] <- i
      }}

    df <- transform(df, outvar = paste(Text[!is.na(Keep)], collapse=" "))

  }

}

nearTerms(testRow, raw, "kras", 6, 6)

nearTerms(df = testRow, text = raw, target = "kras", before = 6, after = 6)
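For comparison, a minimal sketch of the same windowing logic written as a plain function of one string. The name `near_words()` is hypothetical, not from the thread. Instead of the nested loops that mark each index, it builds one index range per match, clamps it to the ends of the text, and lets `unique()` merge overlapping windows. It also drops the empty token that a leading space produces, which the loop version keeps.

```r
# near_words() is a hypothetical helper, not code from the thread.
# It keeps every word within `before` words before or `after` words
# after any word matching `target`, in original order. One string in, one out.
near_words <- function(text, target, before = 6, after = 6) {
  words <- strsplit(text, " ", fixed = TRUE)[[1]]
  words <- words[words != ""]            # drop empty tokens left by leading spaces
  hits <- grep(target, words)            # positions of matching words
  if (length(hits) == 0) return("")
  n <- length(words)
  # One index range per match, clamped to 1..n; unique() merges overlaps
  keep <- unique(unlist(lapply(hits, function(h) max(1, h - before):min(n, h + after))))
  paste(words[sort(keep)], collapse = " ")
}

near_words("ordered kras testing waiting for results", "kras", before = 1, after = 1)
# "ordered kras testing"
```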
Well, good luck finding someone to wade through your code -- "small, reproducible" examples are requested for a reason -- but I will offer that I have no idea what you mean with your remark about anonymous functions, as the code you posted has none.

-- Bert

On Thu, May 31, 2012 at 10:38 AM, Paul Miller <pjmiller_57 at yahoo.com> wrote:
> As yet, I don't have much experience writing anonymous functions. I imagine I'm doing something that is obviously wrong, but I don't know what it is.

--
Bert Gunter
Genentech Nonclinical Biostatistics
Hello Bert and Sarah,

Thank you for your replies. They helped me understand how people might perceive my question and why they might not respond.

I spent some time learning about R's debugging tools this morning and began to realize why my function didn't work. My second argument was the name of a variable. What I didn't realize is that R evaluates that argument as an ordinary object in the calling environment, not as a column of the data frame, so it expected raw to be a previously defined object. I had thought that passing the name of the variable to the body of the function would generate a correct line of code, and that this was all that was required to get the function to work. Below is a function that does work, at least when applied to a single row of data.

I had previously been reading about the Split-Apply-Combine strategy in a paper about the plyr package. The paper advocates coming up with a function that works for a subset of one's data and then using plyr to split up the data and apply the function to each of the subsets. I was under the impression that this last part would be easy. That seems not to be the case, though. So on to the next part.

Thanks again for your feedback.
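A minimal sketch of the issue described above, using a made-up one-row data frame `notes` and three hypothetical helper functions. An argument such as `raw` is evaluated lazily in the caller's environment, not inside the data frame. In a fresh session the only `raw` found there is base R's `raw()` function, so that is what the original `with()` call picks up. Passing the column name as a string and indexing with `[[` avoids the problem (the working version below does the same thing with `df[,rawtext]`); the other common option is to capture the bare name with `substitute()` and evaluate it inside the data frame, the same pattern base R's `subset()` uses internally.

```r
notes <- data.frame(raw = "ordered kras testing", stringsAsFactors = FALSE)

# Roughly what the original nearTerms() did: `col` is not a column of `data`,
# so with() falls back to the argument `col`, whose expression `raw` is
# evaluated in the caller's environment, where it finds base R's raw().
get_col_with <- function(data, col) with(data, col)
is.function(get_col_with(notes, raw))   # TRUE: not the column at all

# Option 1: pass the column name as a string and index with [[ ]]
get_col_by_name <- function(data, col) data[[col]]
get_col_by_name(notes, "raw")           # "ordered kras testing"

# Option 2: accept a bare name, capture it unevaluated with substitute(),
# and evaluate it with the data frame's columns in scope
get_col_bare <- function(data, col) eval(substitute(col), data, parent.frame())
get_col_bare(notes, raw)                # "ordered kras testing"
```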
Paul

#### Test row ####

testRow <- structure(list(profile_key = structure(6L, .Label = c("001-001 ",
"001-002 ", "001-003 ", "001-004 ", "001-005 ", "001-006 ", "001-007 "
), class = "factor"), encounter_date = structure(4L, .Label = c(" 2009-03-01 ",
" 2009-03-22 ", " 2009-04-01 ", " 2010-03-01 ", " 2010-04-01 ", " 2010-10-15 ",
" 2010-11-15 ", " 2011-03-01 ", " 2011-03-14 ", " 2011-04-01 ", " 2011-10-10 ",
" 2011-10-24 ", " 2012-09-15 ", " 2012-10-05 ", " 2012-10-17 "), class = "factor"),
raw = " if patient kras result is wild type they will start erbitux several lines of material ordered kras mutation test 11112011 results are still not available "),
.Names = c("profile_key", "encounter_date", "raw"), row.names = 13L, class = "data.frame")

testRow

#### Function for selecting words within specified range of a target term ####

nearTerms <- function(df, rawtext, target, before, after, reduced){

  # rawtext is the column name as a string, e.g. "raw"
  Text <- unlist(strsplit(df[,rawtext], " "))
  Target <- grep(target, Text)

  if (length(Target) == 0) {df <- transform(df, outtext = "")} else{

    Length <- length(Text)
    Keep <- rep(NA, Length)
    Lower <- ifelse(Target - before > 0, Target - before, 1)
    Upper <- ifelse(Target + after < Length, Target + after, Length)

    # Mark every word index that falls inside any target's window
    for(i in 1:length(Keep)){
      for(j in 1:length(Lower)){
        Keep[i][i %in% seq(Lower[j], Upper[j])] <- i
      }}

    df <- transform(df, outtext = paste(Text[!is.na(Keep)], collapse=" "))

  }

  # Rename the temporary column to the name supplied in `reduced`, then return df
  names(df)[names(df) == "outtext"] <- reduced
  df
}

testRow <- nearTerms(df = testRow, rawtext = "raw", target = "kras", before = 6, after = 6, reduced = "reduced")
testRow
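For the split-apply-combine step mentioned above, one option (a sketch, not from the thread) is to make the core logic a function of a single string and apply it to the whole column. The helper `near_words()` is hypothetical, and the three rows are notes 5, 11 and 16 of the thread's testData without their surrounding spaces. plyr's `ddply()` splits by grouping variables such as `profile_key`; since each note here maps to exactly one reduced string, base R's `vapply()` over the column may be all that is needed.

```r
# Hypothetical helper: same windowing idea as nearTerms(), one string in, one out
near_words <- function(text, target, before, after) {
  words <- strsplit(text, " ", fixed = TRUE)[[1]]
  words <- words[words != ""]            # drop empty tokens from extra spaces
  hits <- grep(target, words)
  if (length(hits) == 0) return("")
  n <- length(words)
  keep <- unique(unlist(lapply(hits, function(h) max(1, h - before):min(n, h + after))))
  paste(words[sort(keep)], collapse = " ")
}

# Rows 5, 11 and 16 of the thread's testData, trimmed
notes <- data.frame(
  profile_key = c("001-002", "001-005", "001-007"),
  raw = c("ordered kras testing waiting for results",
          "tumor is negative for mutation",
          "patient is kras negative started erbitux on 03012011"),
  stringsAsFactors = FALSE
)

# Apply to every row; character(1) makes vapply() check for one string per row
notes$reduced <- vapply(notes$raw, near_words, character(1),
                        target = "kras", before = 1, after = 1,
                        USE.NAMES = FALSE)
notes$reduced
# "ordered kras testing" "" "is kras negative"
```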
Hi Sarah,

That I was making things too complicated doesn't surprise me. A skilled programmer makes everything look easy, I think, and someone who is still learning does just the opposite. I am going to spend some time now looking through your tweaks.

Thank you very much for your help.

Paul