thr3ads.net - R help - [R] How can I get this function to work? [May 2012]

Paul Miller

2012-May-31 17:38 UTC

[R] How can I get this function to work?

Hello All,

Can anyone help me understand why the function below doesn't work and
how I can fix it? Below are some sample data, some code that works on individual
rows of the data, and my attempt to translate that code into a function. My hope
is to get the function working and then to apply it to the larger data frame
using ddply() from the plyr package or possibly some other approach.

As yet, I don't have much experience writing anonymous functions. I imagine
I'm doing something that is obviously wrong, but I don't know what it
is.

Thanks,

Paul

#### Read in test data ####

testData <-
structure(list(profile_key = structure(c(1L, 1L, 2L, 2L, 2L, 
3L, 3L, 4L, 4L, 5L, 5L, 5L, 6L, 6L, 7L, 7L), .Label = c("001-001 ", 
"001-002 ", "001-003 ", "001-004 ", "001-005 ", "001-006 ", "001-007 "
), class = "factor"), encounter_date = structure(c(9L, 10L, 11L, 
12L, 13L, 5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L, 4L, 7L, 7L), .Label = c(" 2009-03-01 ",
" 2009-03-22 ", " 2009-04-01 ", " 2010-03-01 ", " 2010-10-15 ",
" 2010-11-15 ", " 2011-03-01 ", " 2011-03-14 ", " 2011-10-10 ",
" 2011-10-24 ", " 2012-09-15 ", " 2012-10-05 ", " 2012-10-17 "
), class = "factor"), raw = c(
" ordered kras testing on 10102010 results not yet available if patient has a mutation will start erbitux ",
" received kras results on 10202010 test results indicate tumor is wild type ua protein positve erpr positive her2neu positve ",
" will conduct kras mutation testing prior to initiation of therapy with erbitux ",
" still need to order kras mutation testing ",
" ordered kras testing waiting for results ",
" kras test results pending note that patient was negative for lynch mutation ",
" kras results still pending note that patient was negative for lynch mutation ",
" kras mutated will not prescribe erbitux due to mutation ",
" kras mutated therefore did not prescribe erbitux ",
" kras wild ",
" tumor is negative for mutation ",
" tumor is wild type patient is eligible to receive eribtux ",
" if patient kras result is wild type they will start erbitux several lines of material ordered kras mutation test 11112011 results are still not available ",
" kras results are in patient has the mutation ",
" ordered kras mutation testing on 02152011 results came back negative several lines of material patient kras mutation test is negative will start erbitux ",
" patient is kras negative started erbitux on 03012011 ")), .Names = c("profile_key",
"encounter_date", "raw"), row.names = c(NA, -16L), class = "data.frame")

#### Convert text record to lowercase ####

testData$raw <- tolower(testData$raw)

#### Remove punctuation and any multiple spaces ####

testData$raw <- gsub("[[:punct:]]", "", testData$raw)
testData$raw <- gsub(" +", " ", testData$raw)

#### Select test row ####

testRow <- testData[13,]
testRow

#### Select terms +/- a specified number of words from "kras" ####

Text <- unlist(strsplit(testRow$raw, " ")) 
Target <- grep("kras", Text)

if (length(Target) == 0) {testRow$reduced <- ""} else{ 

Length <- length(Text)
Keep <- rep(NA, Length)
Lower <- ifelse(Target - 6 > 0, Target - 6, 1)
Upper <- ifelse(Target + 6 < Length, Target + 6, Length)

for(i in 1:length(Keep)){
for(j in 1:length(Lower)){
	Keep[i][i %in% seq(Lower[j], Upper[j])] <- i
}}

testRow$reduced <- paste(Text[!is.na(Keep)], collapse=" ")

}

testRow

length(Text)
length(Text[!is.na(Keep)])

#### Function for selecting words within specified range of a target term ####

nearTerms <- function(df, text, target, before, after, outvar){

   Text <- with(df, strsplit(text, " ")) 
   Target <- grep(target, Text)

   if (length(Target) == 0) {df$reduced <- ""} else{ 

   Length <- length(Text)
   Keep <- rep(NA, Length)
   Lower <- ifelse(Target - before > 0, Target - before, 1)
   Upper <- ifelse(Target + after < Length, Target + after, Length)

   for(i in 1:length(Keep)){
   for(j in 1:length(Lower)){
      Keep[i][i %in% seq(Lower[j], Upper[j])] <- i
   }}

   df <- transform(df, outvar = paste(Text[!is.na(Keep)], collapse=" "))

   }

}

nearTerms(testRow, raw, "kras", 6, 6)

nearTerms(df = testRow, text = raw, target = "kras", before = 6, after = 6)

Bert Gunter

2012-May-31 17:54 UTC

Well, good luck finding someone to wade through your code --
"small, reproducible" examples are requested for a reason -- but I will
offer that I have no idea what you mean with your remark about
anonymous functions, as the code you posted has none.

-- Bert



-- 

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm

Paul Miller

2012-Jun-01 18:51 UTC

Hello Bert and Sarah,

Thank you for your replies. Helped me understand how people might perceive my
question and why they might not respond.

Spent some time learning about R's debugging tools this morning. Began to
realize why my function didn't work. My second argument was the name of a
variable. What I didn't realize is that R would immediately expect this to
be a previously defined object. I had thought that passing the name of the
variable to the body of the function would generate a correct line of code, and
that this was all that was required to get the function to work.
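The behaviour described above can be reproduced in a few lines. This is only a minimal sketch, not code from the thread: the data frame `d`, its column `note_text`, and the helpers `firstWordBare()` and `firstWord()` are hypothetical names chosen for illustration.

```r
# Hypothetical one-row data frame, for illustration only
d <- data.frame(note_text = "kras wild type", stringsAsFactors = FALSE)

# Bare-name version: inside with(), 'text' is looked up as an object.
# It is not replaced by the column name that was typed in the call,
# so R goes looking for an object called note_text and fails.
firstWordBare <- function(df, text) {
  with(df, strsplit(text, " "))[[1]][1]
}

# String version: the column name is passed as a character string
# and used to index the data frame directly.
firstWord <- function(df, text) {
  strsplit(df[[text]], " ")[[1]][1]
}

# The bare-name call raises an error, caught here and turned into NA
bare <- tryCatch(firstWordBare(d, note_text), error = function(e) NA_character_)
good <- firstWord(d, "note_text")
```

Passing the column name as a string, as the working function further down does with `rawtext = "raw"`, avoids the lookup problem entirely.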

Below is a function that does work, at least when applied to a single row of
data. I had previously been reading about the Split-Apply-Combine strategy in a
paper about the plyr package. The paper advocates coming up with a function that
works for a subset of one's data and then using plyr to split up the data
and apply the function to each of the subsets. Was under the impression that
this last part would be easy. Seems not to be the case though.
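The split-apply-combine idea can be sketched without plyr. The example below uses base R only (`split()`, `lapply()`, and `do.call(rbind, ...)`) in place of `ddply()`, and `countWords()` and the data frame `notes` are hypothetical stand-ins rather than the thread's own function or data.

```r
# Hypothetical data, for illustration only
notes <- data.frame(
  id  = c("a", "a", "b"),
  raw = c("kras wild type", "kras mutated", "results pending"),
  stringsAsFactors = FALSE
)

# A function written to work on a single one-row subset
countWords <- function(df) {
  df$n_words <- length(unlist(strsplit(df$raw, " ")))
  df
}

# Split into one-row pieces, apply the function to each, combine the results
pieces   <- split(notes, seq_len(nrow(notes)))
combined <- do.call(rbind, lapply(pieces, countWords))
rownames(combined) <- NULL
```

The key requirement is that the per-subset function returns a data frame with the same columns every time, so that the pieces can be bound back together.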

So on to the next part.

Thanks again for your feedback. 

Paul


#### Test row ####

testRow <-
structure(list(profile_key = structure(6L, .Label = c("001-001 ", 
"001-002 ", "001-003 ", "001-004 ", "001-005 ", "001-006 ", "001-007 "
), class = "factor"), encounter_date = structure(4L, .Label = c(" 2009-03-01 ",
" 2009-03-22 ", " 2009-04-01 ", " 2010-03-01 ", " 2010-04-01 ",
" 2010-10-15 ", " 2010-11-15 ", " 2011-03-01 ", " 2011-03-14 ",
" 2011-04-01 ", " 2011-10-10 ", " 2011-10-24 ", " 2012-09-15 ",
" 2012-10-05 ", " 2012-10-17 "), class = "factor"), raw = " if patient kras result is wild type they will start erbitux several lines of material ordered kras mutation test 11112011 results are still not available "), .Names = c("profile_key",
"encounter_date", "raw"), row.names = 13L, class = "data.frame")

testRow

#### Function for selecting words within specified range of a target term ####

nearTerms <- function(df, rawtext, target, before, after, reduced){ 
   # rawtext and reduced are column names passed as character strings
   Text <- unlist(strsplit(df[,rawtext], " "))
   Target <- grep(target, Text)

   if (length(Target) == 0) {df <- transform(df, outtext = "")} else{

   Length <- length(Text)
   Keep <- rep(NA, Length)
   Lower <- ifelse(Target - before > 0, Target - before, 1)
   Upper <- ifelse(Target + after < Length, Target + after, Length)

   # Mark every word position that falls inside the window around any match
   for(i in 1:length(Keep)){
   for(j in 1:length(Lower)){
      Keep[i][i %in% seq(Lower[j], Upper[j])] <- i
   }}

   df <- transform(df, outtext = paste(Text[!is.na(Keep)], collapse=" "))

   }

   # Rename the temporary column to the requested output name and return
   names(df)[names(df) == "outtext"] <- reduced
   df
}

testRow <- nearTerms(df = testRow, rawtext = "raw", target = "kras",
                     before = 6, after = 6, reduced = "reduced")
testRow
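
The nested loops in nearTerms() mark every word position that lies within a window around each match. A more compact sketch of the same windowing idea follows. `wordsNear()` is a hypothetical helper that does not appear in the thread, and it assumes the text has already been lowercased and stripped of punctuation as in the original post.

```r
# Hypothetical helper: return the words within 'before'/'after' positions
# of any word matching 'target', as a single space-separated string.
wordsNear <- function(text, target, before, after) {
  words <- unlist(strsplit(text, " "))
  hits  <- grep(target, words)
  if (length(hits) == 0) return("")

  # Build the window of positions around each hit, clipped to the text
  keep <- unlist(lapply(hits, function(h) {
    seq(max(h - before, 1), min(h + after, length(words)))
  }))

  # unique() merges overlapping windows; sort() restores word order
  paste(words[sort(unique(keep))], collapse = " ")
}

example <- wordsNear("a b c kras d e f g", "kras", before = 1, after = 2)
# example is "c kras d e"
```

Because it takes and returns plain character strings, a helper like this could be applied to a whole column with `sapply()` or `vapply()` instead of going through a data frame one row at a time.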

Paul Miller

2012-Jun-01 19:41 UTC

Hi Sarah,

That I was making things too complicated doesn't surprise me. A skilled
programmer makes everything look easy I think. And someone who is still learning
does just the opposite.

Am going to spend some time now looking through your tweaks.

Thank you very much for your help.

Paul
