thr3ads.net - R help - [R] Text mining? Text manipulation? Both? Predicting KRAS test results in cancer patients [Sep 2012]


Paul Miller

2012-Sep-28 19:57 UTC

[R] Text mining? Text manipulation? Both? Predicting KRAS test results in cancer patients

Happy Friday Everyone,

Hope Friday afternoon doesn't turn out to be a terrible time to post a
question. I've been doing a little data mining of patients' text medical
records lately. I started out trying to predict whether or not cancer
patients had received KRAS mutation testing and did quite well with that. Now
I'm trying to predict the results of KRAS testing (mutated vs. wild type).
This is proving to be a little more difficult.

With the first classification task, I created counts of terms (e.g.,
"kras", "mutated") in the text medical records using
the tm package and then used those counts to predict whether or not patients had
had KRAS mutation testing. I tried a few different analyses here, but found that
random forests worked the best.
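For readers less familiar with tm: the counting step amounts to building one row of term frequencies per record. Below is a minimal sketch of that idea, written in Python rather than R purely for illustration. The term list, the tokenizer, and the example records are hypothetical stand-ins for tm's own preprocessing, not the actual setup used here.

```python
import re
from collections import Counter

# Hypothetical vocabulary; in the original workflow the terms came from tm.
TERMS = ["kras", "mutated", "mutation", "wild"]


def term_counts(record, terms=TERMS):
    """Return a {term: count} row for one free-text record."""
    # Lowercase and split on anything that is not a letter or digit,
    # so "wild-type" becomes the two tokens "wild" and "type".
    tokens = re.findall(r"[a-z0-9]+", record.lower())
    counts = Counter(tokens)
    return {term: counts[term] for term in terms}


records = [
    "KRAS mutation returned wild-type.",
    "Tissue sent for KRAS testing; KRAS mutated.",
]

# One row of counts per record: the kind of input a random forest would see.
matrix = [term_counts(r) for r in records]
for row in matrix:
    print(row)
```

Each dictionary is one row of the feature matrix. In the setup described above, rows like these were the inputs to the random forest.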
Predicting the results of testing is harder, though, because of the way physicians
and other healthcare professionals write about testing. For example, I'm
finding phrases like "KRAS mutation returned wild-type". In this
example, if we're counting, we get one instance of "kras", one
instance of "mutation", and one instance of "wild". So you
can see how it might be difficult to accurately predict the results of testing
based on counts alone.
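To make the problem concrete: two notes with opposite meanings can produce exactly the same word counts. Counting short word sequences (n-grams) instead of single words is one common partial remedy. This technique is swapped in here for illustration and does not come from the post. The following is a minimal Python sketch using two hypothetical sentences:

```python
import re
from collections import Counter


def tokens(text):
    # Simple tokenizer: lowercase, split on anything that is not a letter or digit.
    return re.findall(r"[a-z0-9]+", text.lower())


def ngrams(toks, n):
    """Count contiguous word n-grams; n=1 gives ordinary term counts."""
    return Counter(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))


# Two hypothetical notes with opposite meanings.
wild = tokens("KRAS wild-type, mutation not detected.")
mutant = tokens("KRAS mutation detected, not wild-type.")

print(ngrams(wild, 1) == ngrams(mutant, 1))  # single-word counts are identical
print(ngrams(wild, 2) == ngrams(mutant, 2))  # two-word counts differ
print(ngrams(wild, 2)[("not", "detected")], ngrams(mutant, 2)[("not", "detected")])
```

Bigrams such as ("not", "detected") preserve some local context. However, they still miss negations that sit several words away from the term they modify.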
My question is how best to deal with this. Are there any R text mining packages
or related software that would be particularly suited to my problem? I took a
look at the CRAN Task View: Natural Language Processing, and there were so many
options that I didn't really know where to start (and it's not even clear
that an R-based solution will work best for my problem). Alternatively, is there
any real chance one could simply write code that would be able to identify true
references to the results of KRAS testing and then create counts only of what
are likely to be true references?
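One way to try the "count only true references" idea is a rule-based pass. It keeps only the sentences that mention KRAS and labels each one as a wild-type result, a mutated result, or an unclear mention. Below is a minimal sketch of this approach, assuming Python rather than R and a small set of hand-written regular expressions. The patterns, labels, and function names are hypothetical and illustrative; they are not taken from any package.

```python
import re
from collections import Counter

# Hypothetical, hand-written patterns. A real rule set would need to be
# built up and checked against a sample of notes labelled by hand.
NOT_WILD = re.compile(r"\bnot\s+wild[\s-]?type\b")
WILD = re.compile(
    r"\bwild[\s-]?type\b"
    r"|\bno\s+(?:kras\s+)?mutations?\b"
    r"|\bmutations?\s+(?:was\s+|were\s+)?not\s+(?:detected|found|identified)\b"
)
MUTANT = re.compile(
    r"\bmutated\b"
    r"|\bmutations?\s+(?:was\s+|were\s+)?(?:detected|found|identified)\b"
)


def classify_sentence(sentence):
    """Label one sentence as 'mutated', 'wild_type', or 'unclear'."""
    s = sentence.lower()
    # Check the negated form first so "not wild-type" is not read as wild type.
    if NOT_WILD.search(s):
        return "mutated"
    if WILD.search(s):
        return "wild_type"
    if MUTANT.search(s):
        return "mutated"
    # e.g. "KRAS testing ordered": mentions the test but reports no result.
    return "unclear"


def kras_result_counts(note):
    """Count result labels over the sentences of a note that mention KRAS."""
    sentences = re.split(r"(?<=[.;!?])\s+", note)
    return Counter(classify_sentence(s) for s in sentences if "kras" in s.lower())


note = ("KRAS testing ordered. KRAS mutation returned wild-type. "
        "Follow up in two weeks.")
print(kras_result_counts(note))
```

The order of the checks is crucial. Without the first test, "not wild-type" would be labelled wild type. Negation that sits further from the key term (e.g. "no evidence of a KRAS mutation") is where hand-written rules start to break down; this sketch would simply label that sentence "unclear". Published negation-detection algorithms such as NegEx address this more systematically. The same patterns could also be tried in R with grepl(..., perl = TRUE), and the per-note counts could then replace the raw term counts as classifier features.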
I'd greatly appreciate it if someone could point me in the right direction.

Thanks,

Paul
