thr3ads.net - R help - [R] Extracting certain text using tm package [Jun 2011]


vioravis

2011-Jun-27 06:47 UTC

[R] Extracting certain text using tm package

I have used the "tm" package to import a set of text documents using the
following command:

library(tm)
text <- Corpus(DirSource("."), readerControl = list(language = "ansi"))

I would like to extract only a certain portion of the text in each document
using certain keywords. For example, I would like to keep all the text
between the keywords <Start Text> and <End Text> and discard all the
remaining text. Is there any way to accomplish this in the 'tm' package?
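One possible approach, sketched in base R rather than with a dedicated tm function, assumes that the markers appear literally as "<Start Text>" and "<End Text>" in each file. The helper name extract_between() is hypothetical. It joins a document's lines into one string and uses a non-greedy Perl-style regular expression, so a section may span several lines and a document may contain several sections.

```r
# extract_between() is a hypothetical helper, not part of tm.
# It returns every chunk of text found between a start marker and an end marker.
extract_between <- function(x, start = "<Start Text>", end = "<End Text>") {
  # Join the lines of one document so a section may span several lines
  doc <- paste(x, collapse = "\n")
  # \Q...\E treats the markers literally; (?s) lets "." match newlines;
  # the non-greedy ".*?" pairs each start marker with the nearest end marker
  pattern <- paste0("(?s)\\Q", start, "\\E(.*?)\\Q", end, "\\E")
  matches <- regmatches(doc, gregexpr(pattern, doc, perl = TRUE))[[1]]
  # Replace each full match by its captured group to drop the markers
  inner <- sub(pattern, "\\1", matches, perl = TRUE)
  # Remove the line breaks and spaces left next to the markers
  trimws(inner)
}

doc <- c("header", "<Start Text>", "keep this", "<End Text>", "footer")
extract_between(doc)
# [1] "keep this"
```

A document with no markers yields character(0). In recent versions of tm, a function like this can be applied to every document with tm_map(text, content_transformer(extract_between)); treat that as a suggestion to check against the installed tm version.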

Also, is there a quick way to remove all the HTML tags from the text?
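A quick, admittedly naive way to do this in base R is to delete anything shaped like a tag with a regular expression and then decode a few common entities. The helper name strip_html() is hypothetical, and the regex approach can be fooled by comments or scripts that contain ">"; a real HTML parser (for example, the one in the XML package) is more robust.

```r
# strip_html() is a hypothetical helper, not part of tm.
strip_html <- function(x) {
  # Remove anything shaped like a tag: "<", then non-">" characters, then ">"
  x <- gsub("<[^>]*>", "", x)
  # Decode a few common entities; "&amp;" goes last so "&amp;lt;" becomes "&lt;", not "<"
  x <- gsub("&nbsp;", " ", x, fixed = TRUE)
  x <- gsub("&lt;", "<", x, fixed = TRUE)
  x <- gsub("&gt;", ">", x, fixed = TRUE)
  x <- gsub("&amp;", "&", x, fixed = TRUE)
  x
}

strip_html("<p>Hello <b>world</b> &amp; friends</p>")
# [1] "Hello world & friends"
```

Order matters when combining this with marker-based extraction: markers such as <Start Text> look like tags and would be deleted, so extract the sections first and strip the HTML afterwards.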

Thank you.

Ravi





--
View this message in context:
http://r.789695.n4.nabble.com/Extracting-certain-text-using-tm-package-tp3627063p3627063.html
Sent from the R help mailing list archive at Nabble.com.

Possibly Parallel Threads

Extracting information from text data
Library (tm) Error: could not find function "TermDocMatrix".
text mining
tm: Why does adding local metadata take so long?
Help needed for Loading "tm" package
