Hi Paul
I have seen this; it's part of the tm package I mentioned originally.
I've tried it again, and perhaps I'm using stemDocument incorrectly, but
this is what I'm doing:
> library(tm)
Loading required package: NLP
> text.v <- scan(file.choose(), what = 'char', sep = '\n')
Read 938 items
> text.stem.v <- stemDocument(text.v, language = 'english')
But it isn't changing anything in the body of the text I'm passing to it:
the words come back unstemmed (and unlemmatized).
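One possible explanation, offered as an assumption rather than something confirmed in this thread: if the installed tm version passes each element of a character vector to the stemmer as a single string, a whole line is treated as one "word" and comes back essentially unchanged. A minimal sketch of stemming word by word instead, calling SnowballC's wordStem() directly; the sample lines and the helper name stem_line are hypothetical stand-ins for the real text.v:

```r
library(SnowballC)

# Hypothetical stand-in for the lines returned by scan()
text.v <- c("She facilitated the meeting",
            "He is facilitating another",
            "This facilitates nothing")

# Split a line into lower-case word tokens, stem each token, then re-join.
# stem_line is an illustrative helper, not a tm or SnowballC function.
stem_line <- function(line, language = "english") {
  tokens <- unlist(strsplit(tolower(line), "[^a-z']+"))
  tokens <- tokens[nzchar(tokens)]          # drop empty strings from the split
  paste(wordStem(tokens, language = language), collapse = " ")
}

text.stem.v <- vapply(text.v, stem_line, character(1), USE.NAMES = FALSE)
print(text.stem.v)
```

Note that this lower-cases the text and drops punctuation, which may or may not be acceptable for later concordance work.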
When I try using SnowballC, the error returned is that tm_map doesn't
have a method to work with objects of class 'character'.
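Since tm_map() operates on corpus objects, one thing to try is wrapping the character vector in a corpus first. A minimal sketch, assuming tm and SnowballC are both installed; the sample sentence stands in for the real file contents:

```r
library(tm)

# Hypothetical sample in place of the text read with scan()
text.v <- c("facilitating facilitated facilitates")

# tm_map() expects a corpus, so build one from the character vector
corpus <- VCorpus(VectorSource(text.v))

# Apply the stemmer to every document in the corpus
corpus.stem <- tm_map(corpus, stemDocument, language = "english")

# Pull the stemmed text back out as a plain character vector
text.stem.v <- vapply(corpus.stem,
                      function(doc) paste(content(doc), collapse = "\n"),
                      character(1), USE.NAMES = FALSE)
print(text.stem.v)
```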
Again, the problem is that tm doesn't seem to allow for concordance
analysis. Or perhaps it does and I just haven't figured out how to do
it, in which case I'm happy to be pointed to documentation on the process,
including whether it is applied before or after the text is transformed
into a DTM. Searching online hasn't (yet) turned anything up.
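On the before-or-after question: a document term matrix stores term counts per document and discards word order, so a concordance has to be built from the ordered tokens before that step. A minimal base R sketch of a key-word-in-context listing; the function name kwic, its window argument and the sample text are all hypothetical, not part of tm:

```r
# Hypothetical sample in place of the real text.v
text.v <- c("The committee facilitated the merger",
            "and facilitating talks proved hard")

# Flatten the lines into one ordered vector of lower-case word tokens
tokens <- unlist(strsplit(tolower(paste(text.v, collapse = " ")), "[^a-z']+"))
tokens <- tokens[nzchar(tokens)]

# Return each regex match with up to `window` tokens of context per side
kwic <- function(tokens, pattern, window = 2) {
  n <- length(tokens)
  hits <- grep(pattern, tokens)
  vapply(hits, function(i) {
    lo <- max(1, i - window)
    hi <- min(n, i + window)
    left  <- tokens[seq_len(i - lo) + lo - 1]   # tokens lo .. i-1 (may be empty)
    right <- tokens[seq_len(hi - i) + i]        # tokens i+1 .. hi (may be empty)
    trimws(paste(paste(left, collapse = " "),
                 paste0("[", tokens[i], "]"),
                 paste(right, collapse = " ")))
  }, character(1))
}

conc <- kwic(tokens, "^facilitat", window = 2)
print(conc)
```

Matching on a prefix pattern such as "^facilitat" keeps the original word forms visible in the output, which stemming the text first would not.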
Thanks.
Andy
On 26/07/16 08:50, Paul Johnston wrote:
> Suggest look at http://www.inside-r.org/packages/cran/tm/docs/stemDocument
>
> -----Original Message-----
> From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of Andy Wolfe
> Sent: 26 July 2016 08:10
> To: r-help at r-project.org
> Subject: [R] word stemming for corpus linguistics
>
> Hi list
>
> On a piece of work I'm doing in corpus linguistics, using a combo of
> texts by Gries "Quantitative Corpus Linguistics with R: A Practical
> Introduction" and Jockers "Text Analysis with R for Students of
> Literature", which are both really excellent by the way, I want to stem or
> lemmatize the words so that, for e.g., 'facilitating', 'facilitated',
> and 'facilitates' all become 'facilit'.
>
> In text mining, using a combination of the packages 'tm' and 'SnowballC'
> this is feasible, but then I am finding that working with the DTM
> (document term matrix) becomes difficult for when I want to do
> concordance (or key word in context) analysis.
>
> So, two questions:
>
> (1) is there a package for R version 3.3.1 that can work with corpus
> linguistics? and/ or
>
> (2) is there a way of doing concordance analysis using the tm package as
> part of the whole text mining process?
>
> I appreciate any help. Thanks.
>
> Andy