thr3ads.net - R help - [R] word stemming for corpus linguistics [Jul 2016]

Andy Wolfe

2016-Jul-26 07:10 UTC

[R] word stemming for corpus linguistics

Hi list

For a piece of work I'm doing in corpus linguistics, I'm using a
combination of two texts: Gries, "Quantitative Corpus Linguistics with
R: A Practical Introduction", and Jockers, "Text Analysis with R for
Students of Literature", both of which are really excellent, by the way.
I want to stem or lemmatize the words so that, e.g., 'facilitating',
'facilitated', and 'facilitates' all become 'facilit'.

In text mining this is feasible using a combination of the packages
'tm' and 'SnowballC', but I am finding that working with the DTM
(document term matrix) becomes difficult when I want to do concordance
(or key word in context) analysis.

So, two questions:

(1) is there a package for R version 3.3.1 that can work with corpus 
linguistics? and/or

(2) is there a way of doing concordance analysis using the tm package as 
part of the whole text mining process?

I appreciate any help. Thanks.

Andy


Paul Johnston

2016-Jul-26 07:50 UTC

I suggest looking at http://www.inside-r.org/packages/cran/tm/docs/stemDocument


Andy Wolfe

2016-Jul-26 08:13 UTC

Hi Paul

I have seen this - it's part of the tm package mentioned originally. So, 
I've tried it again and perhaps I'm using stemDocument incorrectly, but 
this is what I am doing:

> library(tm)
Loading required package: NLP
> text.v <- scan(file.choose(), what = 'char', sep = '\n')
Read 938 items
> text.stem.v <- stemDocument(text.v, language = 'english')

But it isn't changing anything in the body of the text I'm passing to
it: the words are unlemmatized/unstemmed.
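
One possible explanation, which the thread does not confirm, is that
each whole line was being treated as a single "word" rather than being
split into words first. The sketch below sidesteps that by tokenising
each line and passing individual words to SnowballC's wordStem(). The
sample lines are hypothetical stand-ins for the scanned text, and the
tokenising regular expression is an assumption that only suits plain
ASCII English.

```r
library(SnowballC)

# Hypothetical sample lines, standing in for the text read by scan().
text.v <- c("She facilitated the meeting",
            "facilitating change facilitates growth")

# Split each line into lower-case word tokens, dropping punctuation.
tokens.l <- strsplit(tolower(text.v), "[^a-z']+")
tokens.l <- lapply(tokens.l, function(w) w[nzchar(w)])

# Stem word by word, then rebuild one string per original line.
stems.l <- lapply(tokens.l, wordStem, language = "english")
text.stem.v <- vapply(stems.l, paste, character(1), collapse = " ")

print(text.stem.v)
```

Keeping tokens.l and stems.l side by side means the original word
forms stay available for display after stemming.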

When I try using SnowballC, the error returned is that tm_map doesn't 
have a method to work with objects of class 'character'.
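
tm_map() works on corpus objects rather than on bare character
vectors, so the usual route is to wrap the vector in a corpus first. A
minimal sketch, assuming tm and SnowballC are both installed; the two
sample lines are hypothetical and stand in for the scanned text.

```r
library(tm)  # stemDocument() relies on SnowballC being installed

# Hypothetical sample lines, standing in for the text read by scan().
text.v <- c("she facilitated the meeting",
            "he facilitates the talks")

# tm_map() expects a corpus, so wrap the character vector first.
corpus <- VCorpus(VectorSource(text.v))
corpus.stem <- tm_map(corpus, stemDocument, language = "english")

# Pull the stemmed text back out as a plain character vector.
text.stem.v <- vapply(corpus.stem,
                      function(d) paste(content(d), collapse = " "),
                      character(1), USE.NAMES = FALSE)

print(text.stem.v)
```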

Again, the problem is that tm doesn't seem to allow for concordance 
analysis. Or perhaps it does and I just haven't figured out how to do 
it, in which case I'm happy to be pointed to some documentation on that 
process, and on whether it is applied before or after the text is 
transformed into a DTM. Searching online hasn't (yet) turned anything up.
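
A DTM stores term counts per document and discards word order, so a
concordance generally has to be built from the running token vector
before any DTM is made. The base R sketch below illustrates one way to
do that. kwic() is an illustrative helper written for this example, not
a function from tm or any other package, and the sample text and
tokenising pattern are assumptions.

```r
# Hypothetical sample text, standing in for the scanned text.v.
text.v <- c("The chair facilitated the meeting",
            "Good tools facilitate careful analysis")

# One long vector of lower-case word tokens, in running order.
words.v <- unlist(strsplit(tolower(paste(text.v, collapse = " ")),
                           "[^a-z']+"))
words.v <- words.v[nzchar(words.v)]

# Illustrative helper: show each match with `window` words either side.
kwic <- function(tokens, pattern, window = 2) {
  hits <- grep(pattern, tokens)
  vapply(hits, function(i) {
    from <- max(1, i - window)
    to   <- min(length(tokens), i + window)
    ctx  <- tokens[from:to]
    ctx[i - from + 1] <- paste0("[", tokens[i], "]")  # mark the keyword
    paste(ctx, collapse = " ")
  }, character(1))
}

res <- kwic(words.v, "^facilitat", window = 2)
print(res)
# [1] "the chair [facilitated] the meeting"
# [2] "good tools [facilitate] careful analysis"
```

Matching on a regular expression here plays the role of a crude stem
search. The same function could instead be given a parallel vector of
stemmed tokens to search while still printing the original word forms.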

Thanks.
Andy


