thr3ads.net - R help - [R] Help: stemming and stem completion with package tm in R [Nov 2011]


Yanchang Zhao

2011-Nov-04 01:28 UTC

[R] Help: stemming and stem completion with package tm in R

Hi All

I came across the problem below when doing stemming and stem completion
with package tm in R. The word "mining" was stemmed to "mine" with
stemDocument(), and then completed to "miners" with stemCompletion().
However, I would prefer to keep "mining" intact.

For stemCompletion(), the default type of completion is "prevalent",
which takes the most frequent match as completion. Although "mining"
is much more frequent than "miners" in my text, it still completed
"mine" to "miners".

An example is shown below.

############################################
library(tm)
(a <- c("mining", "miners", "mining"))
(b <- stemDocument(a))
(d <- stemCompletion(b, dictionary=a))
############################################

Some possible solutions are:
1) to change the options or dictionary in stemDocument(), so that
"mining" is not stemmed to "mine", which I think is the best way;
2) to change the options or dictionary in stemCompletion(), so that
"mine" is completed to "mining";
3) to manually correct this after stem completion, which is the last
option.
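
Option 1) could be approximated with a small wrapper that skips a list of protected words. This is only a sketch: stem_except() and toy_stem() are hypothetical names, not part of tm, and toy_stem() is a crude stand-in for a real stemmer so that the example runs without any packages.

```r
# Hypothetical helper: apply stem_fun to every word except those in `keep`.
stem_except <- function(words, keep, stem_fun) {
  out <- words
  to_stem <- !(words %in% keep)
  if (any(to_stem)) {
    out[to_stem] <- stem_fun(words[to_stem])
  }
  out
}

# Stand-in stemmer for illustration only (NOT the Porter stemmer):
# strips a trailing "ing" or "s".
toy_stem <- function(w) sub("(ing|s)$", "", w)

a <- c("mining", "miners", "mining")
protected <- stem_except(a, keep = "mining", stem_fun = toy_stem)
print(protected)
# [1] "mining" "miner"  "mining"
```

With tm loaded, stemDocument could presumably be passed as stem_fun in place of toy_stem, since it accepts a character vector as in the example above.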

I am looking for a solution along the lines of 1) or 2) above, but
cannot find a way to do it with stemDocument() in package tm.

Any help will be appreciated.

Thanks,
Yanchang Zhao
Email: yanchangzhao(at)gmail.com

RDataMining:           http://www.rdatamining.com
Twitter:               http://twitter.com/RDataMining
Group on Linkedin:   http://group2.rdatamining.com

Felix Andrews

2011-Nov-07 12:38 UTC

Hi Yanchang,

The problem seems to be that stemCompletion only looks for words that
begin with "mine", and "mining" does not strictly begin with "mine". I
don't think there is any easy way to modify stemCompletion to get
around that.
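
The prefix matching described above can be illustrated in base R. This is a rough sketch of the idea only, not tm's actual code; the variable names stem and candidates are made up for the example.

```r
a <- c("mining", "miners", "mining")
stem <- "mine"

# Dictionary words that literally start with the stem
candidates <- a[startsWith(a, stem)]
print(candidates)
# [1] "miners"

# "mining" shares only "min" with the stem, so it is never a candidate,
# however frequent it is in the dictionary
print(startsWith("mining", stem))
# [1] FALSE
```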

However, maybe you could replace each stemmed word with the most
prevalent original word in your document that shares its stem; then you
would not need to use stemCompletion at all. For example:

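# Return the most frequent value in x
topfreq <- function(x) rev(names(sort(table(x))))[1]
# For each group of words in a sharing a stem in b, use the most frequent word
(d <- ave(a, b, FUN = topfreq))
# [1] "mining" "miners" "mining"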
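
If the same mapping is needed for stems that come from elsewhere (for example, terms in a document-term matrix), the idea can be turned into a named lookup vector. This is a sketch under assumptions: b is written out by hand with the values stemDocument(a) is expected to return here, and lookup is a name invented for the example.

```r
a <- c("mining", "miners", "mining")
# Assumed result of stemDocument(a), hard-coded so the sketch runs without tm
b <- c("mine", "miner", "mine")

# Return the most frequent value in x
topfreq <- function(x) rev(names(sort(table(x))))[1]

# Named character vector: stem -> most frequent original word
lookup <- vapply(split(a, b), topfreq, character(1))
print(lookup)

# Complete arbitrary stems; stems not in the lookup come back as NA
completed <- unname(lookup[c("miner", "mine", "mine")])
print(completed)
# [1] "miners" "mining" "mining"
```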

Cheers
Felix



-- 
Felix Andrews
http://www.neurofractal.org/felix/
