Displaying 20 results from an estimated 113 matches for "stopwording".
2008 Mar 12
1
how can i use stopwords?
Hi,
I do not understand the stopword function...
I've set the termgenerator like this:
$self->{'Stemmer'} = new Search::Xapian::Stem('german2');
$self->{'Stopper'} = new Search::Xapian::SimpleStopper();
$self->{'TermGenerator'} = new Search::Xapian::TermGenerator;
$self->{'TermGenerator'}->set_stemmer( $self->{'Stemmer'} );
2009 Nov 12
1
How can this code be improved?
I am running the following code on a MacBook Pro 17" Unibody early
2009 with 8GB RAM, OS X 10.5.8, R 2.10.0 Patch from Nov. 2, 2009, in
64-bit mode.
freq.stopwords <- numeric(0)
freq.nonstopwords <- numeric(0)
token.tables <- list(0)
i.ss <- c(0)
cat("Beginning at ", date(), ".\n")
for (i.d in 1:length(tokens)) {
tt <- list(0)
for (i.s in
2009 Mar 26
1
ideas on picking stopwords
I'm looking at adding some stopwords to my indexing procedure, and was
wondering if anyone had any good rules of thumb on how to pick which
words to blacklist. It all seems a little... well... vague. Although I
guess it kind of depends on the sort of documents you're wanting to index.
My current idea is to write a little script to output the terms with the
highest frequency in my
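A minimal sketch of the frequency idea above, written in Python over plain strings rather than against a real index. It counts how many documents each term occurs in and lists the most widespread terms as stopword candidates. The helper name `stopword_candidates`, the naive whitespace tokenizer, and the sample documents are all assumptions for illustration; they are not part of any indexing library.

```python
from collections import Counter


def stopword_candidates(documents, top_n=10):
    """Return the top_n terms ranked by document frequency (hypothetical helper)."""
    doc_freq = Counter()
    for text in documents:
        # Count each term once per document, using a naive lowercase split.
        doc_freq.update(set(text.lower().split()))
    # Sort by descending frequency, then alphabetically for a stable order.
    ranked = sorted(doc_freq.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


# Made-up sample documents, for illustration only.
docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "a bird sang in the tree",
]
candidates = stopword_candidates(docs, top_n=2)
print(candidates)
```

The ranked list is only a starting point: whether a frequent term should actually be blacklisted still depends on the kind of documents being indexed.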
2011 Oct 04
1
Reading stopwords from a csv file
I am using the tm package to do text mining:
I have a huge list of stopwords (2000+) that are in a csv file. I read it as
follows:
stopwordlist <- read.csv("stopwords to be Removed 10042011.csv")
myStopwords <- as.character(stopwordlist$stopwords)
When I try removing the stopwords using
tr1=tm_map(tr1,removeWords,myStopwords)
I am getting the following error:
Error in
2010 Nov 15
4
Stopword addition and stemming
Hi,
Two questions which I'm unsure about:
Stemming: I've turned on stemming, etc, but how can I confirm that
it's being used in searches? What should I look/search for?
Stopwords: I'm trying out xapian on a regional dataset (searching
data from a *.co.us TLD, e.g.). I've noticed that searching for [bob
co.us] results in *very* slow search times (tens of seconds), since it
2007 Nov 11
0
Stopwords in tm package
Hi to all,
I need to append/delete stopwords from the list that I can use from the
tm package. I use Portuguese stopwords.
When I view the list of stopwords using >stopwords("portuguese") I get
some words with special characters like this:
"verdadeiro" "voc??" "voc??s" "vos"
I try to change the portuguese.dat file from
2017 Jun 14
2
KMeans Clusterer - Going forward
Hello,
I have finished moving the API to PIMPL classes and will fix issues within
the current code over the next week, based on reviews from mentors.
The next step going forward is to start with forming document vectors that
are reduced and more useful. This significantly reduces run time (since
the time for distance calculation depends on the number of terms). Getting the
useful terms within a
2004 Dec 14
1
stopwords
Hi!
I would like to use the lists of stopwords provided with Xapian. Is
there a standard way to remove stopwords automatically, or should I
implement it myself in the indexer?
Regards,
Georges Dupret
2009 Nov 12
2
package "tm" fails to remove "the" with remove stopwords
I am using code that previously worked to remove stopwords using package
"tm". Even manually adding "the" to the list does not work to remove "the".
This package has undergone extensive redevelopment with changes to the
function syntax, so perhaps I am just missing something.
Please see my simple example, output, and sessionInfo() below.
Thanks!
Mark
require(tm)
2020 Apr 28
3
Stopwords: Topic modelling with LDA
Good morning,
I am carrying out a topic model analysis with the LDA method. To start
with, I have removed the universal "stopwords" from the analysis. When
I look at the topics and their most frequent words, I find that they are
very similar and that some words appear in every topic. The texts
I am analysing are consumer reviews of a specific category
of
2006 Jul 26
13
tweaking minimum word length?
Hi,
Can Ferret be configured to change the minimum word length of what it
indexes? Right now it seems to drop words 3 characters or less, but
I'd like to include words going down to 2 characters. How would I do
that?
Francis
2020 Apr 29
2
[Possible SPAM] Re: Stopwords: Topic modelling with LDA
Hello,
I have just computed tf-idf and a question has come up. Is there an idf or
tf-idf value that would be considered a threshold for deciding whether a
word is very common or not? The idf values in my data range from 0 to 3.78
and the tf-idf values from 0 to 0.07.
Regards
On Tue, 28 April 2020, 12:53, Carlos Ortega wrote:
> Hello,
> To start with, I would remove them to see what other topics appear.
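As a rough illustration of the threshold question above, the sketch below computes idf as ln(N / df), which is one common definition (other tools may scale it differently), and flags terms at or below a cutoff. The cutoff value, the helper names `idf_table` and `common_terms`, and the sample reviews are assumptions made for this example. The cutoff in particular is an arbitrary choice to be tuned against the data.

```python
import math
from collections import Counter


def idf_table(documents):
    """Compute idf = ln(N / df) for each term (one common definition)."""
    n_docs = len(documents)
    doc_freq = Counter()
    for text in documents:
        # Each term counts once per document; tokenizing is a naive split.
        doc_freq.update(set(text.lower().split()))
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


def common_terms(documents, max_idf):
    """Terms whose idf is at or below max_idf; the cutoff is an arbitrary choice."""
    return sorted(t for t, idf in idf_table(documents).items() if idf <= max_idf)


# Made-up sample reviews, for illustration only.
reviews = [
    "the product is good",
    "the product is cheap",
    "the delivery is slow",
]
flagged = common_terms(reviews, max_idf=0.5)
print(flagged)
```

A term that occurs in every document gets idf 0 under this definition, so terms near the bottom of the idf range are the natural candidates for a corpus-specific stopword list.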
2012 Oct 25
2
Text mining
Kind regards
I am currently writing a function to plot a word cloud; the code I have is the following:
library(twitteR)
library(tm)
library(wordcloud)
library(RXKCD)
library(RColorBrewer)
tweets=searchTwitter('@afflorezr', n=1500)
generateCorpus= function(tweets,my.stopwords=c(),min.freq){
  #Install the textmining library
  require(tm)
  require(wordcloud)
2020 Mar 22
0
Unable to build RPM for Centos 7
Hi,
we are an email hosting provider and we are looking at xapian to improve
our users' email search experience.
So we started building xapian 1.4.15 on Centos 7 with your
xapian-core.spec, moving it and the source code into /root/rpmbuild/SPECS
and SOURCE, but we get this error after running "rpmbuild -ba":
[...]
Elaborazione file: xapian-core-devel-1.4.15-1.x86_64
errore: File
2007 May 30
3
A way to get all the words from an index?
Hi,
I am just wondering if there's a way to get all the words from an
index. Basically, all the words that have been indexed (excluding the
stopwords if I'm using the stopwords analyzer, etc.)
The fields I'm putting in are not :stored in the index.
The idea is to implement a "did you mean?" mechanism, which is based
on the content of the index, not on a
2006 Nov 13
1
Stemming, stop words, acts_as_ferret
I'd like to get the following behavior:
1. Stemming. The search is on a database of summaries of California legal
cases. A search for "thermal image" needs to hit "thermal
imaging."
2. Stop words. Searches for "failing to instruct the jury" should come up
with hits on a search for "fail to instruct."
3. Case-insensitive.
What I
2013 Apr 09
3
Question on Stopword Removal from a Cyrillic (Bulgarian)Text
Hi,
I bumped into a serious issue while trying to analyse some texts in
Bulgarian language (with the tm package). I import a tab-separated csv
file, which holds a total of 22 variables, most of which are text cells
(not factors), using the read.delim function:
data<-read.delim("bigcompanies_ascii.csv",
header=TRUE,
quote="'",
2007 Jan 22
1
stopwords
Hello all,
Does anybody know if the word 'other' is a special word for Ferret? I
can't manage to index it!
Johan
Johan Duflost
Analyst Programmer
Belgian Biodiversity Platform ( http://www.biodiversity.be)
Belgian Federal Science Policy Office (http://www.belspo.be )
Tel:+32 2 650 5751 Fax: +32 2 650 5124
2010 Oct 08
1
Get a list of all terms in an indexed corpus
Hello,
I have a corpus that I have indexed with xapian/xappy and I would now
like to generate a corpus-specific list of stopwords. (This is a
technical corpus, so a typical stopword list wouldn't be helpful.)
My first thought was to ask the xapian database for a list of terms
followed by their frequency. My intuition is that I could probably bring
together a list of stopwords by examining