search for: stopword

Displaying 20 results from an estimated 113 matches for "stopword".

2008 Mar 12
1
how can i use stopwords?
Hi, I do not understand the stopword function... I've set the termgenerator like this: $self->{'Stemmer'} = new Search::Xapian::Stem(german2); $self->{'Stopper'} = new Search::Xapian::SimpleStopper(); $self->{'TermGenerator'} = new Search::Xapian::TermGenerator; $self->{'TermGenerator'}...
2009 Nov 12
1
How can this code be improved?
I am running the following code on a MacBook Pro 17" Unibody early 2009 with 8GB RAM, OS X 10.5.8, R 2.10.0 Patch from Nov. 2, 2009, in 64-bit mode. freq.stopwords <- numeric(0) freq.nonstopwords <- numeric(0) token.tables <- list(0) i.ss <- c(0) cat("Beginning at ", date(), ".\n") for (i.d in 1:length(tokens)) { tt <- list(0) for (i.s in 1:length(tokens[[i.d]])) { t <- tolower(tokens[[i.d]][[i.s]]) t <- sub(&qu...
2009 Mar 26
1
ideas on picking stopwords
I'm looking at adding some stopwords to my indexing procedure, and was wondering if anyone had any good rules of thumb on how to pick which words to blacklist. It all seems a little... well... vague. Although I guess it kind of depends on the sort of documents you're wanting to index. My current idea is to write a little scri...
2011 Oct 04
1
Reading stopwords from a csv file
I am using the tm package to do text mining. I have a huge list of stopwords (2000+) in a CSV file, which I read as follows: stopwordlist <- read.csv("stopwords to be Removed 10042011.csv") myStopwords <- as.character(stopwordlist$stopwords) When I try removing the stopwords using tr1=tm_map(tr1,removeWords,myStopwords) I am getting the following e...
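The workflow in this snippet (load a stopword column from a CSV file, then strip those words from text) can be sketched in plain Python. This is not the tm package's R API, only an illustration under assumptions: the column name `stopwords` follows the snippet, while the in-memory sample data and the helper names `load_stopwords` and `remove_stopwords` are hypothetical.

```python
import csv
import io


def load_stopwords(csv_file):
    """Read the 'stopwords' column of a CSV file into a lowercase set."""
    reader = csv.DictReader(csv_file)
    return {
        row["stopwords"].strip().lower()
        for row in reader
        if row["stopwords"].strip()
    }


def remove_stopwords(text, stopwords):
    """Drop whitespace-separated tokens whose lowercase form is a stopword."""
    return " ".join(tok for tok in text.split() if tok.lower() not in stopwords)


# Hypothetical in-memory stand-in for the CSV file mentioned in the snippet.
sample_csv = io.StringIO("stopwords\nthe\nand\nof\n")
stopwords = load_stopwords(sample_csv)
cleaned = remove_stopwords("The cat and the hat of doom", stopwords)
```

Holding the list in a set keeps each lookup cheap, which matters once the list grows to thousands of entries.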
2010 Nov 15
4
Stopword addition and stemming
Hi, Two questions which I'm unsure about: Stemming: I've turned on stemming, etc, but how can I confirm that it's being used in searches? What should I look/search for? Stopwords: I'm trying out xapian on a regional dataset (searching data from a *.co.us TLD, eg) . I've noticed that searching for [bob co.us] results in *very* slow search times (tens of seconds), since it seems to be searching for two extremely common (almost every document will have something.co....
2007 Nov 11
0
Stopwords in tm package
Hi to all, I need to append/delete stopwords from the list that I can use from the tm package. I use Portuguese stopwords. When I see the list of stopwords using >stopwords("portuguese") I have some words with special characters like this: "verdadeiro" "voc??" "voc??s" "vos&qu...
2017 Jun 14
2
KMeans Clusterer - Going forward
...stance calculation depends on number of terms). Getting the useful terms within a document in its document vector can improve its accuracy, due to less noise terms. Two important things to be done in this direction are : 1) Stemming This is easier because xapian already provides stemmed terms. 2) Stopword removal Use either Xapian::SimpleStopper or create a subclass of Xapian::Stopper to determine whether a term that is fed to it is a stopword or not. But for determining which terms are stopwords, I was wondering whether we'd be using the stopword list within xapian/languages/stopwords or will w...
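As a rough illustration of the stopword-removal step described in this snippet, the sketch below filters terms through a stopper object before building a document vector. It is a plain-Python stand-in and does not use the Xapian bindings: the class here only borrows the name `SimpleStopper` for readability, and `document_vector` is a hypothetical helper.

```python
from collections import Counter


class SimpleStopper:
    """Callable stopword test (plain-Python stand-in, not the Xapian class)."""

    def __init__(self, words=()):
        self._words = {w.lower() for w in words}

    def add(self, word):
        """Register one more stopword."""
        self._words.add(word.lower())

    def __call__(self, term):
        """Return True if the term is a stopword."""
        return term.lower() in self._words


def document_vector(terms, stopper):
    """Count terms, skipping any that the stopper flags as stopwords."""
    return Counter(t.lower() for t in terms if not stopper(t))


stopper = SimpleStopper(["the", "is", "a"])
vec = document_vector("the index is a fast index".split(), stopper)
```

With the noise terms dropped, the vector holds only `index` and `fast`, so a distance calculation over it depends on fewer terms.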
2004 Dec 14
1
stopwords
Hi! I would like to use the lists of stopwords provided with Xapian. Is there a standard way to remove stopwords automatically, or should I implement it myself in the indexer? Regards, Georges Dupret
2009 Nov 12
2
package "tm" fails to remove "the" with remove stopwords
I am using code that previously worked to remove stopwords using package "tm". Even manually adding "the" to the list does not work to remove "the". This package has undergone extensive redevelopment with changes to the function syntax, so perhaps I am just missing something. Please see my simple example, output, and session...
2020 Apr 28
3
Stopwords: Topic modelling with LDA
Good morning, I am running a topic model analysis with the LDA method. To begin with, I removed the universal "stopwords" from the analysis. When I look at the topics and their most frequent words, I find that they are very similar and that some words appear in every topic. The texts I am analysing are consumer opinions about a specific category of cosmetics, so the subject matter is very...
2006 Jul 26
13
tweaking minimum word length?
Hi, Can Ferret be configured to change the minimum word length of what it indexes? Right now it seems to drop words 3 characters or less, but I'd like to include words going down to 2 characters. How would I do that? Francis
2020 Apr 29
2
[Possible SPAM] Re: Stopwords: Topic modelling with LDA
...%3a%2f%2fwww.qualityecellence.es > > On Tue, 28 Apr 2020 at 11:44, <miriam.alzate at unavarra.es> wrote: > >> Good morning, >> >> I am running a topic model analysis with the LDA method. To >> begin with, I removed the universal "stopwords" from the analysis. >> When >> I look at the topics and their most frequent words, I find that they >> are >> very similar and that some words appear in every topic. The >> texts >> I am analysing are consumer opinions about a category >...
2012 Oct 25
2
Text mining
...Greetings. I am currently writing a function to plot a word cloud; the code I have is the following: library(twitteR) library(tm) library(wordcloud) library(RXKCD) library(RColorBrewer) tweets=searchTwitter('@afflorezr', n=1500) generateCorpus= function(tweets,my.stopwords=c(),min.freq){ #Install the textmining library require(tm) require(wordcloud) tw.df=twListToDF(tweets) RemoveAtPeople <- function(x){gsub("@\\w+", "",x)} df<- as.vector(sapply(tw.df$text, RemoveAtPeople)) #The following is cribbed and seems to do what it says on th...
2020 Mar 22
0
Unable to build RPM for Centos 7
...mbuild/BUILDROOT/xapian-core-1.4.15-1.x86_64/usr/share/aclocal' /usr/bin/install -c -m 644 m4-macros/xapian.m4 '/root/rpmbuild/BUILDROOT/xapian-core-1.4.15-1.x86_64/usr/share/aclocal' /usr/bin/mkdir -p '/root/rpmbuild/BUILDROOT/xapian-core-1.4.15-1.x86_64/usr/share/xapian-core/stopwords' /usr/bin/install -c -m 644 languages/stopwords/arabic.list languages/stopwords/danish.list languages/stopwords/dutch.list languages/stopwords/english.list languages/stopwords/finnish.list languages/stopwords/french.list languages/stopwords/german.list languages/stopwords/hungarian.list...
2007 May 30
3
A way to get all the words from an index?
Hi, I am just wondering if there's a way to get all the words from an index. Basically, all the words that have been indexed (excluding the stopwords if I'm using the stopwords analyzer, etc.) The fields I'm putting in are not :stored in the index. The idea is to implement a "did you mean?" mechanism, which is based on the content of the index, not on a dictionary... Possible? Thank you! Philippe April
2006 Nov 13
1
Stemming, stop words, acts_as_ferret
...def token_stream(field, reader) return Ferret::Analysis::PorterStemFilter.new(Ferret::Analysis::LowerCaseTokenizer.new(reader)) end end class Summary < ActiveRecord::Base acts_as_ferret(:analyzer => StemmedAnalyzer.new) But this doesn't appear to give me either stemming or stopwords. It does give me basic searching (searches for exact keywords without stopwords work, searches with stopwords return no results). I've looked through the archives, and I'm still confused. Suggestions? - James Moore
2013 Apr 09
3
Question on Stopword Removal from a Cyrillic (Bulgarian) Text
...d configuration. Removal of punctuation, white space, and numbers is flawless, but the inability to remove stop words prevents me from further analysing the texts. Has somebody had experience with languages other than English, and for which there is no predefined stop list available through the stopwords function? I will highly appreciate any tips and advice! Thanks in advance, Vince
2013 Apr 09
3
Question on Stopword Removal from a Cyrillic (Bulgarian) Text
...d configuration. Removal of punctuation, white space, and numbers is flawless, but the inability to remove stop words prevents me from further analysing the texts. Has somebody had experience with languages other than English, and for which there is no predefined stop list available through the stopwords function? I will highly appreciate any tips and advice! Thanks in advance, Vince
2007 Jan 22
1
stopwords
Hello all, Does anybody know if the word 'other' is a special word for Ferret? I can't manage to index it! Johan Johan Duflost Analyst Programmer Belgian Biodiversity Platform ( http://www.biodiversity.be) Belgian Federal Science Policy Office (http://www.belspo.be ) Tel:+32 2 650 5751 Fax: +32 2 650 5124
2010 Oct 08
1
Get a list of all terms in an indexed corpus
Hello, I have a corpus that I have indexed with xapian/xappy and I would now like to generate a corpus-specific list of stopwords. (This is a technical corpus, so a typical stopword list wouldn't be helpful.) My first thought was to ask the xapian database for a list of terms followed by their frequency. My intuition is that I could probably bring together a list of stopwords by examining the head and tail of the list....
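The head-and-tail idea in this snippet can be sketched without any search library. The Python below counts, for each term, how many documents contain it, then reports very common terms (the head) and very rare ones (the tail) as stopword candidates. The function name `candidate_stopwords`, the thresholds `head_fraction` and `tail_count`, and the sample documents are all assumptions for illustration; a real run would read term frequencies from the index instead.

```python
from collections import Counter


def candidate_stopwords(documents, head_fraction=0.8, tail_count=1):
    """Return (head, tail) candidate stopword lists from document frequencies.

    head: terms present in at least `head_fraction` of the documents.
    tail: terms present in at most `tail_count` documents.
    """
    doc_freq = Counter()
    for doc in documents:
        # Count each term once per document, however often it repeats.
        doc_freq.update(set(doc.lower().split()))
    n_docs = len(documents)
    head = sorted(t for t, df in doc_freq.items() if df / n_docs >= head_fraction)
    tail = sorted(t for t, df in doc_freq.items() if df <= tail_count)
    return head, tail


# Hypothetical miniature "technical corpus".
docs = [
    "the valve controls the flow",
    "the pump drives the flow",
    "the sensor reads the pressure",
]
head, tail = candidate_stopwords(docs)
```

Both lists are only candidates: in a technical corpus a frequent term may still be meaningful, so the output is best reviewed by hand before it is used as a stopword list.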