thr3ads.net - similar to: "ideas on picking stopwords"

Displaying 20 results from an estimated 2000 matches similar to: "ideas on picking stopwords"

2008 Mar 27

Proper noun stemming

Hi All, I was wondering if anyone had a solution for the following problem. I use QueryParser to stem my documents before adding them to a database. During the stemming process I would like to find a way of keeping proper nouns that span two or more words together as a phrase. For example "New York" or "Gordon Brown" or "Prime Minister" get split up. I see

2008 Mar 12

how can i use stopwords?

Hi, I do not understand the stopword function... I've set the termgenerator like this: $self->{'Stemmer'} = new Search::Xapian::Stem(german2); $self->{'Stopper'} = new Search::Xapian::SimpleStopper(); $self->{'TermGenerator'} = new Search::Xapian::TermGenerator; $self->{'TermGenerator'}->set_stemmer( $self->{'Stemmer'} );

2017 Jun 14

KMeans Clusterer - Going forward

Hello, I have finished moving the API to PIMPL classes and will fix issues within the current code over the next week, based on reviews from mentors. The next step going forward is to start with forming document vectors that are reduced and more useful. This majorly helps in saving run time (since time for distance calculation depends on number of terms). Getting the useful terms within a

2010 Nov 15

Stopword addition and stemming

Hi, Two questions which I'm unsure about: Stemming: I've turned on stemming, etc, but how can I confirm that it's being used in searches? What should I look/search for? Stopwords: I'm trying out xapian on a regional dataset (searching data from a *.co.us TLD, eg) . I've noticed that searching for [bob co.us] results in *very* slow search times (tens of seconds), since it

2012 Jan 13

Troubles with stemming (tm + Snowball packages) under MacOS

Dear all, I am having some trouble using the stemming algorithm provided by the tm (text mining) + Snowball packages. Here is my config: MacOS 10.5 R 2.12.0 / R 2.13.1 / R 2.14.1 (I have tried several versions) I have installed all the needed packages (tm, rJava, RWeka, Snowball) + dependencies. I have deactivated AWT (like written in

2002 Jul 12

HP-UX slow login problem found?

I think I finally figured out the problem that many people have been having with extremely long login times under HP-UX 11.x. The problem is really in OpenSSL, and in particular the Diffie-Hellman parameter generation routines under the PA-RISC processor. I suspect this may not be a problem with the IA64 (Itanium) processors. This especially shows up if you use the gcc compiler. Fortunately I

2009 Jul 17

Help with the text mining package (TM)

Dear all, I am writing to ask the following: I am working on a text mining project and need to import a set of texts in order to preprocess them, that is, remove the stopwords, do stemming, remove punctuation marks, etc. The latter I can do with the datasets that come with the TM library. What I cannot manage is to import text from some source, even though there are functions

2009 Nov 12

package "tm" fails to remove "the" with remove stopwords

I am using code that previously worked to remove stopwords using package "tm". Even manually adding "the" to the list does not work to remove "the". This package has undergone extensive redevelopment with changes to the function syntax, so perhaps I am just missing something. Please see my simple example, output, and sessionInfo() below. Thanks! Mark require(tm)

2007 Aug 28

flintlock fork causes hang in Apache+Python+mod_python

I am trying to use Xapian 1.0.2 with the Python SWIG bindings within an environment consisting of Apache httpd with mod_python (not as a CGI). Also, this is Linux. Whenever the Python code attempts to open a database, the entire httpd process will hang indefinitely. The Python bindings work outside of the Apache/mod_python environment. From the best I can tell the hang occurs in

2002 Jul 26

HP-UX 11 Corrupted MAC errors

Using 3.4p1 under HP-UX 11.0 I am repeatedly getting disconnected with Corrupted MAC on input. I am connecting from a RedHat Linux client (at 3.1p1). The incorrect MAC is appearing on the server packet receive side. Never get an invalid MAC on the client side. I'm currently diving into packet.c to try to find this, but the behavior is so strange and predictable I thought I'd see if

2012 Dec 13

Size of the term matrix and memory. TM package

Hi everyone! I have some problems with the size of the term matrix I obtain. The commands I use are the following: # load libraries library(tm) library(wordcloud) library(Rstem) library(Snowball) # read the UTF-8 document and convert it to ASCII txt <-

2002 Aug 30

LIBCRYPTO?

Hi all, I have a question about OpenSSH configuration. In Makefile there is defined LIBS=$(LIBCRYPTO), but the problem is that the version of OpenSSL that I'm using holds only the version LIBCRYPT. When adding LIBCRYPT to the Makefile I get: sshd.elf2flt: In function `key_regeneration_alarm': /.../ssh/sshd.c:252: undefined reference to `RSA_free' /.../ssh/sshd.c:253: undefined

2020 Nov 02

v2.3.11.3 solr plugin search via MUA fails to match accented ascii characters; cmd line exec of `doveadm fts lookup` PANICs (assertion failed) [proposed patch]

> On 02/11/2020 15:11 PGNet Dev <pgnet.dev at gmail.com> wrote: > > > On 11/2/20 12:44 AM, Aki Tuomi wrote: > > you should try removing use_libfts from your config line and let solr do that part. > > sry, i'm a bit confused. > > you'd suggested I _add_ it, > > https://dovecot.org/pipermail/dovecot/2020-October/120258.html > > > I

2020 Oct 18

v2.3.11.3 solr plugin search via MUA fails to match accented ascii characters; cmd line exec of `doveadm fts lookup` PANICs (assertion failed)

I've since rebuilt/reconfig'd all parts of my setup from scratch; some good cleanup along the way. Atm, my entire system for send/recv, store/retrieve, + rules & search is working as I intend. Ok, mostly ... Except for this accented-character search mystery. I've got a _lot_ of mail with various languages in bodies, so _do_ need to get this sorted. > On 10/18/20 2:58 PM,

2011 Jun 04

Problem with Snowball & RWeka

I too have this problem. Everything worked fine last year, but after updating R and packages I can no longer do word stemming. Unfortunately, I didn't save the old binaries, otherwise I would just revert back. Hoping someone finds a solution for R on Windows. Thanks! There is a potential solution for R on Mac OS from Kurt Hornik copied below, but I cannot get this to work on Windows.

2020 Apr 28

Stopwords: Topic modelling with LDA

Good morning, I am carrying out a topic model analysis using the LDA method. To begin with, I removed the universal "stopwords" from the analysis. When I look at the topics and their most frequent words, I find that they are very similar and that some words appear in every topic. The texts I am analyzing are consumer opinions about a specific category of

2004 Dec 14

stopwords

Hi! I would like to use the lists of stopwords provided with Xapian. Is there a standard way to remove stopwords automatically, or should I implement it myself in the indexer? Regards, Georges Dupret

2006 Jul 26

tweaking minimum word length?

Hi, Can Ferret be configured to change the minimum word length of what it indexes? Right now it seems to drop words of 3 characters or fewer, but I'd like to include words going down to 2 characters. How would I do that? Francis

2007 Mar 04

Getting non-stemmed terms from IndexReader

I need to get the set of terms being indexed using Ferret. I used IndexReader.terms and it returns a list of TermEnum nicely. The only problem is that my analyzer includes a stemming filter, so the terms I'm getting back are all stemmed. Is there any way to get the original unstemmed terms back from the index somehow? Thanks. -- Posted via http://www.ruby-forum.com/.

2020 Oct 19

v2.3.11.3 solr plugin search via MUA fails to match accented ascii characters; cmd line exec of `doveadm fts lookup` PANICs (assertion failed)

On 10/19/20 1:18 AM, John Fawcett wrote: > I would recommend you to redo the tests after correcting the > configuration. To be doubly sure you can include accented and unique non > accented text in the same email and search for both. If the non accented > text is found you know you've searching against the updated index and > the fact that accented text is not found is not
