thr3ads.net - similar to: "stopwords"

terms weight access

2004 Sep 10

2

terms weight access

Hi! First I would like to thank the people working on the Xapian project. I have been trying hard to find out how to access the indexed documents' weight vectors, without success. I found the query weights and the total weights of documents, but not the individual term weights. Could somebody give me a hint? Georges Dupret

how can i use stopwords?

2008 Mar 12

1

how can i use stopwords?

Hi, I do not understand the stopword function... I've set the termgenerator like this: $self->{'Stemmer'} = new Search::Xapian::Stem(german2); $self->{'Stopper'} = new Search::Xapian::SimpleStopper(); $self->{'TermGenerator'} = new Search::Xapian::TermGenerator; $self->{'TermGenerator'}->set_stemmer( $self->{'Stemmer'} );

Reading stopwords from a csv file

2011 Oct 04

1

Reading stopwords from a csv file

I am using the tm package to do text mining: I have a huge list of stopwords (2000+) that are in a csv file. I read it as follows: stopwordlist <- read.csv("stopwords to be Removed 10042011.csv") myStopwords <- as.character(stopwordlist$stopwords) When I try removing the stopwords using tr1=tm_map(tr1,removeWords,myStopwords) I get the following error: Error in

package "tm" fails to remove "the" with remove stopwords

2009 Nov 12

2

package "tm" fails to remove "the" with remove stopwords

I am using code that previously worked to remove stopwords using package "tm". Even manually adding "the" to the list does not work to remove "the". This package has undergone extensive redevelopment with changes to the function syntax, so perhaps I am just missing something. Please see my simple example, output, and sessionInfo() below. Thanks! Mark require(tm)

ideas on picking stopwords

2009 Mar 26

1

ideas on picking stopwords

I'm looking at adding some stopwords to my indexing procedure, and was wondering if anyone had any good rules of thumb on how to pick which words to blacklist. It all seems a little... well... vague. Although I guess it kind of depends on the sort of documents you're wanting to index. My current idea is to write a little script to output the terms with the highest frequency in my
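The frequency-based idea in the excerpt above could be sketched roughly as follows. This is only an illustration in Python, not the poster's script: the function name `stopword_candidates`, the regular-expression tokenizer, and the sample documents are all assumptions. It ranks terms by the number of documents they occur in, so the top of the list is a set of candidates to review by hand, not a finished stopword list.

```python
from collections import Counter
import re


def stopword_candidates(documents, top_n=10):
    """Return the top_n terms ranked by how many documents contain them."""
    doc_freq = Counter()
    for text in documents:
        # Count each term once per document so long documents do not dominate.
        terms = set(re.findall(r"[a-z']+", text.lower()))
        doc_freq.update(terms)
    return doc_freq.most_common(top_n)


# Hypothetical sample corpus, for illustration only.
docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "a bird sang in the tree",
]

for term, count in stopword_candidates(docs, top_n=2):
    print(term, count)
```

Counting document frequency instead of raw term frequency is a design choice here: a word repeated many times in one long document is not necessarily a good stopword, while a word present in nearly every document usually carries little discriminating power.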

Stopwords: Topic modelling with LDA

2020 Apr 28

3

Stopwords: Topic modelling with LDA

Good morning, I am carrying out a topic model analysis using the LDA method. To begin with, I removed the universal "stopwords" from the analysis. When I look at the topics and their most frequent words, I find that they are very similar and that some words appear in all of the topics. The texts I am analysing are consumer opinions about a specific category of

[Possible SPAM] Re: Stopwords: Topic modelling with LDA

2020 Apr 29

2

[Possible SPAM] Re: Stopwords: Topic modelling with LDA

Hello, I have just computed tf-idf and a question has come up. Is there an idf or tf-idf value that would be considered a threshold for deciding whether or not a word is very common? The idf values in my data range from 0 to 3.78 and the tf-idf values from 0 to 0.07. Regards. On Tue, 28 April 2020, 12:53, Carlos Ortega wrote: > Hello, > To start with, I would remove them to see what other topics appear.
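For readers unfamiliar with the idf values mentioned in the excerpt above, here is a minimal Python sketch of one common definition, idf = ln(N / df), where N is the number of documents and df is the number of documents containing the term. The poster's actual formula is not shown in the excerpt, and other tools may use a different log base or smoothing, so the function name `inverse_document_frequency` and the sample reviews are assumptions. Under this definition a term that appears in every document gets an idf of exactly 0, the lower end of the range the poster reports. The sketch does not propose a threshold.

```python
import math


def inverse_document_frequency(documents):
    """Map each term to idf = ln(N / df) over whitespace-separated tokens."""
    n_docs = len(documents)
    doc_freq = {}
    for text in documents:
        # Use a set so each term is counted once per document.
        for term in set(text.lower().split()):
            doc_freq[term] = doc_freq.get(term, 0) + 1
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


# Hypothetical consumer reviews, for illustration only.
reviews = [
    "the battery lasts long",
    "the screen is bright",
    "the price is fair",
]

idf = inverse_document_frequency(reviews)
# List the terms from most common (lowest idf) to rarest (highest idf).
for term in sorted(idf, key=idf.get):
    print(f"{term}: {idf[term]:.2f}")
```

Sorting by idf in ascending order puts the terms shared by the most documents first, which makes it easier to inspect the words that show up in every topic.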

Stopword addition and stemming

2010 Nov 15

4

Stopword addition and stemming

Hi, Two questions which I'm unsure about: Stemming: I've turned on stemming, etc, but how can I confirm that it's being used in searches? What should I look/search for? Stopwords: I'm trying out Xapian on a regional dataset (searching data from a *.co.us TLD, e.g.). I've noticed that searching for [bob co.us] results in *very* slow search times (tens of seconds), since it

Stopwords in tm package

2007 Nov 11

0

Stopwords in tm package

Hi to all, I need to append/delete stopwords from the list that I can use from the tm package. I use Portuguese stopwords. When I see the list of stopwords using >stopwords("portuguese") I have some words with special characters like this: "verdadeiro" "voc??" "voc??s" "vos" I tried to change the portuguese.dat file from

How can this code be improved?

2009 Nov 12

1

How can this code be improved?

I am running the following code on a MacBook Pro 17" Unibody early 2009 with 8GB RAM, OS X 10.5.8, R 2.10.0 Patch from Nov. 2, 2009, in 64-bit mode. freq.stopwords <- numeric(0) freq.nonstopwords <- numeric(0) token.tables <- list(0) i.ss <- c(0) cat("Beginning at ", date(), ".\n") for (i.d in 1:length(tokens)) { tt <- list(0) for (i.s in

survfit & number of variables != number of variable names

2012 Nov 17

4

survfit & number of variables != number of variable names

This works ok: > cox = coxph(surv ~ bucket*(today + accor + both) + activity, data = data) > fit = survfit(cox, newdata=data[1:100,]) but using strata leads to problems: > cox.s = coxph(surv ~ bucket*(today + accor + both) + strata(activity), > data = data) > fit.s = survfit(cox.s, newdata=data[1:100,]) Error in model.frame.default(data = data[1:100, ], formula = ~bucket + :

Stemming, stop words, acts_as_ferret

2006 Nov 13

1

Stemming, stop words, acts_as_ferret

I'd like to get the following behavior: 1. Stemming. The search is on a database of summaries of California legal cases. Things like a search for "thermal image" needs to hit "thermal imaging." 2. Stop words. Searches for "failing to instruct the jury" should come up with hits on a search for "fail to instruct." 3. Case-insensitive. What I

A way to get all the words from an index?

2007 May 30

3

A way to get all the words from an index?

Hi, I am just wondering if there's a way to get all the words from an index. Basically, all the words that have been indexed (excluding the stopwords if I'm using the stopwords analyzer, etc.) The fields I'm putting in are not :stored in the index. The idea is to implement a "did you mean?" mechanism, which is based on the content of the index, not on a

Question on Stopword Removal from a Cyrillic (Bulgarian) Text

2013 Apr 09

3

Question on Stopword Removal from a Cyrillic (Bulgarian) Text

Hi, I bumped into a serious issue while trying to analyse some texts in the Bulgarian language (with the tm package). I import a tab-separated csv file, which holds a total of 22 variables, most of which are text cells (not factors), using the read.delim function: data<-read.delim("bigcompanies_ascii.csv", header=TRUE, quote="'",

Text mining

2012 Oct 25

2

Text mining

Kind regards. I am currently writing a function to plot a word cloud; the code I have is the following: library(twitteR); library(tm); library(wordcloud); library(RXKCD); library(RColorBrewer) tweets=searchTwitter('@afflorezr', n=1500) generateCorpus= function(tweets,my.stopwords=c(),min.freq){ #Install the textmining library require(tm) require(wordcloud)

KMeans Clusterer - Going forward

2017 Jun 14

2

KMeans Clusterer - Going forward

Hello, I have finished moving the API to PIMPL classes and will fix issues within the current code over the next week, based on reviews from mentors. The next step going forward is to start forming document vectors that are reduced and more useful. This greatly helps in saving run time (since the time for distance calculation depends on the number of terms). Getting the useful terms within a

tm package - remove stopwords failing

2010 Mar 31

1

tm package - remove stopwords failing

Hi, I just noticed by inspecting the term matrix that not all stopwords are removed. Does someone know how to fix that? library(tm) data("crude") d<-tm_map(crude, removeWords, stopwords(language='english')) dt<-DocumentTermMatrix(d,control=list(minWordLength=3, minDocFreq=2)) inspect( dt) I am using R version 2.10, tm package 0.5-3 cheers Welma

Problem with lsa package (data.frame) on Windows XP

2007 Aug 18

2

Problem with lsa package (data.frame) on Windows XP

Dear R team, The following piece of code (to use the lsa package) works fine on my Mac OS X, but when I run the same code on Windows XP, it doesn't work any more. ### code: library("lsa") matrix1 = textmatrix("C:\\Documents and Settings\\tine stalmans.TINE. 000\\LSA\\cuentos\\", stemming=TRUE, language="spanish", minWordLength=2, minDocFreq=1,
