thr3ads.net - similar to: "How can this code be improved?"

tm package: handling contractions

2012 Jan 27

I tried making a wordcloud of Obama's State of the Union address using the tm package to process the text:

    sotu <- scan(file="c:/R/data/sotu2012.txt", what="character")
    sotu <- tolower(sotu)
    corp <- Corpus(VectorSource(paste(sotu, collapse=" ")))
    corp <- tm_map(corp, removePunctuation)
    corp <- tm_map(corp, stemDocument)
    corp <- tm_map(corp,

Text mining

2012 Oct 25

Kind regards. I am currently writing a function to plot a word cloud; the code I have is the following:

    library(twitteR)
    library(tm)
    library(wordcloud)
    library(RXKCD)
    library(RColorBrewer)
    tweets=searchTwitter('@afflorezr', n=1500)
    generateCorpus= function(tweets,my.stopwords=c(),min.freq){
      #Install the textmining library
      require(tm)
      require(wordcloud)

Size of the term matrix and memory. tm package

2012 Dec 13

Hello everyone! I am having some problems with the size of the term matrix I obtain. The commands I use are the following:

    # load libraries
    library(tm)
    library(wordcloud)
    library(Rstem)
    library(Snowball)
    # read the UTF-8 document and convert it to ASCII
    txt <-

Problem with lsa package (data.frame) on Windows XP

2007 Aug 18

Dear R team, the following piece of code (which uses the lsa package) works fine on my Mac OS X machine, but when I run the same code on Windows XP it no longer works.

    ### code:
    library("lsa")
    matrix1 = textmatrix("C:\\Documents and Settings\\tine stalmans.TINE. 000\\LSA\\cuentos\\", stemming=TRUE, language="spanish", minWordLength=2, minDocFreq=1,

new to R: don't understand errors

2006 Oct 03

Hello all, I'm brand new to R, and I'm trying to quickly learn the rudiments for a couple of projects here at work. I'm working with the lsa package and trying to generate various semantic spaces. I seem to do well with small collections of clean text files, but now that I am trying to work with larger collections of less-than-perfect files, I'm getting errors

wordcloud and word table [Making progress]

2014 Jul 29

Good afternoon, group. Kind regards, Carlos J., and many thanks for your guidance. Indeed, I had realized that the reason colnames was not being applied was that there were no columns. The thing is that I cannot fully or clearly see at which point in the corpus creation process this can be done. However, following the example of

wordcloud and word table

2014 Jul 28

Hello, the reference you included (thanks for providing it) is quite clear and easy to follow. Have you been able to apply the same logic to your two speeches? To clear up any doubts, the first step would be to attach the code you are using so we can see whether there is any obvious error. That said, the proper way for us to help you is with a reproducible example: code + data.

Finding minimum of time subset

2009 Aug 13

Dear List, I have a data frame of data taken every few seconds. I would like to subset the data to retain only the data taken on the quarter hour, and as close to the quarter hour as possible. So far I have figured out how to subset the data to the quarter hour, but not how to keep only the minimum time for each quarter hour. For example:
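
One way to approach the question above, as a minimal sketch in base R: assign each timestamp to its 15-minute bin and keep the earliest reading in each bin. The data frame `readings` and its columns `time` and `value` are invented for illustration, since the poster's actual data is not shown in this excerpt.

```r
# Hypothetical readings taken every few seconds (names and values are assumptions).
start <- as.POSIXct("2009-08-13 10:00:00", tz = "UTC")
readings <- data.frame(
  time  = start + c(3, 8, 14, 903, 907, 1802),
  value = c(1.1, 1.2, 1.3, 2.1, 2.2, 3.1)
)

# Seconds since the epoch, and the 15-minute (900 s) bin each reading falls in.
secs    <- as.numeric(readings$time)
quarter <- secs %/% 900

# Keep the row whose time is the minimum within its quarter-hour bin.
earliest <- readings[secs == ave(secs, quarter, FUN = min), ]
print(earliest)
```

This keeps the first reading at or after each quarter hour; if readings just before the quarter hour should also count as "close", the binning would need to be centered on the quarter hour instead.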

wordcloud and word table

2014 Jul 25

Good evening, group. Kind regards. I have kept looking for a way to compare two documents from the years 2005 and 2013, which I can finally represent with wordcloud and with a table in which the columns are the years of each report, "2005" and "2013", and the rows are the words with the frequency of each of them

remove Punctuation characters

2006 May 09

Hi, I want to remove all punctuation characters in a string. I was trying to use a regular expression but it doesn't work. Here is a sample of what I want:

    str <- 'ABD - remove de punct, and dot characters.'
    str <- gsub('[:punct:]','',str)
    str
    "'ABD remove de punct and dot characters"

Is there any function that does this kind of thing? Thanks to
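
A likely cause, sketched here under the assumption that base R's default regex engine is in use: POSIX character classes such as `[:punct:]` are only recognized inside a bracket expression, so the pattern needs a second pair of brackets, `[[:punct:]]`. The variable names `cleaned` and `tidy` are illustrative.

```r
str <- 'ABD - remove de punct, and dot characters.'

# The class [:punct:] must sit inside a bracket expression: [[:punct:]].
cleaned <- gsub('[[:punct:]]', '', str)

# Removing the hyphen leaves two adjacent spaces; collapse runs of spaces to one.
tidy <- gsub(' +', ' ', cleaned)
print(tidy)
```

The tm package's `removePunctuation`, which appears in another thread in this list, is an alternative when the text is already in a corpus.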

how can i use stopwords?

2008 Mar 12

Hi, I do not understand the stopword function... I've set the termgenerator like this:

    $self->{'Stemmer'} = new Search::Xapian::Stem(german2);
    $self->{'Stopper'} = new Search::Xapian::SimpleStopper();
    $self->{'TermGenerator'} = new Search::Xapian::TermGenerator;
    $self->{'TermGenerator'}->set_stemmer( $self->{'Stemmer'} );

KMeans Clusterer - Going forward

2017 Jun 14

Hello, I have finished moving the API to PIMPL classes and will fix issues in the current code over the next week, based on reviews from mentors. The next step is to start forming document vectors that are reduced and more useful. This helps considerably in saving run time (since the time for distance calculation depends on the number of terms). Getting the useful terms within a

ideas on picking stopwords

2009 Mar 26

I'm looking at adding some stopwords to my indexing procedure, and was wondering if anyone had any good rules of thumb on how to pick which words to blacklist. It all seems a little... well... vague. Although I guess it kind of depends on the sort of documents you're wanting to index. My current idea is to write a little script to output the terms with the highest frequency in my
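
The frequency-counting idea described above can be sketched in a few lines of R. This is only an illustration of the counting step, independent of any particular indexer; the two sample documents in `docs` are invented, and the cut-off for what counts as a stopword remains a judgment call.

```r
# Hypothetical mini-corpus; real input would be the documents being indexed.
docs <- c("the cat sat on the mat", "the dog and the cat")

# Lowercase, split on anything that is not a letter, and drop empty tokens.
words <- unlist(strsplit(tolower(docs), "[^a-z]+"))
words <- words[nzchar(words)]

# Count each term and list the most frequent first as stopword candidates.
freq <- sort(table(words), decreasing = TRUE)
print(head(freq, 5))
```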

Implementing a "plugin" paradigm with R methods

2011 Aug 23

Dear list, I was wondering how to best implement some sort of a "plugin" paradigm using R methods and the dispatcher: Say we have a function/method ('foo') that does something useful, but that should be open for extension in ONE specific area by OTHERS using my package. Of course they could go ahead and write a whole new 'foo' method including the features they'd

Stopword addition and stemming

2010 Nov 15

Hi, two questions which I'm unsure about. Stemming: I've turned on stemming, etc., but how can I confirm that it's being used in searches? What should I look/search for? Stopwords: I'm trying out xapian on a regional dataset (e.g., searching data from a *.co.us TLD). I've noticed that searching for [bob co.us] results in *very* slow search times (tens of seconds), since it

Does R support [:punct:] in regexps?

2009 Apr 09

Hello, does R support [:punct:] in regular expressions? I am trying to strip all punctuation from a vector of strings.

    > x <- c("yoda-yoda","billy!")
    > gsub("/[:punct:]/","",x)
    [1] "yoda-yoda" "billy!"

Thanks, Dan
Daniel Brewer, Ph.D. Institute of Cancer

regexp help needed

2008 Nov 28

Hello, I have a vector of dates and I would like to grep the year component from this vector (= all digits after the last punctuation character) dates <- c("28.7.08","28.7.2008","28/7/08", "28/7/2008", "28/07/2008", "28-07-2008", "28-07-08") the resulting vector should look like "08" "2008"
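
A minimal sketch of one way to do this in base R, using the `dates` vector from the post: a greedy `.*` followed by a punctuation class consumes everything up to and including the last punctuation character, and `sub` deletes it, leaving the trailing digits. The name `years` is illustrative.

```r
dates <- c("28.7.08", "28.7.2008", "28/7/08", "28/7/2008",
           "28/07/2008", "28-07-2008", "28-07-08")

# Greedy ".*" runs to the LAST punctuation character; removing that prefix
# leaves only the year component.
years <- sub(".*[[:punct:]]", "", dates)
print(years)
```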

Get a list of all terms in an indexed corpus

2010 Oct 08

Hello, I have a corpus that I have indexed with xapian/xappy and I would now like to generate a corpus-specific list of stopwords. (This is a technical corpus, so a typical stopword list wouldn't be helpful.) My first thought was to ask the xapian database for a list of terms followed by their frequency. My intuition is that I could probably bring together a list of stopwords by examining

Stemming, stop words, acts_as_ferret

2006 Nov 13

I'd like to get the following behavior:
1. Stemming. The search is on a database of summaries of California legal cases. A search for "thermal image", for example, needs to hit "thermal imaging."
2. Stop words. Searches for "failing to instruct the jury" should come up with hits on a search for "fail to instruct."
3. Case-insensitive. What I

A way to get all the words from an index?

2007 May 30

Hi, I am just wondering if there's a way to get all the words from an index. Basically, all the words that have been indexed (excluding the stopwords if I'm using the stopwords analyzer, etc.) The fields I'm putting in are not :stored in the index. The idea is to implement a "did you mean?" mechanism, which is based on the content of the index, not on a
