thr3ads.net - similar to: "Help with cleaning a corpus"

Displaying 20 results from an estimated 400 matches similar to: "Help with cleaning a corpus"

2013 Sep 26

R hangs at NGramTokenizer

Hi: I try to construct a Document-Term Meatrix from a corpus. The commands I used are: > library(parallel)> library(tm)> library(RWeka)> library(topicmodels)> library(RTextTools)> cl=makeCluster(detectCores())> invisible(clusterEvalQ(cl, library(tm)))> invisible(clusterEvalQ(cl, library(RWeka))) > invisible(clusterEvalQ(cl, library(topicmodels)))>

package "tm" fails to remove "the" with remove stopwords

2009 Nov 12

package "tm" fails to remove "the" with remove stopwords

I am using code that previously worked to remove stopwords using package "tm". Even manually adding "the" to the list does not work to remove "the". This package has undergone extensive redevelopment with changes to the function syntax, so perhaps I am just missing something. Please see my simple example, output, and sessionInfo() below. Thanks! Mark require(tm)

No es un problema de tm tienes doc.corpus vacío

2014 Jun 17

No es un problema de tm tienes doc.corpus vacío

No es un problema de tm ni de SnowfallC ni de mcapply (por el path utilizas linux, en windows mcapply según el manual no va bien) No defines bien los objetos que pasas. Pasas doc.corpus en lugar de corpus ( o asignas a corpus en lugar de a doc.corpus) . Depura los programas cuando salga un error de objeto, como te pone en el Error que pasas . Temporalmente lo tienes arreglado en

No es un problema de tm tienes doc.corpus vacío

2014 Jun 18

No es un problema de tm tienes doc.corpus vacío

Muchas gracias isidro, a la noche reinstalo R y os digo si me ha funcionado. Perdona mi ignorancia de novato pero no he entendido muy bien eso de avisar al desarrollador. Entiendo que es a los de los paquetes, no? un saludo! ruben El 18 de junio de 2014, 13:10, Isidro Hidalgo <ihidalgo@jccm.es> escribió: > Ya he visto que tampoco así funciona. > Sí te puedo decir que me ha dejado

tm package

2010 Feb 16

tm package

Hi, I'm using version 0.5.1 of tm package with R 2.10.1. It looks to me as if after the following reuters21578 <- Corpus(DirSource(corpusDir), readerControl = list(reader = readReut21578XMLasPlain)) reuters21578 <- tm_map(reuters21578, stripWhitespace) reuters21578 <- tm_map(reuters21578, tolower) reuters21578 <- tm_map(reuters21578, removePunctuation)

No es un problema de tm tienes doc.corpus vacío

2014 Jun 18

No es un problema de tm tienes doc.corpus vacío

Creo que lo que quieres hacer necesita esta línea de código justo después de cargar el paquete tm: inmortal = unlist(strsplit(inmortal, " ", fixed = T)) De esta forma, trabajas con palabras, y NO con las frases enteras... Un saludo Isidro Hidalgo Arellano Observatorio Regional de Empleo Consejería de Empleo y Economía http://www.jccm.es > -----Mensaje original----- > De:

Troubles with stemming (tm + Snowball packages) under MacOS

2012 Jan 13

Troubles with stemming (tm + Snowball packages) under MacOS

Dear all, I have some troubles using the stemming algorithm provided by the tm (text mining) + Snowball packages. Here is my config: MacOS 10.5 R 2.12.0 / R 2.13.1 / R 2.14.1 (I have tried several versions) I have installed all the needed packages (tm, rJava, rWeka, Snowball) + dependencies. I have desactivated AWT (like written in

tm_map help

2012 Feb 26

tm_map help

Hi all, I am trying to do some text mining with twitter and I am getting the error: Error in structure(names(sapply(possibleCompletions, "[", 1)), names = x) : 'names' attribute [1] must be the same length as the vector [0] When I use tm_map. Has anyone had/seen this error before? The code I have is shown below and this error only occurs with #qantas, hashtags like #asx,

wordcloud y tabla de palabras [Avanzando]

2014 Jul 29

wordcloud y tabla de palabras [Avanzando]

Buenas tardes grupo. Saludos cordiales Carlos J., muchas gracias por tu orientación. Efectivamente, me había dado cuenta que la razón por la que no se aplicaba colnames era porque no tenía columnas. La cuestión es que no logro visualizar completamente/claramente en qué parte del proceso de creación del corpus se puede hacer. Sin embargo, siguiendo el ejemplo de

Minería de texto

2012 Oct 25

Minería de texto

Cordial Saludo Actualmente estoy realizando una función para gráficar una nube de palabras el código que tengo es el siguiente: library(twitteR)library(tm)library(wordcloud)library(RXKCD)library(RColorBrewer) tweets=searchTwitter(''@afflorezr'', n=1500) generateCorpus= function(tweets,my.stopwords=c(),min.freq){ #Install the textmining library require(tm) require(wordcloud)

Ayuda Error in `colnames<-`(`*tmp*`, value = c(

2014 Jul 22

Ayuda Error in `colnames<-`(`*tmp*`, value = c(

Buenas tardes, grupo. Estoy tratando de hacer la comparación de dos archivos de una misma organización para encontrar las diferencias entre su informe del tema edl año 2005 y el del año 2013: Todos los comandos van bien, a exepción del último "colnames", como se ve en la siguiente secuencia: > pdf1<-"./PLAN de INSPECCIONES/05_seguridad_ciudadana.pdf" >

Tamaño de la matriz de términos y memoria. Paquete TM

2012 Dec 13

Tamaño de la matriz de términos y memoria. Paquete TM

Hola a todos! Tengo algunos problemas con el tamaño de la matriz de términos que obtengo. Los comandos que utilizo son los siguientes: # carga librerias library(tm) library(wordcloud) library(Rstem) library(Snowball) # lee el documento UTF-8 y lo convierte a ASCII txt <-

tm package: handling contractions

2012 Jan 27

tm package: handling contractions

I tried making a wordcloud of Obama's State of the Union address using the tm package to process the text sotu <- scan(file="c:/R/data/sotu2012.txt", what="character") sotu <- tolower(sotu) corp <-Corpus(VectorSource(paste(sotu, collapse=" "))) corp <- tm_map(corp, removePunctuation) corp <- tm_map(corp, stemDocument) corp <- tm_map(corp,

wordcloud y tabla de palabras

2014 Jul 25

wordcloud y tabla de palabras

Buenas noches grupo. Saludos cordiales. He seguido en la búsqueda de una forma que me permita realizar la comparación de dos documentos pertenecientes a los años 2005 y 2013, y que pueda representar finalmente con wordcloud y con una table en la que las columnas sean los años de cada informe "2005" y "2013", y las filas sean las palabras con la frecuencia de cada una de ellas

wordcloud y tabla de palabras

2014 Jul 28

wordcloud y tabla de palabras

Hola, La referencia (gracias por proporcionarla) que has incluido es bastante clara y se puede seguir. ¿Has podido sobre tus dos discursos utilizar la misma lógica? La forma de salir de dudas, para empezar, es que adjuntaras el código que estás empleando por ver si hay algún error evidente. Aunque la forma adecuada para que te podamos ayudar es con un ejemplo reproducible: código + datos.

Problem with Snowball & RWeka

2011 Mar 24

Problem with Snowball & RWeka

Dear Forum, when I try to use SnowballStemmer() I get the following error message: "Could not initialize the GenericPropertiesCreator. This exception was produced: java.lang.NullPointerException" It seems to have something to do with either Snowball or RWeka, however I can't figure out, what to do myself. If you could spend 5 minutes of your valuable time, to help me or give me a

count number of stop words in R

2017 Jun 12

count number of stop words in R

Defining data as you mentioned in your respond causes the following error: Error in UseMethod("tm_map", x) : no applicable method for 'tm_map' applied to an object of class "character" I can solve this error by using Corpus(VectorSource(my string)) and the using your command but I cannot see the number of stop words in my string! On Monday, June 12, 2017 8:36

count number of stop words in R

2017 Jun 12

count number of stop words in R

Thanks for your reply. I know the command data <- tm_map(data, removeWords, stopwords("english")) removes English stop words, I don't know how should I count stop words of my string: str="Mhm . Alright . There's um a young boy that's getting a cookie jar . And it he's uh in bad shape because uh the thing is falling over . And in the picture the mother is

Added support for TfIdf to Omega

2013 Apr 11

Added support for TfIdf to Omega

Hello guys,I have added code for tfidf to the weight.cc file in omega/ . Here is the patch : - https://github.com/aarshkshah1992/xapian/commit/5ff41a15f574e6780cc61e67e7f3da3d97ff4ec8 It compiles well and I think it'll work well. Here's the link to the documentation file omegascript.rst where I've added tfidf.

Reading stopwords from a csv file

2011 Oct 04

Reading stopwords from a csv file

I am using the tm package to do text miniing: I have a huge list of stopwords (2000+) that are in a csv file. I read it as follows: stopwordlist <- read.csv("stopwords to be Removed 10042011.csv") myStopwords <- as.character(stopwordlist$stopwords) When try removing the stopwords using tr1=tm_map(tr1,removeWords,myStopwords) I am getting the following error: Error in

similar to: Help with cleaning a corpus