thr3ads.net - similar to: "How to read plain text documents into a vector?"

Displaying 20 results from an estimated 3000 matches similar to: "How to read plain text documents into a vector?"

2015 Apr 10

Loop sobre muchos data frames

Hi everyone! I am working on a text mining project and, because of the resources I have available, I had to split the project's input text files into many small files. After turning each of these files into a separate corpus, I can clean each corpus, search for n-grams, build each termDocumentMatrix, and finally combine everything into a single TDM. But

text mining

2009 Oct 02

text mining

The following code is derived from a paper titled "Text Mining Infrastructure in R" (http://www.jstatsoft.org/v25/i05/paper). The example below seems to load some default documents for analysis, some sort of Latin document. I cannot for the life of me figure out how to load my own document, let alone an entire corpus. I have searched the above document as well as related documentation.

Loop sobre muchos data frames

2015 Apr 10

Loop sobre muchos data frames

Jorge, thanks for the advice. Apparently I am not applying it correctly, because the object I get does not contain what I want. Let me explain: when I run txt <- vector('list', length = length(names)) #names is the vector where I had already stored the list of txt's for(i in seq_along(txt)){ txt[[i]] <- Corpus(VectorSource(names[i])) } I get the object txt: > class(txt) [1]

Ayuda con el paquete de text mining (TM)

2009 Jul 17

Ayuda con el paquete de text mining (TM)

Dear all, I am writing to ask the following: I am doing a text mining project and need to import a set of texts in order to preprocess them, that is, remove stopwords, apply stemming, remove punctuation, etc. I can do the latter with the datasets that ship with the tm library. What I cannot manage is to import text from any source, even though there are functions

How to Solve the Error( error:cannot allocate vector of size 1.1 Gb)

2009 Jan 15

How to Solve the Error( error:cannot allocate vector of size 1.1 Gb)

Hi, Gurus. Thanks to your help, I have managed to start using a text mining package called "tm" in R under Windows XP. However, while running the tm package, I ran into another obstacle, this time a memory problem. What is the best way to solve this memory problem: adding physical RAM, or some other approach? ############################### ###### my R

Loop sobre muchos data frames

2015 Apr 12

Loop sobre muchos data frames

Jorge, dear R-help contributors, I have been trying to use a script for one of the steps in my analysis, which is to convert each of the corpora in my workspace into a TermDocumentMatrix object. I have a vector called bNames that lists all the corpora I want to convert to TDMs, and I wrote the following commands: tdm.n1 <- vector('list', length = length(bNames))

text mining

2011 May 26

text mining

Hi, how can I import a document of type "txt" using the package tm? I am looking for the command to use, given that my document is not located in the tm package library. Thanks. -- View this message in context: http://r.789695.n4.nabble.com/text-mining-tp3552221p3552221.html Sent from the R help mailing list archive at Nabble.com.

[R] how to build TermDocMatrix in tm text mining package of R

2009 Jan 09

[R] how to build TermDocMatrix in tm text mining package of R

Howdy Gurus, I'd like to ask a question about how to build a TermDocMatrix in the tm text mining package. It is not clear to me how to import a plain text file and then convert that text file into a TermDocMatrix, etc. How can I build a TermDocMatrix from a plain text document file for text association? Or are there any good manuals? Thank you in advance, -- Kum-Hoe Hwang, Ph.D.

package tm: reading XML files

2012 May 29

package tm: reading XML files

Dear fellow R users, I'm using the package tm for text mining, and have a problem with reading in a corpus from XML files. When I copy the example from "Introduction to the tm package" of the small reuters subset "crude", everything goes well, and I get a corpus with the required meta data. When I read in the entire reuters21578 corpus in XML format however (or a

Stemming functions only work on the last word of plain text documents

2011 Sep 05

Stemming functions only work on the last word of plain text documents

Hello, I want to use the SnowballStemmer on a collection of plain text documents. However, when I apply it to my corpus using the tm_map function, it only stems the last word of each document (the problem is the same for wordStem, and stemDocument does not work at all). An example: > path <- c("c:\path\to\directory") # collection of plain text documents > corp <-

malloc errors? out of memory with many files on HP-UX

2003 Nov 08

malloc errors? out of memory with many files on HP-UX

Hi, folks. I've started getting these errors from rsync, and any help would be appreciated: >ERROR: out of memory in string_area_new buffer >rsync error: error allocating core memory buffers (code 22) at util.c(115) >ERROR: out of memory in string_area_new buffer >rsync error: error allocating core memory buffers (code 22) at util.c(115) >ERROR: out of memory in

wordcloud y tabla de palabras

2014 Jul 28

wordcloud y tabla de palabras

Hi, the reference you included (thanks for providing it) is quite clear and easy to follow. Have you been able to apply the same logic to your two speeches? To clear up any doubts, you could start by attaching the code you are using, so we can see whether there is an obvious error. That said, the proper way for us to help you is with a reproducible example: code + data.

Reading Data from mle into excel?

2011 May 23

Reading Data from mle into excel?

Hi there, I ran the following code: vols=read.csv(file="C:/Documents and Settings/Hugh/My Documents/PhD/Swaption vols.csv" , header=TRUE, sep=",") X<-ts(vols[,2]) #X dcOU<-function(x,t,x0,theta,log=FALSE){ Ex<-theta[1]/theta[2]+(x0-theta[1]/theta[2])*exp(-theta[2]*t) Vx<-theta[3]^2*(1-exp(-2*theta[2]*t))/(2*theta[2]) dnorm(x,mean=Ex,sd=sqrt(Vx),log=log) }

rsync out of memory at 8 MB although ulimit is 512MB

2007 Aug 27

rsync out of memory at 8 MB although ulimit is 512MB

Hello again, I encountered something amazing. At first I thought there was not enough memory allowed through ulimit. ulimit is now set to (almost) 512MB, but rsync still runs out of memory at 8MB. Can anyone tell me why? That's my configuration: rsync version 2.6.2 from AIX 5.3 to SuSE Linux 9 (also has rsync 2.6.2) ulimit -a (AIX) ulimit -a AIX (source): -------------------------

Ayuda Error in `colnames<-`(`*tmp*`, value = c(

2014 Jul 22

Ayuda Error in `colnames<-`(`*tmp*`, value = c(

Good afternoon, group. I am trying to compare two files from the same organization to find the differences between its report on the subject from 2005 and the one from 2013. All the commands work fine, except for the last one, "colnames", as shown in the following sequence: > pdf1<-"./PLAN de INSPECCIONES/05_seguridad_ciudadana.pdf" >

rsync hangs after aborting a process

2008 Aug 19

rsync hangs after aborting a process

Greetings. In testing an rsync backup script I'd created, I made a mistake and aborted the running script with a ctrl-C keyboard interrupt. The command that was running at the time was as follows: ${RSYNC_CMD} -aNHAXx --protect-args --fileflags --force-change --rsync-path="/usr/local/bin/rsync" <username>@<my.server.com>:${CPY_SRC} ${CPY_DEST} The expected data

using package tm to find phrases

2009 Aug 13

using package tm to find phrases

I am using the package "tm" for text-mining of abstracts and would like to use it to find instances of gene names that may contain white space. For instance "gene regulatory protein 1". The default behavior of tm is to parse this into 4 separate words, but I would like to use the class constructor "dictionary" to define phrases such as just mentioned. Is this

rsync limit to file size/file count

2003 Jun 11

rsync limit to file size/file count

Hi, What are the limits to file size and file count when doing a rsync transfer using 2.5.6? I was trying to rsync about 500 GB of data with many files and many directories, but it has been stuck building the file list for several hours. First of all, is it possible to transfer 500 GB of data? Secondly, what would the limit for file count be when doing a rsync transfer? Any comments or help

wordcloud y tabla de palabras [Avanzando]

2014 Jul 29

wordcloud y tabla de palabras [Avanzando]

Good afternoon, group. Warm regards, Carlos J., and thank you very much for your guidance. Indeed, I had realized that the reason colnames was not being applied was that there were no columns. The thing is that I cannot see completely/clearly at which point in the corpus creation process this can be done. However, following the example of

Creating a list of combinations

2009 Aug 20

Creating a list of combinations

Dear R Users, I have 120 objects stored in R's memory and I want to pass the names of these many objects to be held as just one single object. The naming convention is month, year in sequence for all months between January 1986 to December 1995 (e.g. Jan86, Feb86, Mar86... through to Dec95). I hope to pass all these names (and their data I guess) to an object called file_list, however,
