Displaying 20 results from an estimated 10000 matches similar to: "non-english text mining with tm package"
2010 Oct 06
1
How to read a matrix with Hebrew row names?
Hello all,
I am trying to read a matrix with row names in Hebrew, but I am unable to read
the Hebrew words, e.g.:
להחזיק -1.544317e-02 -2.398621e-01 9.854603e-01 1.111321e+00
שאחרי -1.544317e-02 -2.398621e-01 7.421092e-01 3.439690e-01
היישוב -1.544317e-02 -2.398621e-01 1.050982e+00 1.125970e+00
בתים -1.544317e-02
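A minimal sketch of one way to read such a file, assuming it is stored as UTF-8 and that the first column holds the row names. The temporary file and the variable names are illustrative only, and the two rows reuse values from the excerpt above. The `encoding` argument of `read.table` marks the strings as UTF-8 without re-encoding them.

```r
# Two Hebrew row names written as Unicode escapes so the script itself stays ASCII
# (they spell the first and third row names from the excerpt).
row1 <- "\u05dc\u05d4\u05d7\u05d6\u05d9\u05e7"
row2 <- "\u05d4\u05d9\u05d9\u05e9\u05d5\u05d1"
rows <- c(paste(row1, "-1.544317e-02 -2.398621e-01 9.854603e-01 1.111321e+00"),
          paste(row2, "-1.544317e-02 -2.398621e-01 1.050982e+00 1.125970e+00"))

# Hypothetical input file: write the raw UTF-8 bytes to a temporary file.
path <- tempfile(fileext = ".txt")
writeLines(enc2utf8(rows), path, useBytes = TRUE)

# Read it back: column 1 becomes the row names, strings are marked as UTF-8.
m <- as.matrix(read.table(path, row.names = 1, encoding = "UTF-8"))
dim(m)
Encoding(rownames(m))
```

Whether the names then print as Hebrew still depends on the console and locale; on some setups `fileEncoding = "UTF-8"` may be the better choice.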
2009 Jan 09
1
[R] How to build a TermDocMatrix in the tm text mining package of R
Howdy Gurus,
I'd like to ask a question about how to build a TermDocMatrix in the tm text
mining package.
It is not clear to me how to import a plain text file and then convert that
text file into a TermDocMatrix, etc.
How can I build a TermDocMatrix from a plain text document file for text
association?
Or are there any good manuals?
Thank you in advance,
--
Kum-Hoe Hwang, Ph.D.
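A minimal sketch of the workflow asked about here, assuming a tm release that provides `VCorpus`, `DirSource`, `readPlain` and `TermDocumentMatrix` (the `TermDocMatrix` name used in the post may not exist in later releases). The temporary directory, file names and file contents are made up for illustration.

```r
library(tm)

# Hypothetical input: two small plain-text files in a fresh temporary directory.
doc_dir <- tempfile("tm_demo_")
dir.create(doc_dir)
writeLines("Text mining finds associations in text.", file.path(doc_dir, "a.txt"))
writeLines("Mining text needs a term-document matrix.", file.path(doc_dir, "b.txt"))

# Import every file in the directory as one plain-text document each.
corpus <- VCorpus(DirSource(doc_dir, encoding = "UTF-8"),
                  readerControl = list(reader = readPlain, language = "en"))

# Build the term-document matrix (terms in rows, documents in columns).
tdm <- TermDocumentMatrix(corpus, control = list(removePunctuation = TRUE))
m <- as.matrix(tdm)
m
```

With a larger corpus, `findAssocs(tdm, "text", 0.5)` is one way to look for terms correlated with a given term.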
2009 Jul 17
3
Help with the text mining package (TM)
Dear all, I am writing to ask the following:
I am working on a text mining project and need to import a set of
texts in order to preprocess them, that is, remove stopwords, apply stemming,
remove punctuation marks, etc. I can do the latter with the
datasets that ship with the TM library. What I cannot manage is to import text
from any source, even though there are functions
2009 Oct 02
1
text mining
The following code is derived from a paper titled "Text Mining Infrastructure
in R" (http://www.jstatsoft.org/v25/i05/paper). The example below seems to
load some default documents for analysis, some sort of Latin document. I
cannot for the life of me figure out how to load my own document, let alone an
entire corpus. I have searched the above document as well as related
documentation.
2011 Feb 10
2
Help using "tm" text mining package - preprocessing
Thanks all for your help. I fear text mining is an abstract little corner of
"R".
I have imported 3228 text (.txt) files, each a news story, into R using
[tm]:
textd <- Corpus(DirSource("other/docs"), readerControl = list(reader = readPlain))
I can pre-process each individual document using tolower(textd[[1]]);
however, when I try to run tmTolower() I get a no such command
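A minimal sketch of lower-casing a whole corpus at once, assuming a tm release that provides `tm_map` and `content_transformer`. The two in-memory documents are hypothetical stand-ins for the imported news stories.

```r
library(tm)

# Hypothetical stand-in for the imported files: a small in-memory corpus.
textd <- VCorpus(VectorSource(c("THE CAT IN THE HAT", "Breaking NEWS Story")))

# Wrap base tolower() so it transforms each document's content
# while keeping the document objects and their metadata intact.
textd <- tm_map(textd, content_transformer(tolower))

as.character(textd[[1]])
```

The same pattern applies to other transformations shipped with tm, such as `removePunctuation` or `stripWhitespace`, which can be passed to `tm_map` directly.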
2009 Apr 17
0
question about the Text Mining package tm
Hello. I am trying to work with the text mining package tm.
I have a directory called textsTweet1 which contains three files
short.txt
myTextFile.txt
myTextFile.csv
short.txt contains one line: THE CAT IN THE HAT\n
myTextFile contains some tweets from Twitter. The first few lines of
myTextFile.txt are:
@oliviamunn I miss a good Yakaniku...I miss Japan...I NEED COCO EVERYBODY. I
NEED TO GET ON
2011 May 18
0
text mining problem using TM package
Hi, I’m using R (TM package) for text mining and I’m having problems
filtering articles out of my data set by local meta data.
Here is the code:
data <- ("C:/… /19970331")
rs <- ReutersSource(data , encoding = "UTF-8")
RC <- VCorpus(DirSource(data), readerControl = list(reader = readRCV1asPlain,
language = "en_US",
load =
2012 May 29
1
package tm: reading XML files
Dear fellow R users,
I'm using the package tm for text mining, and have a problem with
reading in a corpus from XML files.
When I copy the example from "Introduction to the tm package" of the
small reuters subset "crude", everything goes well, and I get a corpus
with the required meta data.
When I read in the entire reuters21578 corpus in XML format however (or
a
2012 Dec 13
2
Size of the term matrix and memory. TM package
Hi everyone!
I have some problems with the size of the term matrix I get. The commands I use are the following:
# load libraries
library(tm)
library(wordcloud)
library(Rstem)
library(Snowball)
# read the UTF-8 document and convert it to ASCII
txt <-
2014 Jun 17
2
It is not a tm problem, your doc.corpus is empty
It is not a problem with tm, SnowfallC, or mcapply (judging by the path
you are using Linux; on Windows, according to the manual, mcapply does not work well).
You are not defining properly the objects you pass. You pass doc.corpus instead of
corpus (or you assign to corpus instead of to doc.corpus).
Debug your programs when an object error comes up, as indicated in the
error you pasted.
You have it temporarily fixed in
2014 Jun 18
3
It is not a tm problem, your doc.corpus is empty
Thank you very much, Isidro,
tonight I will reinstall R and let you know whether it worked. Forgive my
beginner's ignorance, but I did not quite understand the part about notifying the developer.
I take it that means the package developers, right?
Regards!
ruben
On 18 June 2014 at 13:10, Isidro Hidalgo <ihidalgo@jccm.es> wrote:
> I have now seen that it does not work that way either.
> What I can tell you is that it has let me
2014 Jun 18
2
It is not a tm problem, your doc.corpus is empty
I think what you want to do needs this line of code right after
loading the tm package:
inmortal <- unlist(strsplit(inmortal, " ", fixed = TRUE))
That way you work with words, and NOT with whole sentences...
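A small base R sketch of what that line does, using a made-up character vector in place of the poster's `inmortal` object:

```r
# Hypothetical input: a character vector holding whole sentences.
sentences <- c("text mining with tm", "non-English text")

# Split each sentence on single spaces, then flatten the list into one vector of words.
words <- unlist(strsplit(sentences, " ", fixed = TRUE))

# Count how often each word occurs across all sentences.
word_counts <- table(words)
word_counts
```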
Regards
Isidro Hidalgo Arellano
Observatorio Regional de Empleo
Consejería de Empleo y Economía
http://www.jccm.es
> -----Original message-----
> From:
2011 May 26
3
text mining
Hi,
How can I import a document of type "txt" using the package tm?
I need the command, given that my document is not located in the library
of the tm package.
thanks.
--
View this message in context: http://r.789695.n4.nabble.com/text-mining-tp3552221p3552221.html
Sent from the R help mailing list archive at Nabble.com.
2010 Apr 23
2
Library (tm) Error: could not find function "TermDocMatrix".
Hi List
I have the following code and error. I have tried other code and I get
the same problem.
> reut21578 <- system.file("texts", "crude", package = "tm")
> (r <- Corpus(DirSource(reut21578), readerControl = list(reader =
> readReut21578XMLasPlain)))
A corpus with 20 text documents
> (r <- Corpus(DirSource(reut21578), readerControl =
2009 Mar 17
1
- help - predicting with glmnet/lars for dataframes with different nrow than the train set
Hello
I'm having trouble using the lars and glmnet functions to predict on a new data
set with a different nrow than the original:
for instance:
=============
log.1 = glm(temp.data$TL~(.),temp.data,family = binomial,x=TRUE,y=TRUE)
nrow(test.data) != nrow(temp.data) # == TRUE
Val.frame = model.frame(log.1,test.data) # returns a data frame with the variables needed to use log.1
2011 May 25
0
text mining - text comparing
Hi all,
I'll try to explain what I would like to achieve.
I have two problems that I need help with, if someone has a clue.
1.) I have a TXT file containing two fields: USCS and Description.
For each USCS field I have a Description field that contains a lot of words describing that particular USCS type. What I would like to do is to mine the text using tm
2010 Aug 17
1
TM Package - Corpus function - Memory Allocation Problems
I'm using R 2.11.1 on Win XP (32-bit) with 3 GB of RAM. My data is
(only) 16.0 MB.
I want to create a VCorpus object using the Corpus function in the tm
package but I'm running into memory allocation issues: "Error: cannot
allocate vector of size 372 Kb".
My data is stored in a csv file which I've imported with "read.csv" and
then used the following to create
2009 Jan 10
1
Help needed for Loading "tm" package
Howdy Gurus again
Thanks to Tony.Breyal, I was able to write the following script for
analyzing a text document.
But I got an error with the "tm" package. I don't know why I got the error from the R
script below. I think I followed the process in the R tm manual.
I use R v2.8.1 and tm_0.3-3.zip under Win XP.
Thanks in advance,
Kum Hwang
> # setting directory
> my.path
2011 Jan 24
1
Extracting information from text data
Hi R-Users,
Thanks in advance.
I am using R-2.12.0 on Windows XP.
I am trying to produce an n x m matrix from text data stored in different files, where n is the number of words (say w1, w2, …, wn) and m is the number of documents (say d1, d2, …, dm).
A. Using package tm
I am using package tm to do the job. I have provided the code below:
> my.corpus <- Corpus(DirSource(my.path),
2009 Oct 15
1
Problems with rJava and tm packages
I am looking to do some text analysis using R and have run into some issues
with some of the packages. I'm not sure if it's my goofy Vista OS or what, but
R 2.8.1 is relatively successful at loading the text, while the rJava
package was messed up somehow:
library(tm)
> library(rJava)
Error in if (!nchar(javahome)) stop("JAVA_HOME is not set and could not be
determined from the