Displaying 20 results from an estimated 1000 matches similar to: "Extracting information from text data"
2009 Jan 10
1
Help needed for Loading "tm" package
Howdy Gurus again
Thanks to Tony.Breyal, I was able to write the following script for
analyzing a text document.
But I got an error with the "tm" package. I don't know why I got the error from the R
script below; I think I followed the process in the R tm manual.
I use R v2.8.1 and tm_0.3-3.zip under Win XP.
Thanks in advance,
Kum Hwang
> # setting directory
> my.path
2009 Jan 15
1
How to Solve the Error (error: cannot allocate vector of size 1.1 Gb)
Hi, Gurus
Thanks to your kind help, I have managed to start using a text
mining package called "tm" in R under Win XP.
However, while running the tm package, I hit another mine-like problem: memory.
What is the best way to solve this memory problem: increasing
physical RAM, or some other recipe?
###############################
###### my R
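A hedged sketch, not taken from the thread. Current tm releases store a TermDocumentMatrix as a sparse simple_triplet_matrix (from the slam package). Converting it to a dense matrix with as.matrix() allocates one cell per term-document pair, which is a frequent source of very large allocations. One way to shrink the matrix first is removeSparseTerms(). The 0.8 threshold below is an arbitrary illustrative value, and the example uses the crude sample corpus bundled with tm rather than the poster's data. Whether this resolves the poster's error is unknown.

```r
library(tm)

# 20 Reuters-21578 news articles bundled with tm
data("crude")

# Sparse term-document matrix (terms in rows, documents in columns)
tdm <- TermDocumentMatrix(crude)

# Keep only terms whose sparsity is below 0.8, i.e. terms missing from
# fewer than 80% of documents (0.8 is an illustrative choice)
tdm_small <- removeSparseTerms(tdm, 0.8)

# Compare the number of terms before any dense conversion with as.matrix()
c(before = nTerms(tdm), after = nTerms(tdm_small))
```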
2010 Apr 23
2
library(tm) Error: could not find function "TermDocMatrix"
Hi List
I have the following code and error. I have tried other code and I get
the same problem.
> reut21578 <- system.file("texts", "crude", package = "tm")
> (r <- Corpus(DirSource(reut21578), readerControl = list(reader =
> readReut21578XMLasPlain)))
A corpus with 20 text documents
> (r <- Corpus(DirSource(reut21578), readerControl =
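The function name in this error appears to come from early tm releases. Current tm releases provide TermDocumentMatrix() instead, plus DocumentTermMatrix() for the transposed orientation, so code written against older documentation may need the newer names. Below is a minimal sketch using the crude sample corpus bundled with tm; it is not the poster's full script, which is cut off.

```r
library(tm)

data("crude")  # 20-document Reuters sample corpus bundled with tm

# Rows are terms, columns are documents
tdm <- TermDocumentMatrix(crude)

# Same counts with documents as rows and terms as columns
dtm <- DocumentTermMatrix(crude)

dim(tdm)
dim(dtm)
```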
2009 Oct 02
1
text mining
The following code is derived from a paper titled "Text Mining Infrastructure
in R" (http://www.jstatsoft.org/v25/i05/paper). The example below seems to
load some default documents for analysis, some sort of Latin document. I
cannot for the life of me figure out how to load my own document, let alone an
entire corpus. I have searched the above document as well as related
documentation.
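A hedged sketch of loading your own files with a current tm release. DirSource() turns every matching file in a folder into one document, and VectorSource() turns strings already in R into documents. The folder and the two files are hypothetical and are created in a temporary directory only so the example runs on its own; in practice you would point DirSource() at your own folder.

```r
library(tm)

# Hypothetical folder with two small text files, created here for the demo
my_dir <- file.path(tempdir(), "my_docs")
dir.create(my_dir, showWarnings = FALSE)
writeLines("The cat sat on the mat.", file.path(my_dir, "doc1.txt"))
writeLines("The dog chased the cat.", file.path(my_dir, "doc2.txt"))

# One document per .txt file in the folder, read as plain text
corp <- VCorpus(DirSource(my_dir, pattern = "\\.txt$"),
                readerControl = list(reader = readPlain))

# A string already held in R can also become a one-document corpus
one_doc <- VCorpus(VectorSource("Some text I already have in R."))

length(corp)
content(corp[[1]])
```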
2009 Mar 30
1
Help with tm association analysis and Rgraphviz installation.
THANK YOU IN ADVANCE
Question 1:
I saved two txt files in C:\textfile.
Each txt file contains only one text column, and both have 100 records.
I know the term “research” occurs 49 times, so I want to find out which other
words are correlated with this word, but I got tons of associations of ‘1’.
I tried other terms, and no
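findAssocs() takes a term and returns the other terms whose counts across documents correlate with it at or above a given limit. If the two files in the question were read as two documents, every term has only two counts. The Pearson correlation of two non-constant two-point vectors is always +1 or -1, which could explain the many associations of 1. This is a guess about the poster's setup, not a diagnosis from the thread. Treating each record as its own document (for example, VectorSource() over the lines of a file) gives the correlation more observations. The sketch uses tm's bundled crude corpus; the term "opec" and the 0.8 limit are arbitrary illustrative choices.

```r
library(tm)

data("crude")

# Documents as rows, terms as columns
dtm <- DocumentTermMatrix(crude)

# Terms whose per-document counts correlate with "opec" at 0.8 or more
assocs <- findAssocs(dtm, "opec", 0.8)
assocs
```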
2009 Jan 09
1
[R] how to build TermDocMatrix in tm text mining package of R
Howdy Gurus
I'd like to ask a question about how to build a TermDocMatrix in the tm text
mining package.
It is not clear to me how to import a plain text file and then convert that
text file into a TermDocMatrix, etc.
How can I build a TermDocMatrix from a plain text document file for text
association?
Or are there any good manuals?
Thank you in advance,
--
Kum-Hoe Hwang, Ph.D.
2009 Oct 15
1
Problems with rJava and tm packages
I am looking to do some text analysis using R and have run into some issues
with some of the packages. I'm not sure if it's my goofy Vista OS or what, but
using R 2.8.1 I was relatively successful loading the text, while the rJava
package was messed up somehow:
library(tm)
> library(rJava)
Error in if (!nchar(javahome)) stop("JAVA_HOME is not set and could not be
determined from the
2009 Apr 17
0
question about the Text Mining package tm
Hello. I am trying to work with the text mining package tm.
I have a directory called textsTweet1, which contains three files:
short.txt
myTextFile.txt
myTextFile.csv
short.txt contains one line: THE CAT IN THE HAT\n
myTextFile contains some tweets from Twitter. The first few lines of
myTextFile.txt are:
@oliviamunn I miss a good Yakaniku...I miss Japan...I NEED COCO EVERYBODY. I
NEED TO GET ON
2009 Oct 13
0
tm: Why does adding local metadata take so long?
I'm running tm 0.5 on R 2.9.2 on a MacBook Pro 17" unibody early 2009
2.93 GHz 4GB RAM. I have a directory with 1697 plain text files on
the Mac, that I want to analyze with the tm package. I have read the
documents into a corpus, Corpus_3compounds, as follows:
# Assign directory to a character vector
dirName <- "/Volumes/RDR Test Documents/3Compounds/TXT"
# Put the
2011 Sep 05
0
Stemming functions only work on the last word of plain text documents
Hello,
I want to use the SnowballStemmer on a collection of plain text documents. However, when I apply it to my corpus using the tm_map function, it only stems the last word of each document (the problem is the same for wordStem, and stemDocument does not work at all). An example:
> path <- "c:/path/to/directory" # collection of plain text documents
> corp <-
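A minimal sketch of stemming every word of each document with stemDocument() in a current tm release. It relies on the SnowballC package being installed. The sample sentences are made up, and the thread's own code is cut off, so this does not reproduce the poster's setup.

```r
library(tm)  # stemDocument() uses SnowballC::wordStem() under the hood

corp <- VCorpus(VectorSource(c("running runners easily ran",
                               "the cats were chasing mice")))

# Apply the Snowball English stemmer to each whitespace-separated word;
# extra arguments to tm_map() are passed on to stemDocument()
corp <- tm_map(corp, stemDocument, language = "english")

content(corp[[1]])
```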
2013 Sep 26
0
R hangs at NGramTokenizer
Hi:
I am trying to construct a Document-Term Matrix from a corpus. The commands I used are:
> library(parallel)
> library(tm)
> library(RWeka)
> library(topicmodels)
> library(RTextTools)
> cl=makeCluster(detectCores())
> invisible(clusterEvalQ(cl, library(tm)))
> invisible(clusterEvalQ(cl, library(RWeka)))
> invisible(clusterEvalQ(cl, library(topicmodels)))
>
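RWeka's NGramTokenizer calls Java through rJava. The tm FAQ describes a bigram tokenizer built from the NLP package, which tm attaches, that needs no Java. Whether it avoids the hang in the poster's parallel setup is untested. The sketch below only shows the tokenizer on a tiny made-up corpus, and it assumes a VCorpus, since custom tokenizers are passed through the control list.

```r
library(tm)  # also attaches NLP, which provides ngrams() and words()

# Split a document into words, then join adjacent pairs into "w1 w2" terms
BigramTokenizer <- function(x) {
  unlist(lapply(ngrams(words(x), 2), paste, collapse = " "), use.names = FALSE)
}

corp <- VCorpus(VectorSource(c("the cat sat", "the cat ran")))

# Use the bigram tokenizer instead of the default single-word tokenizer
dtm <- DocumentTermMatrix(corp, control = list(tokenize = BigramTokenizer))

Terms(dtm)
```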
2011 Feb 10
2
Help using "tm" text mining package - preprocessing
Thanks all for your help. I fear text mining is an abstract little corner of
"R".
I have imported 3228 text (.txt) files, each a news story, into R using
[tm]:
textd <- Corpus(DirSource("other/docs"), readerControl = list(reader = readPlain))
I can pre-process each individual document using tolower(textd[[1]]);
however, when I try to run tmTolower() I get a no such command
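In current tm releases there is no tmTolower(). Transformations are applied to a whole corpus with tm_map(). A plain character function such as base R's tolower() has to be wrapped in content_transformer() so the documents keep their class. The two sample strings below are made up.

```r
library(tm)

textd <- VCorpus(VectorSource(c("The CAT in the HAT", "Green EGGS, and Ham!")))

# Wrap base R's tolower() so tm_map() returns text documents, not bare strings
textd <- tm_map(textd, content_transformer(tolower))

# Built-in tm transformations can be passed to tm_map() directly
textd <- tm_map(textd, removePunctuation)
textd <- tm_map(textd, stripWhitespace)

content(textd[[2]])
```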
2012 May 29
1
package tm: reading XML files
Dear fellow R users,
I'm using the package tm for text mining, and have a problem with
reading in a corpus from XML files.
When I copy the example from "Introduction to the tm package" of the
small reuters subset "crude", everything goes well, and I get a corpus
with the required meta data.
When I read in the entire reuters21578 corpus in XML format, however (or
a
2017 Nov 07
0
Error when attempting to see "Corpus" metadata
R Project
I receive the error highlighted in yellow when attempting to combine two
corpora, so I'm assuming I'm not combining the two variables (nb_pos and
nb_neg) correctly in the following line:
nb_all <- c(nb_pos, nb_neg, recursive = TRUE) # anyone see anything wrong with this line of code?
--------------------------------------------------
> library("tm")
Loading
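A hedged sketch of combining two corpora in a current tm release. tm provides a c() method for VCorpus objects, so two corpora of the same type can be joined into one. nb_pos and nb_neg below are hypothetical stand-ins built from made-up strings, since the poster's objects are not shown.

```r
library(tm)

# Hypothetical stand-ins for the poster's positive and negative corpora
nb_pos <- VCorpus(VectorSource(c("great film", "loved it")))
nb_neg <- VCorpus(VectorSource("boring plot"))

# c() dispatches to tm's VCorpus method and returns a single VCorpus
nb_all <- c(nb_pos, nb_neg)

length(nb_all)
meta(nb_all[[3]])  # document-level metadata of the third document
```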
2008 Jan 07
1
glibc detected *** /usr/lib64/R/bin/exec/R: double free or corruption ???? tm package
Hi,
I have a collection of .txt documents in my working folder on which I want to do some text mining. If I run TextDocCol from the tm package, R crashes with some memory issues. Does anyone have any idea whether this is related to R itself or to the tm package?
Below you can find what is happening here.
> setwd("/home/jan/Work/2008/Profacts/textmining/tryouts/workfolder")
>
2011 May 26
3
text mining
Hi,
How can I import a document of type ".txt" using the tm package?
I would like to know the command, given that my document is not located in the tm
package library.
Thanks.
2010 Jan 22
1
Invalid input error in tm package
Hello,
I am working with the "tm" package.
I have 2 pdf files saved in the directory D:/Files
I issued the following commands (marked in red bold) for which I got some
errors and warnings (marked in bold)
*surgj <- Corpus(DirSource("D:/Files"), readerControl = list(language =
"ansi"))*
*Warning messages:
1: In readLines(y, encoding = x$Encoding) :
incomplete final
2009 Jul 17
3
Help with the text mining package (TM)
Dear all, I am writing to ask about the following:
I am working on a text mining project and need to import a series of
texts in order to preprocess them, that is, remove stopwords, do stemming,
remove punctuation marks, etc. I can do the latter with the
datasets that come with the TM library. What I cannot manage is to import text
from any source, even though there are functions
2010 Feb 16
0
tm package
Hi,
I'm using version 0.5.1 of the tm package with R 2.10.1. It looks to me
as if after the following
reuters21578 <- Corpus(DirSource(corpusDir), readerControl =
list(reader = readReut21578XMLasPlain))
reuters21578 <- tm_map(reuters21578, stripWhitespace)
reuters21578 <- tm_map(reuters21578, tolower)
reuters21578 <- tm_map(reuters21578, removePunctuation)
2011 Jun 27
0
Extracting certain text using tm package
I have used the "tm" package to import a set of text documents using the
following command:
text <- Corpus(DirSource("."),readerControl = list(language ="ansi"))
I would like to extract only a certain portion of the text in each document
using certain keywords. For example, I would like to include all the text
between the keywords <Start Text> and <End
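A base-R sketch for extracting the text between two marker strings in a document. The closing keyword is cut off in the snippet, so both markers are arguments here. "<Stop Here>" in the example is a hypothetical placeholder, not the poster's keyword. The document's lines are joined with newlines so a match can span several lines, and \Q...\E makes PCRE treat the markers as literal text.

```r
# Return the text between the first `start` marker and the next `end` marker,
# or NA if no such pair is found. `txt` may be a multi-line character vector.
extract_between <- function(txt, start, end) {
  s <- paste(txt, collapse = "\n")
  # (?s) lets "." match newlines; \Q...\E quotes the markers literally;
  # the lazy (.*?) stops at the first end marker after the start marker
  pattern <- paste0("(?s)\\Q", start, "\\E(.*?)\\Q", end, "\\E")
  m <- regmatches(s, regexec(pattern, s, perl = TRUE))[[1]]
  if (length(m) < 2) return(NA_character_)
  trimws(m[2])
}

doc_lines <- c("header", "<Start Text>", "keep this", "and this", "<Stop Here>", "footer")
extract_between(doc_lines, "<Start Text>", "<Stop Here>")  # returns "keep this\nand this"
```

Applied to the corpus from the question, something like sapply(text, function(d) extract_between(content(d), start, end)) should return one extract per document.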