similar to: Error when attempting to see "Corpus" metadata

Displaying 20 results from an estimated 300 matches similar to: "Error when attempting to see "Corpus" metadata"

2009 Oct 02
1
text mining
The following code is derived from a paper titled "Text Mining Infrastructure in R" (http://www.jstatsoft.org/v25/i05/paper). The example below seems to load some default documents for analysis, some sort of Latin document. I cannot for the life of me figure out how to load my own document, let alone an entire corpus. I have searched the above document as well as related documentation.
2010 Apr 23
2
library(tm) Error: could not find function "TermDocMatrix".
Hi list, I have the following code and error. I have tried other code and get the same problem. > reut21578 <- system.file("texts", "crude", package = "tm") > (r <- Corpus(DirSource(reut21578), readerControl = list(reader = readReut21578XMLasPlain))) A corpus with 20 text documents > (r <- Corpus(DirSource(reut21578), readerControl =
2011 Jan 24
1
Extracting information from text data
Hi R-Users, thanks in advance. I am using R-2.12.0 on Windows XP. I am trying to produce an n × m matrix from text data stored in different files, where n is the number of words (say w1, w2, …, wn) and m is the number of documents (say d1, d2, …, dm). A. Using package tm: I am using the tm package to do the job. I have provided the code below: > my.corpus <- Corpus(DirSource(my.path),
2009 Jan 15
1
How to solve the error "cannot allocate vector of size 1.1 Gb"
Hi gurus, thanks to your help, I have managed to start using the text mining package "tm" in R under Windows XP. However, while running the tm package, I ran into another memory problem. What is the best way to solve it: increasing physical RAM, or some other remedy? ############################### ###### my R
2009 Oct 13
0
tm: Why does adding local metadata take so long?
I'm running tm 0.5 on R 2.9.2 on a MacBook Pro 17" unibody early 2009 2.93 GHz 4GB RAM. I have a directory with 1697 plain text files on the Mac, that I want to analyze with the tm package. I have read the documents into a corpus, Corpus_3compounds, as follows: # Assign directory to a character vector dirName <- "/Volumes/RDR Test Documents/3Compounds/TXT" # Put the
2009 Jan 10
1
Help needed for Loading "tm" package
Howdy gurus, again thanks to Tony.Breyal, I was able to write the following script for analyzing a text document, but I got an error with the "tm" package. I don't know why I got the error from the R script below; I think I followed the process in the R tm manual. I use R v2.8.1 and tm_0.3-3.zip under Win XP. Thanks in advance, Kum Hwang > # setting directory > my.path
2009 Oct 15
1
Problems with rJava and tm packages
I am looking to do some text analysis using R and have run into some issues with some of the packages. I'm not sure if it's my goofy Vista OS or what, but using R 2.8.1, loading the text is relatively successful; the rJava package, however, was messed up somehow: > library(tm) > library(rJava) Error in if (!nchar(javahome)) stop("JAVA_HOME is not set and could not be determined from the
2013 Sep 26
0
R hangs at NGramTokenizer
Hi: I am trying to construct a document-term matrix from a corpus. The commands I used are: > library(parallel) > library(tm) > library(RWeka) > library(topicmodels) > library(RTextTools) > cl=makeCluster(detectCores()) > invisible(clusterEvalQ(cl, library(tm))) > invisible(clusterEvalQ(cl, library(RWeka))) > invisible(clusterEvalQ(cl, library(topicmodels))) >
2011 Sep 05
0
Stemming functions only work on the last word of plain text documents
Hello, I want to use the SnowballStemmer on a collection of plain text documents. However, when I apply it to my corpus using the tm_map function, it only stems the last word of each document (the problem is the same for wordStem, and stemDocument does not work at all). An example: > path <- c("c:/path/to/directory") # collection of plain text documents > corp <-
2010 Feb 16
0
tm package
Hi, I'm using version 0.5.1 of tm package with R 2.10.1. It looks to me as if after the following reuters21578 <- Corpus(DirSource(corpusDir), readerControl = list(reader = readReut21578XMLasPlain)) reuters21578 <- tm_map(reuters21578, stripWhitespace) reuters21578 <- tm_map(reuters21578, tolower) reuters21578 <- tm_map(reuters21578, removePunctuation)
2011 Jun 27
0
Extracting certain text using tm package
I have used the "tm" package to import a set of text documents using the following command: text <- Corpus(DirSource("."), readerControl = list(language = "ansi")) I would like to extract only a certain portion of the text in each document using certain keywords. For example, I would like to include all the text between the keywords <Start Text> and <End
2009 Apr 17
0
question about the Text Mining package tm
Hello. I am trying to work with the text mining package tm. I have a directory called textsTweet1 which contains three files: short.txt, myTextFile.txt, and myTextFile.csv. short.txt contains one line, "THE CAT IN THE HAT\n", and myTextFile contains some tweets from Twitter. The first few lines of myTextFile.txt are: @oliviamunn I miss a good Yakaniku...I miss Japan...I NEED COCO EVERYBODY. I NEED TO GET ON
2009 Dec 22
0
Reading PDF files (using xpdf)
Greetings Zaki, You should really post this question on the R-help forum so that others might benefit from any responses. It's been a while since I've done this, but if memory serves, the basic process was to download xpdf and add it to the Windows path, thus making it accessible from within R. Two methods follow: Method One (easiest) - using the awesome ?system command: (1) Download
2010 Jan 22
1
Invalid input error in tm package
Hello, I am working with the "tm" package. I have 2 PDF files saved in the directory D:/Files. I issued the following commands (marked in red bold), for which I got some errors and warnings (marked in bold): *surgj <- Corpus(DirSource("D:/Files"), readerControl = list(language = "ansi"))* *Warning messages: 1: In readLines(y, encoding = x$Encoding) : incomplete final
2011 Feb 10
2
Help using "tm" text mining package - preprocessing
Thanks all for your help. I fear text mining is an abstract little corner of "R". I have imported 3228 text (.txt) files, each a news story, into R using [tm]: textd <- Corpus(DirSource("other/docs"), readerControl = list(reader = readPlain)) I can pre-process each individual document using tolower(textd[[1]]); however, when I try to run tmTolower(), I get a no such command
2011 May 18
0
text mining problem using TM package
Hi, I’m using R (tm package) for text mining and I’m having problems filtering articles out of my data set by local metadata. Here is the code: *data <- ("C:/… /19970331")* *rs <- ReutersSource(data , encoding = "UTF-8")* *RC <- VCorpus(DirSource(data), readerControl = list(reader = readRCV1asPlain,* * language = "en_US",* * load =
2009 Nov 03
1
Can't pass file name as parameter to Corpus function
I'm working on a small project to extract high-frequency terms from a document and then display those terms in a web page. To this end, I need to pass the file name as a parameter to the Corpus function to build a corpus of only one document. I can build the corpus using the code below interactively in R, but when I called the function with a file name as the parameter, I got the error message saying
2009 Mar 30
1
Help with tm association analysis and Rgraphviz installation.
THANK YOU IN ADVANCE Question 1: I saved two txt files in C:\textfile, and each txt file contains only one text column; both have 100 records. I know the term “research” occurs 49 times, so I want to find out which other words are correlated with this word, but I got tons of associations of ‘1’. I tried other terms, and no
2012 May 29
1
package tm: reading XML files
Dear fellow R users, I'm using the package tm for text mining, and have a problem with reading in a corpus from XML files. When I copy the example from "Introduction to the tm package" of the small reuters subset "crude", everything goes well, and I get a corpus with the required meta data. When I read in the entire reuters21578 corpus in XML format however (or a
2008 Jan 07
1
glibc detected *** /usr/lib64/R/bin/exec/R: double free or corruption ???? tm package
Hi, I have a collection of .txt documents in my working folder on which I want to do some text mining. If I run TextDocCol from the tm package, R crashes with some memory issues. Does anyone have any idea whether this is related to R itself or to the tm package? Below you can find what is happening here. > setwd("/home/jan/Work/2008/Profacts/textmining/tryouts/workfolder") >