search for: readercontrol

Displaying 20 results from an estimated 33 matches for "readercontrol".

2009 Oct 02
1
text mining
...ument let alone an entire corpus. I have searched the above document as well as related documentation. Any leads or help would be appreciated. Thanks everyone. From the document: txt <- system.file("texts", "txt", package = "tm") (ovid <- Corpus(DirSource(txt), readerControl = list(reader = readPlain, language = "la", load = TRUE))) My attempt: txt <- system.file("Speeches/speech", "txt", package = "tm") (ovid <- Corpus(DirSource(txt), readerControl = list(reader = readPlain, language = "la", load = TRUE)))...
2010 Apr 23
2
Library (tm) Error: could not find function "TermDocMatrix".
Hi List, I have the following code and error. I have tried other code and get the same problem. > reut21578 <- system.file("texts", "crude", package = "tm") > (r <- Corpus(DirSource(reut21578), readerControl = list(reader = readReut21578XMLasPlain))) A corpus with 20 text documents > (r <- Corpus(DirSource(reut21578), readerControl = list(reader = readReut21578XMLasPlain))) > summary(r) A corpus with 20 text documents The metadata consists of 2 tag-value pairs and a data...
2011 Jan 24
1
Extracting information from text data
...ce an n X m matrix from text data stored in different files, where n is the number of words (say w1, w2, …, wn) and m is the number of documents (say d1, d2, …, dm). A. Using package tm: I am using package tm to do the job. I have provided the code below: > my.corpus <- Corpus(DirSource(my.path), readerControl = list (reader=readPlain)) In readLines(y, encoding = x$Encoding) : incomplete final line found on 'M:\textmine/slr.txt' > x <- TermDocMatrix(my.corpus) Error: could not find function "TermDocMatrix" B. Using package(s) other than tm Once again, thank you very mu...
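Editorial note: the error 'could not find function "TermDocMatrix"' in several of these excerpts is consistent with the function having been renamed; later tm releases provide TermDocumentMatrix() and DocumentTermMatrix() instead. The following is a minimal sketch, assuming a recent tm release. The two sample strings are invented for illustration and are not taken from any of the posts.

```r
library(tm)

# Two made-up documents standing in for the posters' text files.
docs <- c("oil prices rise", "crude oil supply falls")

# Build an in-memory corpus from the character vector.
my.corpus <- VCorpus(VectorSource(docs))

# Terms in rows, documents in columns (an n x m matrix as described above).
tdm <- TermDocumentMatrix(my.corpus)

# Convert to an ordinary matrix to look at the counts.
m <- as.matrix(tdm)
print(dim(m))
print(m["oil", ])
```

For a folder of text files, VectorSource(docs) would be swapped for DirSource(my.path), as in the posters' own code.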
2009 Jan 15
1
How to Solve the Error( error:cannot allocate vector of size 1.1 Gb)
...o solve this memory problem among increasing a physical RAM, or doing other recipes, etc? ############################### ###### my R Script's Outputs ###### ############################### > memory.limit(size = 2000) NULL > corpus.ko <- Corpus(DirSource("test_konews/"), + readerControl = list(reader = readPlain, + language = "UTF-8", load = FALSE)) > corpus.ko.nowhite <- tmMap(corpus.ko, stripWhitespace) > corpus <- tmMap(corpus.ko.nowhite, tmTolower) > tdm <- TermDocMatrix(corpus) > findAssocs(tdm, "city", 0.97) error:cannot allocate ve...
2011 May 26
3
text mining
Hi, how can I import a document of type "txt" using the tm package? Note that my document is not located in the tm package library. Thanks. -- View this message in context: http://r.789695.n4.nabble.com/text-mining-tp3552221p3552221.html Sent from the R help mailing list archive at Nabble.com.
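Editorial note: one way to read plain text files that live outside the tm package directory is to pass the folder path directly to DirSource(). The sketch below assumes a recent tm release; the folder name my_texts and the two sample files are hypothetical and are created in a temporary directory only so that the example is self-contained.

```r
library(tm)

# Hypothetical folder standing in for the poster's own directory of .txt files.
my.path <- file.path(tempdir(), "my_texts")
dir.create(my.path, showWarnings = FALSE)
writeLines("Text mining with the tm package.", file.path(my.path, "doc1.txt"))
writeLines("A second plain text document.", file.path(my.path, "doc2.txt"))

# Point DirSource() at the folder itself. system.file() only resolves paths
# inside an installed package, so it is not needed for your own files.
my.corpus <- VCorpus(DirSource(my.path, pattern = "\\.txt$"),
                     readerControl = list(reader = readPlain, language = "en"))

print(length(my.corpus))
print(content(my.corpus[[1]]))
```

Replacing my.path with the path to an existing folder should be the only change needed for real data.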
2017 Nov 07
0
Error when attempting to see "Corpus" metadata
...) in the following line nb_all <- c(nb_pos,nb_neg,recursive=TRUE) # anyone see anything wrong with this line of code -------------------------------------------------- > library("tm") Loading required package: NLP > nb_pos <- Corpus(DirSource(path_to_pos_folder), readerControl = list(language="en")) # appears to be correct > nb_neg <- Corpus(DirSource(path_to_neg_folder), readerControl = list(language="en")) # appears to be correct > nb_all <- c(nb_pos,nb_neg,recursive=TRUE) > > meta(nb_all[[1]]) Error in UseMethod("...
2009 Nov 03
1
Can't pass file name as parameter to Corpus function
...code below interactively in R. But calling the function with a file name as the parameter I got the error message saying "Error in eval(expr, envir, enclos) : object 'strFileName' not found" test<-function(strFileName) { src <- URISource(strFileName) cor <- Corpus(src, readerControl = list(reader = readPDF, language = "en_US", load = TRUE)) } After running the following code in R I checked the docURISource$URI and the value is "strFileName" rather than "C:\\Temp\\readme.txt". I also checked the URI when I was debugging the function and the URI is...
2009 Jan 10
1
Help needed for Loading "tm" package
...e("jar", c("weka.jar", "RWeka.jar"), package = pkgname, : Cannot create Java virtual machine (-1) Error : .onLoad failed in 'loadNamespace' for 'RWeka' Error: package 'RWeka' could not be loaded > my.corpurs <-Corpus(DirSource(my.path), readerControl = list(reader=readPlain)) Error: could not find function "Corpus" > my.tdm <- TermDocMatrix(my.corpus) Error: could not find function "TermDocMatrix" > my.tdm[1,] Error: object "my.tdm" not found -- Kum-Hoe Hwang, Ph.D. Phone : 82-31-250-3516 Email : phdhw...
2009 Oct 15
1
Problems with rJava and tm packages
...der R version 2.9.1 Error : .onLoad failed in 'loadNamespace' for 'rJava' Error: package/namespace load failed for 'rJava' > > #Set documents directory > DIR <- "G:/TextSearch/Speeches" > > #Load corpus > speech <- Corpus(DirSource(DIR), readerControl = list(reader = readPlain, + language = "en_US", load = TRUE)) > > #Remove stopwords > speech <- tmMap(speech, stripWhitespace) > speech A corpus with 2 text documents > tdm<-TermDocumentMatrix(speech) Error in if (!nchar(javahome)) stop("JAVA_HOME is not set...
2013 Apr 09
3
Question on Stopword Removal from a Cyrillic (Bulgarian)Text
...orpus based on the contents of just one variable, and I construct the corpus from a VectorSource. When I run inspect, all seems fine and I can see the text properly, with unicode characters present: data.corpus<-Corpus(VectorSource(data$variable,encoding='UTF-8'), readerControl=list(language='bulgarian')) However, no matter what I do - like which encoding I select - UTF-8 or CP1251, which is the typical code page for Bulgarian texts, I cannot get to remove the stop words from my corpus. The issue is present in both Linux and Windows, and across the computers I...
2009 Mar 30
1
Help with tm assocation analysis and Rgraphviz installation.
...I got tons of association ‘1’. I tried other terms, and no association value is less than 1, which obviously is wrong. Could any expert tell me what I did wrong? My R-code is: R>my.path<-'C:\\textfile' R>library(tm) R>my.corpus <- Corpus(DirSource(my.path), readerControl = list (reader=readPlain)) R>tdmO <- TermDocMatrix(my.corpus) R>tdmO An object of class “TermDocMatrix” Slot "Data": 2 x 1426 sparse Matrix of class "dgCMatrix" [[ suppressing 1426 column names ‘000’, ‘0092’, ‘0093’ ... ]] 1 3 1 12 1 1 1 8 1 1 2 1 9 . 2...
2010 Jan 22
1
Invalid input error in tm package
Hello, I am working on "tm" package. I have 2 pdf files saved in the directory D:/Files I issued the following commands (marked in red bold) for which I got some errors and warnings (marked in bold) *surgj <- Corpus(DirSource("D:/Files"), readerControl = list(language = "ansi"))* *Warning messages: 1: In readLines(y, encoding = x$Encoding) : incomplete final line found on 'D:/Files/provmedsurgj00978-0005b.pdf' 2: In readLines(y, encoding = x$Encoding) : incomplete final line found on 'D:/Files/provmedsurgj00978-0007.pdf...
2011 Apr 06
0
Curious treatment of entities in xmlTreeParse
...Document()) rss2Source <- function(x, encoding = "UTF-8") XMLSource(x, function(tree) XML::getNodeSet(XML::xmlRoot(tree),"/rss/channel/item"), rss2Reader, encoding) feed.rss2 <- rss2Source(url("http://scottbw.wordpress.com/feed/")) corp1<-Corpus(feed.rss2, readerControl=list(language="en")) I've googled around for this problem but got nowhere. Have I missed something? Any help will be received gratefully; this was supposed to be the easy part! Cheers, Adam
2011 Sep 05
0
Stemming functions only work on the last word of plain text documents
...orpus using the tm_map function it only stems the last word of each document (The problem is the for wordStem and stemDocument does not work at all). An example: > path <- c("c:\path\to\directory") # collection of plain text documents > corp <- Corpus(DirSource(path), readerControl = list(reader = readPlain, language = "en_US" , load = T)) > inspect(corp) A corpus with 2 text documents The metadata consists of 2 tag-value pairs and a data frame Available tags are: create_date creator Available variables in the data frame are: MetaID $`1.txt` running runs...
2009 Jan 09
1
[R] how to build TermDocMatrix in tm text mining package of R
Howdy Gurus, I'd like to ask a question about how to build a TermDocMatrix in the tm text mining package. It is not clear to me how to import a plain text file and then convert that text file into a TermDocMatrix, etc. How can I build a TermDocMatrix of a plain text document file for text association? Or are there any good manuals? Thank you in advance, -- Kum-Hoe Hwang, Ph.D.
2009 Apr 17
0
question about the Text Mining package tm
...new lines in the original file but I inserted a new line before every occurrence of http. I ran the following code: library("tm") my.path <- 'C:\\dataForR\\textsTweet1\\' my.path.csv<-'C:\\dataForR\\textsTweet1\\myTextFile.csv' (ovid <- Corpus(DirSource(my.path), readerControl = list(reader = readPlain, language = "la"))) Response from R: A text document collection with 3 text documents Warning message: In readLines(filename, encoding = encoding) : incomplete final line found on 'C:\dataForR\textsTweet1\/short.txt' Then I ran the TermDocMatrix funct...
2009 Aug 17
2
reading in MS Word files
I am familiar with packages that read and write Excel files on both Windows and Linux platforms. Do any packages provide similar functionality for MS Word files? I have a lot of text processing to do and the text is embedded in ~200 different Word files (.doc format Office 2003). All I need to do is read, not write. Thanks, Mark ------------------------------------------------------------ Mark
2009 Dec 22
0
Reading PDF files (using xpdf)
...+'. (7) Navigate to the folder which contains the files: C:/../xpdf-3.02pl4-win32 (8) Add it and click Ok. Then you can do something like: > library(tm) > my.path <- 'C:\\Documents and Settings\\tony\\Desktop\\pdfs\\' #put your pdfs in here > Corpus(DirSource(my.path), readerControl = list(reader=readPDF)) There are some limitations to how well the conversions work depending on the pdf file, but it was so long ago now that I'm afraid I don't remember the details. HTH. Tony Breyal 2009/12/22 <zeusufza at lmu.edu>: > Hi: > > I am very new to R. I ju...
2010 Feb 16
0
tm package
Hi, I'm using version 0.5.1 of tm package with R 2.10.1. It looks to me as if after the following reuters21578 <- Corpus(DirSource(corpusDir), readerControl = list(reader = readReut21578XMLasPlain)) reuters21578 <- tm_map(reuters21578, stripWhitespace) reuters21578 <- tm_map(reuters21578, tolower) reuters21578 <- tm_map(reuters21578, removePunctuation) reuters21578 <- tm_map(reuters21578, removeNumbers) reuters21578.dtm...
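Editorial note: the preprocessing chain in the excerpt above can be sketched as follows for a recent tm release, where base R functions such as tolower are, as far as I am aware, expected to be wrapped in content_transformer() before being passed to tm_map(). This is an assumption about newer versions, not about tm 0.5.1 as used by the poster. The sample sentence is invented, and the final DocumentTermMatrix() call is a guess at the truncated last step.

```r
library(tm)

# A single made-up document standing in for the Reuters-21578 corpus.
corp <- VCorpus(VectorSource("Crude  OIL prices rose 3% in 1986."))

# Same sequence of transformations as in the post.
corp <- tm_map(corp, stripWhitespace)
corp <- tm_map(corp, content_transformer(tolower))  # wrap base R functions
corp <- tm_map(corp, removePunctuation)
corp <- tm_map(corp, removeNumbers)

# Assumed final step: build a document-term matrix from the cleaned corpus.
dtm <- DocumentTermMatrix(corp)

cleaned <- content(corp[[1]])
print(cleaned)
print(Terms(dtm))
```

Checking content(corp[[1]]) after each step is a simple way to see which transformation changed the text.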