thr3ads.net - search: "readercontrol"

Displaying 20 results from an estimated 33 matches for "readercontrol".

2009 Oct 02

text mining

...ument let alone an entire corpus. I have searched the above document as well as related documentation. Any leads or help would be appreciated. Thanks, everyone. From the document: txt <- system.file("texts", "txt", package = "tm") (ovid <- Corpus(DirSource(txt), readerControl = list(reader = readPlain, language = "la", load = TRUE))) My attempt: txt <- system.file("Speeches/speech", "txt", package = "tm") (ovid <- Corpus(DirSource(txt), readerControl = list(reader = readPlain, language = "la", load = TRUE)))...

2010 Apr 23

Library (tm) Error: could not find function "TermDocMatrix".

Hi List, I have the following code and error. I have tried other code and get the same problem. > reut21578 <- system.file("texts", "crude", package = "tm") > (r <- Corpus(DirSource(reut21578), readerControl = list(reader = readReut21578XMLasPlain))) A corpus with 20 text documents > (r <- Corpus(DirSource(reut21578), readerControl = list(reader = readReut21578XMLasPlain))) > summary(r) A corpus with 20 text documents The metadata consists of 2 tag-value pairs and a data...

2011 Jan 24

Extracting information from text data

...ce an n X m matrix from text data stored in different files, where n is the number of words (say w1, w2, …, wn) and m is the number of documents (say d1, d2, …, dm). A. Using package tm I am using package tm to do the job. I have provided the code below: > my.corpus <- Corpus(DirSource(my.path), readerControl = list (reader=readPlain)) In readLines(y, encoding = x$Encoding) : incomplete final line found on 'M:\textmine/slr.txt' > x <- TermDocMatrix(my.corpus) Error: could not find function "TermDocMatrix" B. Using package(s) other than tm Once again, thank you very mu...

2009 Jan 15

How to Solve the Error( error:cannot allocate vector of size 1.1 Gb)

...o solve this memory problem among increasing a physical RAM, or doing other recipes, etc? ############################### ###### my R Script's Outputs ###### ############################### > memory.limit(size = 2000) NULL > corpus.ko <- Corpus(DirSource("test_konews/"), + readerControl = list(reader = readPlain, + language = "UTF-8", load = FALSE)) > corpus.ko.nowhite <- tmMap(corpus.ko, stripWhitespace) > corpus <- tmMap(corpus.ko.nowhite, tmTolower) > tdm <- TermDocMatrix(corpus) > findAssocs(tdm, "city", 0.97) error:cannot allocate ve...

2011 May 26

text mining

Hi, how can I import a document of type "txt" using the package tm? I need the command for a document that is not located in the tm package library. Thanks. -- View this message in context: http://r.789695.n4.nabble.com/text-mining-tp3552221p3552221.html Sent from the R help mailing list archive at Nabble.com.

2017 Nov 07

Error when attempting to see "Corpus" metadata

...) in the following line nb_all <- c(nb_pos,nb_neg,recursive=TRUE) # anyone see anything wrong with this line of code -------------------------------------------------- > library("tm") Loading required package: NLP > nb_pos <- Corpus(DirSource(path_to_pos_folder), readerControl = list(language="en")) # appears to be correct > nb_neg <- Corpus(DirSource(path_to_neg_folder), readerControl = list(language="en")) # appears to be correct > nb_all <- c(nb_pos,nb_neg,recursive=TRUE) > > meta(nb_all[[1]]) Error in UseMethod("...

2009 Nov 03

Can't pass file name as parameter to Corpus function

...code below interactively in R. But calling the function with a file name as the parameter I got the error message saying "Error in eval(expr, envir, enclos) : object 'strFileName' not found" test<-function(strFileName) { src <- URISource(strFileName) cor <- Corpus(src, readerControl = list(reader = readPDF, language = "en_US", load = TRUE)) } After running the following code in R I checked the docURISource$URI and the value is "strFileName" rather than "C:\\Temp\\readme.txt". I also checked the URI when I was debugging the function and the URI is...

2009 Jan 10

Help needed for Loading "tm" package

...e("jar", c("weka.jar", "RWeka.jar"), package = pkgname, : Cannot create Java virtual machine (-1) Error : .onLoad failed in 'loadNamespace' for 'RWeka' Error: package 'RWeka' could not be loaded > my.corpurs <-Corpus(DirSource(my.path), readerControl = list(reader=readPlain)) Error: could not find function "Corpus" > my.tdm <- TermDocMatrix(my.corpus) Error: could not find function "TermDocMatrix" > my.tdm[1,] Error: object "my.tdm" not found -- Kum-Hoe Hwang, Ph.D. Phone : 82-31-250-3516 Email : phdhw...

2009 Oct 15

Problems with rJava and tm packages

...der R version 2.9.1 Error : .onLoad failed in 'loadNamespace' for 'rJava' Error: package/namespace load failed for 'rJava' > > #Set documents directory > DIR <- "G:/TextSearch/Speeches" > > #Load corpus > speech <- Corpus(DirSource(DIR), readerControl = list(reader = readPlain, + language = "en_US", load = TRUE)) > > #Remove stopwords > speech <- tmMap(speech, stripWhitespace) > speech A corpus with 2 text documents > tdm<-TermDocumentMatrix(speech) Error in if (!nchar(javahome)) stop("JAVA_HOME is not set...

2013 Apr 09

Question on Stopword Removal from a Cyrillic (Bulgarian)Text

...orpus based on the contents of just one variable, and I construct the corpus from a VectorSource. When I run inspect, all seems fine and I can see the text properly, with unicode characters present: data.corpus<-Corpus(VectorSource(data$variable,encoding='UTF-8'), readerControl=list(language='bulgarian')) However, no matter what I do - like which encoding I select - UTF-8 or CP1251, which is the typical code page for Bulgarian texts, I cannot get to remove the stop words from my corpus. The issue is present in both Linux and Windows, and across the computers I...

2009 Mar 30

Help with tm association analysis and Rgraphviz installation.

...I got tons of association ‘1’. I tried other terms, and no association value is less than 1, which obviously is wrong. Could any expert tell me what I did wrong? My R-code is: R>my.path<-'C:\\textfile' R>library(tm) R>my.corpus <- Corpus(DirSource(my.path), readerControl = list (reader=readPlain)) R>tdmO <- TermDocMatrix(my.corpus) R>tdmO An object of class “TermDocMatrix” Slot "Data": 2 x 1426 sparse Matrix of class "dgCMatrix" [[ suppressing 1426 column names ‘000’, ‘0092’, ‘0093’ ... ]] 1 3 1 12 1 1 1 8 1 1 2 1 9 . 2...

2010 Jan 22

Invalid input error in tm package

Hello, I am working on "tm" package. I have 2 pdf files saved in the directory D:/Files I issued the following commands (marked in red bold) for which I got some errors and warnings (marked in bold) *surgj <- Corpus(DirSource("D:/Files"), readerControl = list(language = "ansi"))* *Warning messages: 1: In readLines(y, encoding = x$Encoding) : incomplete final line found on 'D:/Files/provmedsurgj00978-0005b.pdf' 2: In readLines(y, encoding = x$Encoding) : incomplete final line found on 'D:/Files/provmedsurgj00978-0007.pdf...

2011 Apr 06

Curious treatment of entities in xmlTreeParse

...Document()) rss2Source <- function(x, encoding = "UTF-8") XMLSource(x, function(tree) XML::getNodeSet(XML::xmlRoot(tree),"/rss/channel/item"), rss2Reader, encoding) feed.rss2 <- rss2Source(url("http://scottbw.wordpress.com/feed/")) corp1<-Corpus(feed.rss2, readerControl=list(language="en")) I've googled around for this problem but got nowhere. Have I missed something? Any help will be received gratefully; this was supposed to be the easy part! Cheers, Adam

2011 Sep 05

Stemming functions only work on the last word of plain text documents

...orpus using the tm_map function it only stems the last word of each document (the problem is the same for wordStem, and stemDocument does not work at all). An example: > path <- c("c:\path\to\directory") # collection of plain text documents > corp <- Corpus(DirSource(path), readerControl = list(reader = readPlain, language = "en_US" , load = T)) > inspect(corp) A corpus with 2 text documents The metadata consists of 2 tag-value pairs and a data frame Available tags are: create_date creator Available variables in the data frame are: MetaID $`1.txt` running runs...

2009 Jan 09

[R] how to build TermDocMatrix in tm text mining package of R

Howdy Gurus, I'd like to ask a question about how to build a TermDocMatrix in the tm text mining package. It is not clear to me how to import a plain text file and then convert that text file into a TermDocMatrix, etc. How can I build a TermDocMatrix from a plain text document file for text association? Or are there any good manuals? Thank you in advance, -- Kum-Hoe Hwang, Ph.D.

2009 Apr 17

question about the Text Mining package tm

...new lines in the original file but I inserted a new line before every occurrence of http. I ran the following code: library("tm") my.path <- 'C:\\dataForR\\textsTweet1\\' my.path.csv<-'C:\\dataForR\\textsTweet1\\myTextFile.csv' (ovid <- Corpus(DirSource(my.path), readerControl = list(reader = readPlain, language = "la"))) Response from R: A text document collection with 3 text documents Warning message: In readLines(filename, encoding = encoding) : incomplete final line found on 'C:\dataForR\textsTweet1\/short.txt' Then I ran the TermDocMatrix funct...

2009 Aug 17

reading in MS Word files

I am familiar with packages that read and write Excel files on both Windows and Linux platforms. Do any packages provide similar functionality for MS Word files? I have a lot of text processing to do and the text is embedded in ~200 different Word files (.doc format Office 2003). All I need to do is read, not write. Thanks, Mark ------------------------------------------------------------ Mark

2009 Dec 22

Reading PDF files (using xpdf)

...+'. (7) Navigate to the folder which contains the files: C:/../xpdf-3.02pl4-win32 (8) Add it and click Ok. Then you can do something like: > library(tm) > my.path <- 'C:\\Documents and Settings\\tony\\Desktop\\pdfs\\' #put your pdfs in here > Corpus(DirSource(my.path), readerControl = list(reader=readPDF)) There are some limitations to how well the conversions work depending on the pdf file, but it was so long ago now that I'm afraid I don't remember the details. HTH. Tony Breyal 2009/12/22 <zeusufza at lmu.edu>: > Hi: > > I am very new to R. I ju...

2010 Feb 16

tm package

Hi, I'm using version 0.5.1 of tm package with R 2.10.1. It looks to me as if after the following reuters21578 <- Corpus(DirSource(corpusDir), readerControl = list(reader = readReut21578XMLasPlain)) reuters21578 <- tm_map(reuters21578, stripWhitespace) reuters21578 <- tm_map(reuters21578, tolower) reuters21578 <- tm_map(reuters21578, removePunctuation) reuters21578 <- tm_map(reuters21578, removeNumbers) reuters21578.dtm...
