thr3ads.net - search: "dirsource"

Displaying 20 results from an estimated 35 matches for "dirsource".

2009 Oct 02

text mining

...o load my own document let alone an entire corpus. I have searched the above document as well as related documentation. Any leads or help would be appreciated. Thanks everyone. From the document: txt <- system.file("texts", "txt", package = "tm") (ovid <- Corpus(DirSource(txt), readerControl = list(reader = readPlain, language = "la", load = TRUE))) My attempt: txt <- system.file("Speeches/speech", "txt", package = "tm") (ovid <- Corpus(DirSource(txt), readerControl = list(reader = readPlain, language = "la"...

2010 Apr 23

Library (tm) Error: could not find function "TermDocMatrix".

Hi List, I have the following code and error. I have tried other code and I have the same problem. > reut21578 <- system.file("texts", "crude", package = "tm") > (r <- Corpus(DirSource(reut21578), readerControl = list(reader = > readReut21578XMLasPlain))) A corpus with 20 text documents > (r <- Corpus(DirSource(reut21578), readerControl = list(reader = > readReut21578XMLasPlain)))> > > summary(r) A corpus with 20 text documents The metadata consists of 2...

2011 Jan 24

Extracting information from text data

...I am trying to produce an n × m matrix from text data stored in different files, where n is the number of words (say w1, w2, …, wn) and m is the number of documents (say d1, d2, …, dm). A. Using package tm I am using package tm to do the job. I have provided the code below: > my.corpus <- Corpus(DirSource(my.path), readerControl = list (reader=readPlain)) In readLines(y, encoding = x$Encoding) : incomplete final line found on 'M:\textmine/slr.txt' > x <- TermDocMatrix(my.corpus) Error: could not find function "TermDocMatrix" B. Using package(s) other than tm Once...
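
The "could not find function" error in this snippet is what R reports when the called name is not exported by any attached package. Below is a minimal sketch of building the n × m matrix described above. It assumes a current tm release, where the constructor is named TermDocumentMatrix (rows are terms, columns are documents). The two toy documents and all variable names are made up for illustration.

```r
library(tm)

# Two toy documents (hypothetical data, only for illustration).
docs <- c("apple banana apple", "banana cherry")

# Build an in-memory corpus from a character vector.
my_corpus <- VCorpus(VectorSource(docs))

# Terms become rows and documents become columns.
tdm <- TermDocumentMatrix(my_corpus)

# Convert the sparse matrix to a dense one; fine for small corpora only.
m <- as.matrix(tdm)
print(m)
```

For a corpus read from disk, the VectorSource call would be swapped for DirSource pointing at the folder of text files.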

2009 Jan 15

How to Solve the Error( error:cannot allocate vector of size 1.1 Gb)

...emory problem. What is the best way to solve this memory problem: adding physical RAM, or some other approach? ############################### ###### my R Script's Outputs ###### ############################### > memory.limit(size = 2000) NULL > corpus.ko <- Corpus(DirSource("test_konews/"), + readerControl = list(reader = readPlain, + language = "UTF-8", load = FALSE)) > corpus.ko.nowhite <- tmMap(corpus.ko, stripWhitespace) > corpus <- tmMap(corpus.ko.nowhite, tmTolower) > tdm <- TermDocMatrix(corpus) > findAssocs(tdm, "...

2011 May 21

DocumentTermMatrix error

...input 'PROD Z LAHKO GNETNO MELJNO GLINO, ... in 'utf8towcs' I tried doing this as shown in http://www.r-project.org/doc/Rnews/Rnews_2008-2.pdf (An Introduction to Text Mining), with this R code: setwd("C:/Users/mpavlic/Desktop/temp") tekst <- Corpus(DirSource(".")) >Warning message: >In readLines(y, encoding = x$Encoding) : >incomplete final line found on './test.txt' meta(tekst, "Heading", "local") <- c("test") meta(tekst[[1]]) >Available meta data pairs are: Author :...

2011 May 26

text mining

Hi, how can I import a document of type "txt" using the package tm? What is the command, given that my document is not stored in the tm package library? Thanks. -- View this message in context: http://r.789695.n4.nabble.com/text-mining-tp3552221p3552221.html Sent from the R help mailing list archive at Nabble.com.
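
As a hedged illustration of the question above: DirSource accepts an ordinary directory path, so system.file() is only needed for files shipped inside an installed package. The sketch assumes a current tm release. The folder name, file names, and file contents are hypothetical and are created in a temporary directory so the example is self-contained.

```r
library(tm)

# Hypothetical folder of plain-text files; replace with your own directory.
doc_dir <- file.path(tempdir(), "my_texts")
dir.create(doc_dir, showWarnings = FALSE)
writeLines("First example document.", file.path(doc_dir, "a.txt"))
writeLines("Second example document.", file.path(doc_dir, "b.txt"))

# Read every .txt file in the folder into a volatile (in-memory) corpus.
my_corpus <- VCorpus(
  DirSource(doc_dir, pattern = "\\.txt$", encoding = "UTF-8"),
  readerControl = list(reader = readPlain, language = "en")
)

n_docs <- length(my_corpus)               # number of documents read
first_text <- content(my_corpus[[1]])     # text of the first document
print(n_docs)
print(first_text)
```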

2012 May 29

package tm: reading XML files

...reuters21578 corpus in XML format however (or a self-created subset thereof) the meta data is lost, and the files are interpreted as plain text. I use the following command, where the indicated directory contains all reuters 21578 documents as separate XML files: > reuters21578 <- Corpus(DirSource("C:/Data/Reuters/preprocessed"), readerContol=list(reader=readReut21578XML)) I'm running R 2.15.0 under Windows XP. Has anybody else encountered this problem and found a cause/solution? Best regards, -Ad Feelders
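
One thing worth checking in the command above is the spelling of the argument name: it reads readerContol rather than readerControl. The base-R sketch below shows how a function that accepts `...` can silently swallow a misspelled named argument and fall back to its default. Here read_docs is a hypothetical stand-in, not tm's Corpus, and whether Corpus accepted `...` in the poster's tm version is an assumption.

```r
# Hypothetical stand-in for a reader function with a default control list.
# The trailing `...` absorbs any argument whose name matches no formal.
read_docs <- function(path, readerControl = list(reader = "plain"), ...) {
  readerControl$reader
}

# Misspelled name: it lands in `...`, so the default reader is used.
misspelled <- read_docs("some/dir", readerContol = list(reader = "xml"))

# Correct name: the supplied reader is used.
correct <- read_docs("some/dir", readerControl = list(reader = "xml"))

print(misspelled)
print(correct)
```

If this is what happened, the symptom would match the report: no error, and the files treated as plain text.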

2017 Nov 07

Error when attempting to see "Corpus" metadata

...wo variables (nb_pos and nb_neg) in the following line nb_all <- c(nb_pos,nb_neg,recursive=TRUE) # anyone see anything wrong with this line of code -------------------------------------------------- > library("tm") Loading required package: NLP > nb_pos <- Corpus(DirSource(path_to_pos_folder), readerControl = list(language="en")) # appears to be correct > nb_neg <- Corpus(DirSource(path_to_neg_folder), readerControl = list(language="en")) # appears to be correct > nb_all <- c(nb_pos,nb_neg,recursive=TRUE) > > meta(nb_...

2009 Oct 13

tm: Why does adding local metadata take so long?

...to a character vector dirName <- "/Volumes/RDR Test Documents/3Compounds/TXT" # Put the paths of the .txt files in the directory into a vector Files_3compounds <- dir(dirName, full.names = TRUE, pattern = "_.*\\.txt", ignore.case = TRUE) # Use that vector to create a DirSource object Dir_3compounds <- DirSource(dirName, pattern = "_.*\\.txt", ignore.case = TRUE, encoding = "latin1") # Read the .txt files into a volatile corpus Corpus_3compounds <- Corpus(Dir_3compounds, readerControl = list(reader = readPlain, language = "en",...

2009 Jan 10

Help needed for Loading "tm" package

...in .jinit(system.file("jar", c("weka.jar", "RWeka.jar"), package = pkgname, : Cannot create Java virtual machine (-1) Error : .onLoad failed in 'loadNamespace' for 'RWeka' Error: package 'RWeka' could not be loaded > my.corpurs <-Corpus(DirSource(my.path), readerControl = list(reader=readPlain)) Error: could not find function "Corpus" > my.tdm <- TermDocMatrix(my.corpus) Error: could not find function "TermDocMatrix" > my.tdm[1,] Error: object "my.tdm" not found -- Kum-Hoe Hwang, Ph.D. Phone : 82-3...

2009 Oct 15

Problems with rJava and tm packages

...' was built under R version 2.9.1 Error : .onLoad failed in 'loadNamespace' for 'rJava' Error: package/namespace load failed for 'rJava' > > #Set documents directory > DIR <- "G:/TextSearch/Speeches" > > #Load corpus > speech <- Corpus(DirSource(DIR), readerControl = list(reader = readPlain, + language = "en_US", load = TRUE)) > > #Remove stopwords > speech <- tmMap(speech, stripWhitespace) > speech A corpus with 2 text documents > tdm<-TermDocumentMatrix(speech) Error in if (!nchar(javahome)) stop("J...

2009 Mar 30

Help with tm assocation analysis and Rgraphviz installation.

...ed to this word, and I got tons of association ‘1’. I tried other terms, and no association value is less than 1, which obviously is wrong. Could any expert tell me what I did wrong? My R-code is: R>my.path<-'C:\\textfile' R>library(tm) R>my.corpus <- Corpus(DirSource(my.path), readerControl = list (reader=readPlain)) R>tdmO <- TermDocMatrix(my.corpus) R>tdmO An object of class “TermDocMatrix” Slot "Data": 2 x 1426 sparse Matrix of class "dgCMatrix" [[ suppressing 1426 column names ‘000’, ‘0092’, ‘0093’ ... ]] 1 3 1 1...

2014 Jul 25

wordcloud and word table

...the following: ########## >informes<-c("2013", "2005") >pathname<-"C:/Users/d_2/Documents/Comision/PLAN de INSPECCIONES/Informes/" >TDM<-function(informes, pathname) { info.dir<-sprintf("%s/%s", pathname, informes) info.cor<-Corpus(DirSource(directory=info.dir, encoding="UTF-8")) info.cor.cl<-tm_map(info.cor, content_transformer(tolower)) info.cor.cl<-tm_map(info.cor.cl, stripWhitespace) info.cor.cl<-tm_map(info.cor.cl,removePunctuation) sw<-readLines("C:/Users/d_2/Documents/StopWords.txt", encoding=...

2010 Jan 22

Invalid input error in tm package

Hello, I am working on "tm" package. I have 2 pdf files saved in the directory D:/Files I issued the following commands (marked in red bold) for which I got some errors and warnings (marked in bold) *surgj <- Corpus(DirSource("D:/Files"), readerControl = list(language = "ansi"))* *Warning messages: 1: In readLines(y, encoding = x$Encoding) : incomplete final line found on 'D:/Files/provmedsurgj00978-0005b.pdf' 2: In readLines(y, encoding = x$Encoding) : incomplete final line found on ...

2011 Sep 05

Stemming functions only work on the last word of plain text documents

...apply it to my corpus using the tm_map function it only stems the last word of each document (the problem is the same for wordStem, and stemDocument does not work at all). An example: > path <- c("c:\path\to\directory") # collection of plain text documents > corp <- Corpus(DirSource(path), readerControl = list(reader = readPlain, language = "en_US" , load = T)) > inspect(corp) A corpus with 2 text documents The metadata consists of 2 tag-value pairs and a data frame Available tags are: create_date creator Available variables in the data frame are: MetaID...

2009 Jan 09

[R} how to build TermDocMatrix in tm text mining package of R

Howdy Gurus, I'd like to ask how to build a TermDocMatrix in the tm text mining package. It is not clear to me how to import a plain text file and then convert it into a TermDocMatrix. How can I build a TermDocMatrix from a plain text document file for text association? Or are there any good manuals? Thank you in advance, -- Kum-Hoe Hwang, Ph.D.

2009 Apr 17

question about the Text Mining package tm

...ually there were no new lines in the original file but I inserted a new line before every occurrence of http. I ran the following code: library("tm") my.path <- 'C:\\dataForR\\textsTweet1\\' my.path.csv<-'C:\\dataForR\\textsTweet1\\myTextFile.csv' (ovid <- Corpus(DirSource(my.path), readerControl = list(reader = readPlain, language = "la"))) Response from R: A text document collection with 3 text documents Warning message: In readLines(filename, encoding = encoding) : incomplete final line found on 'C:\dataForR\textsTweet1\/short.txt' Then I ran...
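
The "incomplete final line" warning that appears in this and several other snippets comes from base R's readLines and indicates that the file does not end with a newline character. A small base-R sketch, using a temporary file with made-up contents, suggests that the text is still read in full and that appending a final newline makes the warning go away:

```r
# Write a two-line file whose last line has no trailing newline.
f <- tempfile(fileext = ".txt")
cat("first line\nsecond line", file = f)

# Capture the warning message that readLines raises for this file.
msg <- tryCatch(readLines(f), warning = function(w) conditionMessage(w))

# The content is still read completely despite the warning.
lines_read <- suppressWarnings(readLines(f))

# Appending a final newline lets the file be read without a warning.
cat("\n", file = f, append = TRUE)
lines_fixed <- readLines(f)

print(msg)
print(lines_read)
```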

2009 Aug 17

reading in MS Word files

I am familiar with packages that read and write Excel files on both Windows and Linux platforms. Do any packages provide similar functionality for MS Word files? I have a lot of text processing to do and the text is embedded in ~200 different Word files (.doc format Office 2003). All I need to do is read, not write. Thanks, Mark ------------------------------------------------------------ Mark

2009 Dec 22

Reading PDF files (using xpdf)

...ft hand corner '+'. (7) Navigate to the folder which contains the files: C:/../xpdf-3.02pl4-win32 (8) Add it and click Ok. Then you can do something like: > library(tm) > my.path <- 'C:\\Documents and Settings\\tony\\Desktop\\pdfs\\' #put your pdfs in here > Corpus(DirSource(my.path), readerControl = list(reader=readPDF)) There are some limitations to how well the conversions work depending on the pdf file, but it was so long ago now that I'm afraid I don't remember the details. HTH. Tony Breyal 2009/12/22 <zeusufza at lmu.edu>: > Hi: > >...

2010 Feb 16

tm package

Hi, I'm using version 0.5.1 of tm package with R 2.10.1. It looks to me as if after the following reuters21578 <- Corpus(DirSource(corpusDir), readerControl = list(reader = readReut21578XMLasPlain)) reuters21578 <- tm_map(reuters21578, stripWhitespace) reuters21578 <- tm_map(reuters21578, tolower) reuters21578 <- tm_map(reuters21578, removePunctuation) reuters21578 <- tm_map(reuters21578, removeNumb...
