
Displaying 20 results from an estimated 35 matches for "dirsource".

2009 Oct 02
1
text mining
...o load my own document, let alone an entire corpus. I have searched the above document as well as related documentation. Any leads or help would be appreciated. Thanks, everyone. From the document: txt <- system.file("texts", "txt", package = "tm") (ovid <- Corpus(DirSource(txt), readerControl = list(reader = readPlain, language = "la", load = TRUE))) My attempt: txt <- system.file("Speeches/speech", "txt", package = "tm") (ovid <- Corpus(DirSource(txt), readerControl = list(reader = readPlain, language = "la"...
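The "my attempt" line passes a user folder to system.file(), which only resolves paths inside an installed package, so it cannot find Speeches/speech. A minimal sketch, assuming a recent tm release (whose documentation uses VCorpus()) and a hypothetical folder of .txt files created on the fly, hands the folder path straight to DirSource():

```r
library(tm)

# Hypothetical stand-in for your own folder of .txt files; in practice
# set speech_dir to a real path such as "C:/path/to/speeches".
speech_dir <- file.path(tempdir(), "speech")
dir.create(speech_dir, showWarnings = FALSE)
writeLines("Four score and seven years ago", file.path(speech_dir, "a.txt"))
writeLines("We shall fight on the beaches", file.path(speech_dir, "b.txt"))

# Give DirSource() the folder itself; system.file() is only for files
# shipped inside an installed package.
speeches <- VCorpus(DirSource(speech_dir, pattern = "\\.txt$", encoding = "UTF-8"),
                    readerControl = list(reader = readPlain, language = "en"))

length(speeches)          # 2
content(speeches[[1]])    # "Four score and seven years ago"
```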
2010 Apr 23
2
Library (tm) Error: could not find function "TermDocMatrix".
Hi list, I have the following code and error. I have tried other code and get the same problem. > reut21578 <- system.file("texts", "crude", package = "tm") > (r <- Corpus(DirSource(reut21578), readerControl = list(reader = > readReut21578XMLasPlain))) A corpus with 20 text documents > (r <- Corpus(DirSource(reut21578), readerControl = list(reader = > readReut21578XMLasPlain)))> > > summary(r) A corpus with 20 text documents The metadata consists of 2...
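Older tm releases provided TermDocMatrix(); later releases replaced it with TermDocumentMatrix(), which is why the old name is "not found". A sketch under the assumption of a recent tm version, using the same bundled Reuters sample with the reader call from the current tm vignette:

```r
library(tm)

# Reuters-21578 sample shipped with tm (20 XML documents on crude oil).
reut21578 <- system.file("texts", "crude", package = "tm")
reuters <- VCorpus(DirSource(reut21578, mode = "binary"),
                   readerControl = list(reader = readReut21578XMLasPlain))

# TermDocumentMatrix() is the current name; rows are terms, columns documents.
tdm <- TermDocumentMatrix(reuters)
dim(tdm)
```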
2011 Jan 24
1
Extracting information from text data
...I am trying to produce an n × m matrix from text data stored in different files, where n is the number of words (say w1, w2, …, wn) and m is the number of documents (say d1, d2, …, dm). A. Using package tm: I am using package tm to do the job. I have provided the code below: > my.corpus <- Corpus(DirSource(my.path), readerControl = list (reader=readPlain)) In readLines(y, encoding = x$Encoding) : incomplete final line found on 'M:\textmine/slr.txt' > x <- TermDocMatrix(my.corpus) Error: could not find function "TermDocMatrix" B. Using package(s) other than tm: Once...
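For part B (no tm), here is a hedged base-R sketch of the n × m count matrix. The helper name term_doc_matrix(), its simple tokenizer (lowercase, split on runs of non-alphanumeric characters) and the demo files are all hypothetical choices for illustration; tm's own tokenization differs.

```r
# Hypothetical helper: term-by-document count matrix in base R only.
term_doc_matrix <- function(paths) {
  # Tokenize each file: lowercase, split on runs of non-alphanumerics.
  tokens <- lapply(paths, function(p) {
    txt <- tolower(paste(readLines(p, warn = FALSE), collapse = " "))
    words <- unlist(strsplit(txt, "[^[:alnum:]]+"))
    words[nzchar(words)]
  })
  terms <- sort(unique(unlist(tokens)))
  # Count every vocabulary term in every document (0 when absent).
  counts <- vapply(tokens,
                   function(w) as.integer(table(factor(w, levels = terms))),
                   integer(length(terms)))
  # Rows = terms (n), columns = documents (m).
  matrix(counts, nrow = length(terms),
         dimnames = list(terms, basename(paths)))
}

# Usage with two made-up documents.
demo_dir <- file.path(tempdir(), "tdm_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines("The cat sat.", file.path(demo_dir, "d1.txt"))
writeLines("The dog sat on the cat!", file.path(demo_dir, "d2.txt"))

tdm <- term_doc_matrix(list.files(demo_dir, full.names = TRUE))
print(tdm)
```

Passing warn = FALSE to readLines() also silences the "incomplete final line" warning quoted above, which is harmless.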
2009 Jan 15
1
How to Solve the Error( error:cannot allocate vector of size 1.1 Gb)
...emory problem. What is the best way to solve this memory problem: increasing physical RAM, or some other approach? ############################### ###### my R Script's Outputs ###### ############################### > memory.limit(size = 2000) NULL > corpus.ko <- Corpus(DirSource("test_konews/"), + readerControl = list(reader = readPlain, + language = "UTF-8", load = FALSE)) > corpus.ko.nowhite <- tmMap(corpus.ko, stripWhitespace) > corpus <- tmMap(corpus.ko.nowhite, tmTolower) > tdm <- TermDocMatrix(corpus) > findAssocs(tdm, "...
2011 May 21
1
DocumentTermMatrix error
...input 'PROD Z LAHKO GNETNO MELJNO GLINO, ... in 'utf8towcs' I tried doing this as shown in http://www.r-project.org/doc/Rnews/Rnews_2008-2.pdf (An Introduction to Text Mining), with this R code: setwd("C:/Users/mpavlic/Desktop/temp") tekst <- Corpus(DirSource(".")) >Warning message: >In readLines(y, encoding = x$Encoding) : >incomplete final line found on './test.txt' meta(tekst, "Heading", "local") <- c("test") meta(tekst[[1]]) >Available meta data pairs are: Author :...
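An "invalid input ... in 'utf8towcs'" error usually means the text is not valid UTF-8, for example a Windows-1250 file read as if it were UTF-8. Assuming that is the cause here, the reading-side fix in tm is to declare the files' real encoding via DirSource()'s encoding argument. The base-R sketch below shows the underlying conversion; the sample string and the CP1250 source encoding are assumptions for illustration.

```r
# A byte string as it might come from a Windows-1250 file: 0xE8 is "č"
# in that encoding but is not valid UTF-8 on its own.
raw_text <- "MELJNO GLINO \xe8"
validUTF8(raw_text)   # FALSE

# Preferred: convert from the file's real encoding (CP1250 is an
# assumption here; use whatever the files were actually saved in).
fixed <- iconv(raw_text, from = "CP1250", to = "UTF-8")
validUTF8(fixed)      # TRUE

# Fallback: drop any bytes that are not valid UTF-8.
stripped <- iconv(raw_text, from = "UTF-8", to = "UTF-8", sub = "")
validUTF8(stripped)   # TRUE
```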
2011 May 26
3
text mining
Hi, how can I import a document of type ".txt" using the tm package? What command should I use, given that my document is not stored in the tm package's library folder? Thanks.
2012 May 29
1
package tm: reading XML files
...reuters21578 corpus in XML format, however (or a self-created subset thereof), the metadata is lost and the files are interpreted as plain text. I use the following command, where the indicated directory contains all Reuters-21578 documents as separate XML files: > reuters21578 <- Corpus(DirSource("C:/Data/Reuters/preprocessed"), readerContol=list(reader=readReut21578XML)) I'm running R 2.15.0 under Windows XP. Has anybody else encountered this problem and found a cause or solution? Best regards, -Ad Feelders
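One likely cause: the command spells the argument readerContol (missing an "r"). That is not a prefix of readerControl, so R cannot partially match it and the XML reader never reaches Corpus(); in the tm version used here it appears to have been absorbed silently, leaving the default plain-text reader, which would match the symptom. A sketch with the spelling corrected, assuming a recent tm and using the bundled crude sample in place of the poster's directory:

```r
library(tm)

reut21578 <- system.file("texts", "crude", package = "tm")

# Note the spelling: readerControl (the original call had "readerContol").
reuters <- VCorpus(DirSource(reut21578, mode = "binary"),
                   readerControl = list(reader = readReut21578XML))

# With the XML reader, Reuters metadata such as the heading is kept.
meta(reuters[[1]], "heading")
```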
2017 Nov 07
0
Error when attempting to see "Corpus" metadata
...wo variables (nb_pos and nb_neg) in the following line nb_all <- c(nb_pos,nb_neg,recursive=TRUE) # anyone see anything wrong with this line of code -------------------------------------------------- > library("tm") Loading required package: NLP > nb_pos <- Corpus(DirSource(path_to_pos_folder), readerControl = list(language="en")) # appears to be correct > nb_neg <- Corpus(DirSource(path_to_neg_folder), readerControl = list(language="en")) # appears to be correct > nb_all <- c(nb_pos,nb_neg,recursive=TRUE) > > meta(nb_...
2009 Oct 13
0
tm: Why does adding local metadata take so long?
...to a character vector dirName <- "/Volumes/RDR Test Documents/3Compounds/TXT" # Put the paths of the .txt files in the directory into a vector Files_3compounds <- dir(dirName, full.names = TRUE, pattern = "_.*\\.txt", ignore.case = TRUE) # Use that vector to create a DirSource object Dir_3compounds <- DirSource(dirName, pattern = "_.*\\.txt", ignore.case = TRUE, encoding = "latin1") # Read the .txt files into a volatile corpus Corpus_3compounds <- Corpus(Dir_3compounds, readerControl = list(reader = readPlain, language = "en",...
2009 Jan 10
1
Help needed for Loading "tm" package
...in .jinit(system.file("jar", c("weka.jar", "RWeka.jar"), package = pkgname, : Cannot create Java virtual machine (-1) Error : .onLoad failed in 'loadNamespace' for 'RWeka' Error: package 'RWeka' could not be loaded > my.corpurs <-Corpus(DirSource(my.path), readerControl = list(reader=readPlain)) Error: could not find function "Corpus" > my.tdm <- TermDocMatrix(my.corpus) Error: could not find function "TermDocMatrix" > my.tdm[1,] Error: object "my.tdm" not found -- Kum-Hoe Hwang, Ph.D. Phone : 82-3...
2009 Oct 15
1
Problems with rJava and tm packages
...' was built under R version 2.9.1 Error : .onLoad failed in 'loadNamespace' for 'rJava' Error: package/namespace load failed for 'rJava' > > #Set documents directory > DIR <- "G:/TextSearch/Speeches" > > #Load corpus > speech <- Corpus(DirSource(DIR), readerControl = list(reader = readPlain, + language = "en_US", load = TRUE)) > > #Remove stopwords > speech <- tmMap(speech, stripWhitespace) > speech A corpus with 2 text documents > tdm<-TermDocumentMatrix(speech) Error in if (!nchar(javahome)) stop("J...
2009 Mar 30
1
Help with tm association analysis and Rgraphviz installation.
...ed to this word, and I got tons of associations with the value 1. I tried other terms, and no association value is less than 1, which is obviously wrong. Could any expert tell me what I did wrong? My R code is: R>my.path<-'C:\\textfile' R>library(tm) R>my.corpus <- Corpus(DirSource(my.path), readerControl = list (reader=readPlain)) R>tdmO <- TermDocMatrix(my.corpus) R>tdmO An object of class “TermDocMatrix” Slot "Data": 2 x 1426 sparse Matrix of class "dgCMatrix" [[ suppressing 1426 column names ‘000’, ‘0092’, ‘0093’ ... ]] 1 3 1 1...
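The printed matrix has only 2 documents (2 x 1426). Assuming findAssocs() scores each term by its correlation with the query term across documents (tm's documentation describes associations as correlations), two documents can only produce correlations of +1 or -1, since any two points lie on a straight line; terms with identical counts in both documents have zero variance and no defined correlation. If only associations at or above the limit are reported, every listed value would be 1. A base-R illustration with made-up counts:

```r
# Made-up counts for three terms in exactly two documents.
term_a <- c(doc1 = 1, doc2 = 3)
term_b <- c(doc1 = 7, doc2 = 2)
term_c <- c(doc1 = 0, doc2 = 5)

cor(term_a, term_b)   # -1: the terms move in opposite directions
cor(term_a, term_c)   #  1: the terms move in the same direction

# Any pair of non-constant count vectors of length 2 gives |r| = 1.
set.seed(42)
r <- replicate(1000, cor(sample(0:9, 2), sample(0:9, 2)))
range(abs(r))         # 1 1 (up to floating-point rounding)
```

Meaningful association scores need many more documents than two.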
2014 Jul 25
3
wordcloud and word table
...the following: ########## >informes<-c("2013", "2005") >pathname<-"C:/Users/d_2/Documents/Comision/PLAN de INSPECCIONES/Informes/" >TDM<-function(informes, pathname) { info.dir<-sprintf("%s/%s", pathname, informes) info.cor<-Corpus(DirSource(directory=info.dir, encoding="UTF-8")) info.cor.cl<-tm_map(info.cor, content_transformer(tolower)) info.cor.cl<-tm_map(info.cor.cl, stripWhitespace) info.cor.cl<-tm_map(info.cor.cl,removePunctuation) sw<-readLines("C:/Users/d_2/Documents/StopWords.txt", encoding=...
2010 Jan 22
1
Invalid input error in tm package
Hello, I am working with the "tm" package. I have two PDF files saved in the directory D:/Files. I issued the following commands (marked in red bold), for which I got some errors and warnings (marked in bold): *surgj <- Corpus(DirSource("D:/Files"), readerControl = list(language = "ansi"))* *Warning messages: 1: In readLines(y, encoding = x$Encoding) : incomplete final line found on 'D:/Files/provmedsurgj00978-0005b.pdf' 2: In readLines(y, encoding = x$Encoding) : incomplete final line found on ...
2011 Sep 05
0
Stemming functions only work on the last word of plain text documents
...apply it to my corpus using the tm_map function, it only stems the last word of each document (the problem is the same for wordStem, and stemDocument does not work at all). An example: > path <- c("c:\path\to\directory") # collection of plain text documents > corp <- Corpus(DirSource(path), readerControl = list(reader = readPlain, language = "en_US" , load = T)) > inspect(corp) A corpus with 2 text documents The metadata consists of 2 tag-value pairs and a data frame Available tags are: create_date creator Available variables in the data frame are: MetaID...
2009 Jan 09
1
[R] how to build TermDocMatrix in tm text mining package of R
Howdy Gurus, I'd like to ask a question about how to build a TermDocMatrix in the tm text mining package. It is not clear to me how to import a plain text file and then convert it into a TermDocMatrix. How can I build a TermDocMatrix from a plain text document file for text association analysis? Or are there any good manuals? Thank you in advance, -- Kum-Hoe Hwang, Ph.D.
2009 Apr 17
0
question about the Text Mining package tm
...ually there were no new lines in the original file but I inserted a new line before every occurrence of http. I ran the following code: library("tm") my.path <- 'C:\\dataForR\\textsTweet1\\' my.path.csv<-'C:\\dataForR\\textsTweet1\\myTextFile.csv' (ovid <- Corpus(DirSource(my.path), readerControl = list(reader = readPlain, language = "la"))) Response from R: A text document collection with 3 text documents Warning message: In readLines(filename, encoding = encoding) : incomplete final line found on 'C:\dataForR\textsTweet1\/short.txt' Then I ran...
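The "incomplete final line" warning quoted here (and in several other threads) comes from base R's readLines(), which tm's readers call: it fires when a file's last line has no terminating newline, but the line is still read. A base-R demonstration using a temporary file:

```r
path <- tempfile(fileext = ".txt")
cat("first line\nsecond line", file = path)   # no newline at the end

# Capture the warning instead of printing it, to show what happens.
msgs <- character()
lines <- withCallingHandlers(
  readLines(path),
  warning = function(w) {
    msgs <<- c(msgs, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)
lines   # both lines were read despite the warning
msgs    # the "incomplete final line found on ..." message

# Either silence the warning for a single read ...
lines2 <- readLines(path, warn = FALSE)
# ... or repair the file: writeLines() terminates every line with "\n".
writeLines(lines, path)
```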
2009 Aug 17
2
reading in MS Word files
I am familiar with packages that read and write Excel files on both Windows and Linux platforms. Do any packages provide similar functionality for MS Word files? I have a lot of text processing to do, and the text is embedded in ~200 different Word files (.doc format, Office 2003). All I need to do is read, not write. Thanks, Mark
2009 Dec 22
0
Reading PDF files (using xpdf)
...ft hand corner '+'. (7) Navigate to the folder which contains the files: C:/../xpdf-3.02pl4-win32 (8) Add it and click Ok. Then you can do something like: > library(tm) > my.path <- 'C:\\Documents and Settings\\tony\\Desktop\\pdfs\\' #put your pdfs in here > Corpus(DirSource(my.path), readerControl = list(reader=readPDF)) There are some limitations to how well the conversions work depending on the PDF file, but it was so long ago now that I'm afraid I don't remember the details. HTH. Tony Breyal 2009/12/22 <zeusufza at lmu.edu>: > Hi: > >...
2010 Feb 16
0
tm package
Hi, I'm using version 0.5.1 of tm package with R 2.10.1. It looks to me as if after the following reuters21578 <- Corpus(DirSource(corpusDir), readerControl = list(reader = readReut21578XMLasPlain)) reuters21578 <- tm_map(reuters21578, stripWhitespace) reuters21578 <- tm_map(reuters21578, tolower) reuters21578 <- tm_map(reuters21578, removePunctuation) reuters21578 <- tm_map(reuters21578, removeNumb...
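With tm 0.5.1, as in this post, tm_map(reuters21578, tolower) was the usual idiom. In later releases (0.6 onward, to my knowledge), base functions such as tolower return bare character vectors, so they should be wrapped in content_transformer(); otherwise the corpus elements stop being text documents and later steps fail. A sketch of the same pipeline under a recent tm, using the bundled crude sample in place of the poster's corpusDir:

```r
library(tm)

reut21578 <- system.file("texts", "crude", package = "tm")
reuters <- VCorpus(DirSource(reut21578, mode = "binary"),
                   readerControl = list(reader = readReut21578XMLasPlain))

# tm's own transformations can be passed to tm_map() directly ...
reuters <- tm_map(reuters, stripWhitespace)
# ... but base functions such as tolower need content_transformer()
# so that each element stays a PlainTextDocument.
reuters <- tm_map(reuters, content_transformer(tolower))
reuters <- tm_map(reuters, removePunctuation)
reuters <- tm_map(reuters, removeNumbers)

inherits(reuters[[1]], "PlainTextDocument")
```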