Displaying 20 results from an estimated 35 matches for "dirsource".
2009 Oct 02
1
text mining
...o load my own document let alone an
entire corpus. I have searched the above document as well as related
documentation. Any leads or help would be appreciated. Thanks everyone
from document
txt <- system.file("texts", "txt", package = "tm")
(ovid <- Corpus(DirSource(txt),
readerControl = list(reader = readPlain,
language = "la",
load = TRUE)))
my attempt
txt <- system.file("Speeches/speech", "txt", package = "tm")
(ovid <- Corpus(DirSource(txt),
readerControl = list(reader = readPlain,
language = "la"...
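Note: system.file() only resolves paths inside an installed package, so it is a likely reason the attempt above finds nothing for a personal "Speeches" folder. The sketch below is a hedged illustration, not the poster's setup: the temporary directory and file name are hypothetical stand-ins, and the tm part runs only if tm is installed.

```r
# Hypothetical directory standing in for the poster's own "Speeches" folder.
speech_dir <- file.path(tempdir(), "Speeches")
dir.create(speech_dir, showWarnings = FALSE)
writeLines("Four score and seven years ago",
           file.path(speech_dir, "speech1.txt"))

# system.file() looks only inside the named package; for a path that is not
# there it returns an empty string instead of raising an error.
pkg_path <- system.file("Speeches", "speech", package = "base")
print(nchar(pkg_path) == 0)

# A directory outside the package tree can be handed to DirSource() directly.
if (requireNamespace("tm", quietly = TRUE)) {
  speeches <- tm::VCorpus(tm::DirSource(speech_dir),
                          readerControl = list(reader = tm::readPlain,
                                               language = "en"))
  print(length(speeches))
}
```

Checking the string returned by system.file() before building the corpus makes this kind of mistake visible early.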
2010 Apr 23
2
Library (tm) Error: could not find function "TermDocMatrix".
Hi List
I have the following code and error. I have tried other code and I get
the same problem.
> reut21578 <- system.file("texts", "crude", package = "tm")
> (r <- Corpus(DirSource(reut21578), readerControl = list(reader =
> readReut21578XMLasPlain)))
A corpus with 20 text documents
> (r <- Corpus(DirSource(reut21578), readerControl = list(reader =
> readReut21578XMLasPlain)))
> summary(r)
A corpus with 20 text documents
The metadata consists of 2...
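Note: other snippets in this listing call tm_map() and TermDocumentMatrix() where older ones call tmMap() and TermDocMatrix(), which suggests the functions were renamed between tm releases. The sketch below assumes a tm version that provides the newer names; the two sample documents are made up for illustration, and the block does nothing if tm is not installed.

```r
if (requireNamespace("tm", quietly = TRUE)) {
  library(tm)

  # Two made-up documents standing in for a directory of text files.
  docs <- c(d1 = "crude oil prices rise", d2 = "oil output falls")
  corp <- VCorpus(VectorSource(docs))

  # Newer spellings: tm_map() instead of tmMap().
  corp <- tm_map(corp, stripWhitespace)
  corp <- tm_map(corp, content_transformer(tolower))

  # Newer spelling: TermDocumentMatrix() instead of TermDocMatrix().
  tdm <- TermDocumentMatrix(corp)
  print(dim(tdm))  # terms x documents
}
```

If the transposed layout is wanted (documents in rows), DocumentTermMatrix() takes the same corpus.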
2011 Jan 24
1
Extracting information from text data
...I am trying to produce an n x m matrix from text data stored in different files, where n is the number of words (say w1, w2, …, wn) and m is the number of documents (say d1, d2, …, dm).
A. Using package tm
I am using package tm to do the job. I have provided the code below:
> my.corpus <- Corpus(DirSource(my.path), readerControl = list (reader=readPlain))
In readLines(y, encoding = x$Encoding) :
incomplete final line found on 'M:\textmine/slr.txt'
> x <- TermDocMatrix(my.corpus)
Error: could not find function "TermDocMatrix"
B. Using package(s) other than tm
Once...
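Note: the "incomplete final line" warning in the snippet above is raised by readLines() and usually means only that the last line of the file has no trailing newline; the text is still read. A minimal base R sketch, using a hypothetical temporary file rather than the poster's slr.txt, and a hypothetical helper name read_and_flag:

```r
# Read a file and record whether readLines() raised a warning.
read_and_flag <- function(path) {
  warned <- FALSE
  lines <- withCallingHandlers(
    readLines(path),
    warning = function(w) {
      warned <<- TRUE
      invokeRestart("muffleWarning")
    }
  )
  list(lines = lines, warned = warned)
}

# Write two lines in binary mode, with no newline after the last one.
tmp <- tempfile(fileext = ".txt")
con <- file(tmp, "wb")
writeBin(charToRaw("first line\nlast line"), con)
close(con)

before <- read_and_flag(tmp)   # warning is raised, both lines are returned

# Appending a final newline removes the warning; the content is unchanged.
cat("\n", file = tmp, append = TRUE)
after <- read_and_flag(tmp)

print(before$warned)
print(after$warned)
print(identical(before$lines, after$lines))
```

The warning is therefore unrelated to the later "could not find function" error in the same snippet.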
2009 Jan 15
1
How to Solve the Error( error:cannot allocate vector of size 1.1 Gb)
...emory problem.
What is the best way to solve this memory problem: adding
physical RAM, or some other approach?
###############################
###### my R Script's Outputs ######
###############################
> memory.limit(size = 2000)
NULL
> corpus.ko <- Corpus(DirSource("test_konews/"),
+ readerControl = list(reader = readPlain,
+ language = "UTF-8", load = FALSE))
> corpus.ko.nowhite <- tmMap(corpus.ko, stripWhitespace)
> corpus <- tmMap(corpus.ko.nowhite, tmTolower)
> tdm <- TermDocMatrix(corpus)
> findAssocs(tdm, "...
2011 May 21
1
DocumentTermMatrix error
...input 'PROD Z LAHKO GNETNO MELJNO GLINO, ... in 'utf8towcs'
I tried doing this as shown in
http://www.r-project.org/doc/Rnews/Rnews_2008-2.pdf (An Introduction to Text Mining),
with this R code:
setwd("C:/Users/mpavlic/Desktop/temp")
tekst <- Corpus(DirSource("."))
>Warning message:
>In readLines(y, encoding = x$Encoding) :
>incomplete final line found on './test.txt'
meta(tekst, "Heading", "local") <- c("test")
meta(tekst[[1]])
>Available meta data pairs are:
Author :...
2011 May 26
3
text mining
Hi,
How can I import a document of type "txt" using the tm package?
What is the command, given that my document is not located in the tm
package library?
Thanks.
2012 May 29
1
package tm: reading XML files
...reuters21578 corpus in XML format however (or
a self-created subset thereof) the meta data is lost, and the files are
interpreted as plain text.
I use the following command, where the indicated directory contains all
reuters 21578 documents as separate XML files:
> reuters21578 <- Corpus(DirSource("C:/Data/Reuters/preprocessed"),
readerContol=list(reader=readReut21578XML))
I'm running R 2.15.0 under Windows XP.
Has anybody else encountered this problem and found a cause/solution?
Best regards,
-Ad Feelders
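Note: the call quoted above spells the argument readerContol rather than readerControl. In R, a misspelled argument name is silently absorbed when the function accepts `...`, so the default reader would be used without any error. The sketch below shows that mechanism in base R only; make_corpus is a hypothetical stand-in, and whether tm's Corpus() in that version accepted `...` is an assumption.

```r
# Hypothetical stand-in for a constructor with a default reader and `...`.
make_corpus <- function(source, readerControl = list(reader = "plain"), ...) {
  # Return the reader that would actually be used.
  readerControl$reader
}

# Misspelled name: not a prefix of "readerControl", so it falls into `...`
# and the default reader is kept.
misspelled <- make_corpus("dir", readerContol = list(reader = "xml"))

# Correct name: the supplied reader is used.
correct <- make_corpus("dir", readerControl = list(reader = "xml"))

print(misspelled)
print(correct)
```

This would be consistent with the files being "interpreted as plain text", though the snippet alone does not confirm the cause.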
2017 Nov 07
0
Error when attempting to see "Corpus" metadata
...wo variables (nb_pos and
nb_neg) in the following line
nb_all <- c(nb_pos,nb_neg,recursive=TRUE) # anyone see anything wrong with
this line of code
--------------------------------------------------
> library("tm")
Loading required package: NLP
> nb_pos <- Corpus(DirSource(path_to_pos_folder), readerControl =
list(language="en")) # appears to be correct
> nb_neg <- Corpus(DirSource(path_to_neg_folder), readerControl =
list(language="en")) # appears to be correct
> nb_all <- c(nb_pos,nb_neg,recursive=TRUE)
>
> meta(nb_...
2009 Oct 13
0
tm: Why does adding local metadata take so long?
...to a character vector
dirName <- "/Volumes/RDR Test Documents/3Compounds/TXT"
# Put the paths of the .txt files in the directory into a vector
Files_3compounds <- dir(dirName,
full.names = TRUE,
pattern = "_.*\\.txt",
ignore.case = TRUE)
# Use that vector to create a DirSource object
Dir_3compounds <- DirSource(dirName,
pattern = "_.*\\.txt",
ignore.case = TRUE,
encoding = "latin1")
# Read the .txt files into a volatile corpus
Corpus_3compounds <- Corpus(Dir_3compounds,
readerControl = list(reader = readPlain,
language = "en",...
2009 Jan 10
1
Help needed for Loading "tm" package
...in .jinit(system.file("jar", c("weka.jar", "RWeka.jar"), package =
pkgname, :
Cannot create Java virtual machine (-1)
Error : .onLoad failed in 'loadNamespace' for 'RWeka'
Error: package 'RWeka' could not be loaded
> my.corpurs <-Corpus(DirSource(my.path), readerControl =
list(reader=readPlain))
Error: could not find function "Corpus"
> my.tdm <- TermDocMatrix(my.corpus)
Error: could not find function "TermDocMatrix"
> my.tdm[1,]
Error: object "my.tdm" not found
--
Kum-Hoe Hwang, Ph.D.
Phone : 82-3...
2009 Oct 15
1
Problems with rJava and tm packages
...' was built under R version 2.9.1
Error : .onLoad failed in 'loadNamespace' for 'rJava'
Error: package/namespace load failed for 'rJava'
>
> #Set documents directory
> DIR <- "G:/TextSearch/Speeches"
>
> #Load corpus
> speech <- Corpus(DirSource(DIR), readerControl = list(reader = readPlain,
+ language = "en_US", load = TRUE))
>
> #Remove stopwords
> speech <- tmMap(speech, stripWhitespace)
> speech
A corpus with 2 text documents
> tdm<-TermDocumentMatrix(speech)
Error in if (!nchar(javahome)) stop("J...
2009 Mar 30
1
Help with tm assocation analysis and Rgraphviz installation.
...ed to this word, and I got tons of associations of ‘1’.
I tried other terms, and no association value is less than 1, which
is obviously wrong.
Could any expert tell me what I did wrong?
My R-code is:
R>my.path<-'C:\\textfile'
R>library(tm)
R>my.corpus <- Corpus(DirSource(my.path), readerControl = list
(reader=readPlain))
R>tdmO <- TermDocMatrix(my.corpus)
R>tdmO
An object of class “TermDocMatrix”
Slot "Data":
2 x 1426 sparse Matrix of class "dgCMatrix"
[[ suppressing 1426 column names ‘000’, ‘0092’, ‘0093’ ... ]]
1 3 1 1...
2014 Jul 25
3
wordcloud and word table
...the following:
##########
>informes<-c("2013", "2005")
>pathname<-"C:/Users/d_2/Documents/Comision/PLAN de INSPECCIONES/Informes/"
>TDM<-function(informes, pathname) {
info.dir<-sprintf("%s/%s", pathname, informes)
info.cor<-Corpus(DirSource(directory=info.dir, encoding="UTF-8"))
info.cor.cl<-tm_map(info.cor, content_transformer(tolower))
info.cor.cl<-tm_map(info.cor.cl, stripWhitespace)
info.cor.cl<-tm_map(info.cor.cl,removePunctuation)
sw<-readLines("C:/Users/d_2/Documents/StopWords.txt", encoding=...
2010 Jan 22
1
Invalid input error in tm package
Hello,
I am working on "tm" package.
I have 2 pdf files saved in the directory D:/Files
I issued the following commands (marked in red bold) for which I got some
errors and warnings (marked in bold)
*surgj <- Corpus(DirSource("D:/Files"), readerControl = list(language =
"ansi"))*
*Warning messages:
1: In readLines(y, encoding = x$Encoding) :
incomplete final line found on 'D:/Files/provmedsurgj00978-0005b.pdf'
2: In readLines(y, encoding = x$Encoding) :
incomplete final line found on ...
2011 Sep 05
0
Stemming functions only work on the last word of plain text documents
...apply it to my corpus using the tm_map function, it only stems the last word of each document (the problem occurs with wordStem; stemDocument does not work at all). An example:
> path <- c("c:/path/to/directory") # collection of plain text documents
> corp <- Corpus(DirSource(path), readerControl = list(reader = readPlain, language = "en_US" , load = T))
> inspect(corp)
A corpus with 2 text documents
The metadata consists of 2 tag-value pairs and a data frame
Available tags are:
create_date creator
Available variables in the data frame are:
MetaID...
2009 Jan 09
1
[R] how to build TermDocMatrix in tm text mining package of R
Howdy Gurus
I'd like to ask a question about how to build a TermDocMatrix in the tm text
mining package.
It is not clear to me how to import a plain text file and then convert that
text file into a TermDocMatrix.
How can I build a TermDocMatrix from a plain text document file for text
association?
Or are there any good manuals?
Thank you in advance,
--
Kum-Hoe Hwang, Ph.D.
2009 Apr 17
0
question about the Text Mining package tm
...ually there were no new lines in the original file but I inserted a new
line before every occurrence of http.
I ran the following code:
library("tm")
my.path <- 'C:\\dataForR\\textsTweet1\\'
my.path.csv<-'C:\\dataForR\\textsTweet1\\myTextFile.csv'
(ovid <- Corpus(DirSource(my.path), readerControl = list(reader = readPlain,
language = "la")))
Response from R:
A text document collection with 3 text documents
Warning message:
In readLines(filename, encoding = encoding) :
incomplete final line found on 'C:\dataForR\textsTweet1\/short.txt'
Then I ran...
2009 Aug 17
2
reading in MS Word files
I am familiar with packages that read and write Excel files on both Windows
and Linux platforms.
Do any packages provide similar functionality for MS Word files? I have a
lot of text processing to do and the text is embedded in ~200 different Word
files (.doc format Office 2003). All I need to do is read, not write.
Thanks,
Mark
------------------------------------------------------------
Mark
2009 Dec 22
0
Reading PDF files (using xpdf)
...ft hand corner '+'.
(7) Navigate to the folder which contains the files: C:/../xpdf-3.02pl4-win32
(8) Add it and click Ok.
Then you can do something like:
> library(tm)
> my.path <- 'C:\\Documents and Settings\\tony\\Desktop\\pdfs\\' #put your pdfs in here
> Corpus(DirSource(my.path), readerControl = list(reader=readPDF))
There are some limitations to how well the conversions work depending
on the pdf file, but it was so long ago now that I'm afraid I don't
remember the details.
HTH.
Tony Breyal
2009/12/22 <zeusufza at lmu.edu>:
> Hi:
>
>...
2010 Feb 16
0
tm package
Hi,
I'm using version 0.5.1 of tm package with R 2.10.1. It looks to me
as if after the following
reuters21578 <- Corpus(DirSource(corpusDir), readerControl =
list(reader = readReut21578XMLasPlain))
reuters21578 <- tm_map(reuters21578, stripWhitespace)
reuters21578 <- tm_map(reuters21578, tolower)
reuters21578 <- tm_map(reuters21578, removePunctuation)
reuters21578 <- tm_map(reuters21578, removeNumb...