Displaying 20 results from an estimated 33 matches for "readercontrol".
2009 Oct 02
text mining
...ument let alone an
entire corpus. I have searched the above document as well as related
documentation. Any leads or help would be appreciated. Thanks everyone
From the document:
txt <- system.file("texts", "txt", package = "tm")
(ovid <- Corpus(DirSource(txt),
readerControl = list(reader = readPlain,
language = "la",
load = TRUE)))
My attempt:
txt <- system.file("Speeches/speech", "txt", package = "tm")
(ovid <- Corpus(DirSource(txt),
readerControl = list(reader = readPlain,
language = "la",
load = TRUE)))...
2010 Apr 23
Library (tm) Error: could not find function "TermDocMatrix".
Hi List
I have the following code and the error below. I have tried other code and I get
the same problem.
> reut21578 <- system.file("texts", "crude", package = "tm")
> (r <- Corpus(DirSource(reut21578), readerControl = list(reader =
> readReut21578XMLasPlain)))
A corpus with 20 text documents
> (r <- Corpus(DirSource(reut21578), readerControl = list(reader =
> readReut21578XMLasPlain)))
> summary(r)
A corpus with 20 text documents
The metadata consists of 2 tag-value pairs and a data...
2011 Jan 24
Extracting information from text data
...ce an n × m matrix from text data stored in different files, where n is the number of words (say w1, w2, …, wn) and m is the number of documents (say d1, d2, …, dm).
A. Using package tm
I am using package tm to do the job. I have provided the code below:
> my.corpus <- Corpus(DirSource(my.path), readerControl = list (reader=readPlain))
In readLines(y, encoding = x$Encoding) :
incomplete final line found on 'M:\textmine/slr.txt'
> x <- TermDocMatrix(my.corpus)
Error: could not find function "TermDocMatrix"
B. Using package(s) other than tm
Once again, thank you very mu...
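The `could not find function "TermDocMatrix"` errors in these threads come from code written for early tm releases. In later versions the function is `TermDocumentMatrix()` (with `DocumentTermMatrix()` for the transposed layout), just as `tmMap()` became `tm_map()`. A minimal sketch, assuming a current tm install; the three short documents are made up and stand in for files read with `DirSource()`:

```r
library(tm)

# Made-up documents standing in for files read with DirSource().
docs <- c("the quick brown fox", "the lazy dog", "quick brown dogs")
corpus <- VCorpus(VectorSource(docs), readerControl = list(language = "en"))

tdm <- TermDocumentMatrix(corpus)  # rows are terms, columns are documents
dtm <- DocumentTermMatrix(corpus)  # rows are documents, columns are terms

m <- as.matrix(tdm)
print(m["quick", ])  # how often "quick" occurs in each document
```

`as.matrix()` is fine for a toy corpus like this one; for large corpora the sparse matrix object is usually kept as it is.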
2009 Jan 15
How to Solve the Error (error: cannot allocate vector of size 1.1 Gb)
...o solve this memory problem, for example by increasing
physical RAM or some other recipe?
###############################
###### my R Script's Outputs ######
###############################
> memory.limit(size = 2000)
NULL
> corpus.ko <- Corpus(DirSource("test_konews/"),
+ readerControl = list(reader = readPlain,
+ language = "UTF-8", load = FALSE))
> corpus.ko.nowhite <- tmMap(corpus.ko, stripWhitespace)
> corpus <- tmMap(corpus.ko.nowhite, tmTolower)
> tdm <- TermDocMatrix(corpus)
> findAssocs(tdm, "city", 0.97)
error:cannot allocate ve...
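One common way to keep `findAssocs()` within memory is to drop rare terms first with `removeSparseTerms()`; whether that is enough for this particular corpus is an assumption. The sketch below uses the current function names (`tm_map()`, `content_transformer(tolower)`, `TermDocumentMatrix()`) in place of `tmMap()`, `tmTolower` and `TermDocMatrix()`; the documents and the 0.7 sparsity threshold are made up:

```r
library(tm)

# Made-up documents; the real corpus came from DirSource("test_konews/").
docs <- c("city council meets", "city council vote",
          "river flood", "budget vote")
corpus <- VCorpus(VectorSource(docs))
corpus <- tm_map(corpus, stripWhitespace)               # was tmMap(..., stripWhitespace)
corpus <- tm_map(corpus, content_transformer(tolower))  # was tmMap(..., tmTolower)

tdm <- TermDocumentMatrix(corpus)                       # was TermDocMatrix()

# Drop terms that are missing from too many documents (sparsity threshold 0.7).
tdm_small <- removeSparseTerms(tdm, sparse = 0.7)
print(dim(tdm))
print(dim(tdm_small))

findAssocs(tdm_small, "city", 0.5)
```

The threshold trades detail for size: a lower `sparse` value removes more terms and yields a smaller matrix.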
2011 May 26
text mining
Hi,
How can I import a document of type ".txt" using the tm package?
What command should I use, given that my document is not located in the
tm package's library folder?
Thanks.
--
View this message in context: http://r.789695.n4.nabble.com/text-mining-tp3552221p3552221.html
Sent from the R help mailing list archive at Nabble.com.
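`system.file()` only looks inside installed packages and returns an empty string when the file is not there, which is why the tm examples find their bundled texts but a call like `system.file("Speeches/speech", "txt", package = "tm")` does not find your own files. For your own files, the directory path can go straight into `DirSource()`. A sketch, assuming a current tm install; the folder and file are created on the fly and stand in for your own:

```r
library(tm)

# Hypothetical folder and file, created here so the example runs anywhere.
my.dir <- file.path(tempdir(), "my_texts")
dir.create(my.dir, showWarnings = FALSE)
writeLines(c("First line of my document.", "Second line."),
           file.path(my.dir, "mydoc.txt"))

# DirSource() accepts any directory path; pattern keeps only .txt files.
my.corpus <- VCorpus(DirSource(my.dir, pattern = "\\.txt$"),
                     readerControl = list(reader = readPlain, language = "en"))

print(length(my.corpus))        # number of documents read
print(content(my.corpus[[1]]))  # the lines of mydoc.txt
```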
2017 Nov 07
Error when attempting to see "Corpus" metadata
...) in the following line
nb_all <- c(nb_pos,nb_neg,recursive=TRUE) # anyone see anything wrong with
this line of code
--------------------------------------------------
> library("tm")
Loading required package: NLP
> nb_pos <- Corpus(DirSource(path_to_pos_folder), readerControl =
list(language="en")) # appears to be correct
> nb_neg <- Corpus(DirSource(path_to_neg_folder), readerControl =
list(language="en")) # appears to be correct
> nb_all <- c(nb_pos,nb_neg,recursive=TRUE)
>
> meta(nb_all[[1]])
Error in UseMethod("...
2009 Nov 03
Can't pass file name as parameter to Corpus function
...code below interactively in
R. But when I called the function with a file name as the parameter, I got the
error message saying "Error in eval(expr, envir, enclos) : object
'strFileName' not found"
test<-function(strFileName) {
src <- URISource(strFileName)
cor <- Corpus(src, readerControl = list(reader = readPDF, language =
"en_US", load = TRUE))
}
After running the following code in R I checked the docURISource$URI and the
value is "strFileName" rather than "C:\\Temp\\readme.txt". I also checked
the URI when I was debugging the function and the URI is...
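For comparison, a sketch of the same kind of wrapper against a current tm release (an assumption; the thread above predates it). It reads plain text with `readPlain` instead of `readPDF` so no external PDF tool is needed, and `corpus_from_file()` is a hypothetical name. The `file://` prefix follows the form used in the examples on tm's `URISource()` help page:

```r
library(tm)

# Hypothetical helper: build a corpus from a file name passed as an argument.
corpus_from_file <- function(file_name) {
  # URISource() expects URIs; "file://" plus an absolute path is the form used
  # in tm's help examples (assumption: a Unix-style path such as /tmp/x.txt).
  src <- URISource(sprintf("file://%s", file_name))
  VCorpus(src, readerControl = list(reader = readPlain, language = "en"))
}

# A throwaway file standing in for C:\Temp\readme.txt from the post.
f <- tempfile(fileext = ".txt")
writeLines("hello from a file passed by name", f)

crp <- corpus_from_file(f)
print(content(crp[[1]]))
```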
2009 Jan 10
Help needed for Loading "tm" package
...e("jar", c("weka.jar", "RWeka.jar"), package =
pkgname, :
Cannot create Java virtual machine (-1)
Error : .onLoad failed in 'loadNamespace' for 'RWeka'
Error: package 'RWeka' could not be loaded
> my.corpurs <-Corpus(DirSource(my.path), readerControl =
list(reader=readPlain))
Error: could not find function "Corpus"
> my.tdm <- TermDocMatrix(my.corpus)
Error: could not find function "TermDocMatrix"
> my.tdm[1,]
Error: object "my.tdm" not found
--
Kum-Hoe Hwang, Ph.D.
2009 Oct 15
Problems with rJava and tm packages
...der R version 2.9.1
Error : .onLoad failed in 'loadNamespace' for 'rJava'
Error: package/namespace load failed for 'rJava'
>
> #Set documents directory
> DIR <- "G:/TextSearch/Speeches"
>
> #Load corpus
> speech <- Corpus(DirSource(DIR), readerControl = list(reader = readPlain,
+ language = "en_US", load = TRUE))
>
> #Remove stopwords
> speech <- tmMap(speech, stripWhitespace)
> speech
A corpus with 2 text documents
> tdm<-TermDocumentMatrix(speech)
Error in if (!nchar(javahome)) stop("JAVA_HOME is not set...
2013 Apr 09
Question on Stopword Removal from a Cyrillic (Bulgarian) Text
...orpus based on the contents of just one
variable, and I construct the corpus from a VectorSource. When I run
inspect, all seems fine and I can see the text properly, with Unicode
characters present:
data.corpus<-Corpus(VectorSource(data$variable,encoding='UTF-8'),
readerControl=list(language='bulgarian'))
However, no matter what I do (for example, whichever encoding I select, UTF-8 or
CP1251, the typical code page for Bulgarian texts), I cannot
remove the stop words from my corpus. The issue is present in both
Linux and Windows, and across the computers I...
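tm's built-in `stopwords()` lists do not include Bulgarian as far as I know, so one approach is to pass a hand-made vector to `removeWords()`. The two-word list below is hypothetical, and the Cyrillic is written as `\u` escapes so the sketch does not depend on the session's locale. It assumes a current tm install:

```r
library(tm)

# Hypothetical two-word Bulgarian stopword list.
bg_stop <- c("\u0438",        # "и" (and)
             "\u043d\u0430")  # "на" (on, of)

# "котка и куче на двора" (a cat and a dog in the yard)
txt <- "\u043a\u043e\u0442\u043a\u0430 \u0438 \u043a\u0443\u0447\u0435 \u043d\u0430 \u0434\u0432\u043e\u0440\u0430"

corpus <- VCorpus(VectorSource(txt), readerControl = list(language = "bg"))
corpus <- tm_map(corpus, removeWords, bg_stop)  # delete whole-word matches
corpus <- tm_map(corpus, stripWhitespace)       # collapse the gaps left behind

cleaned <- content(corpus[[1]])
print(cleaned)
```

A real list would be longer; the same `tm_map(corpus, removeWords, ...)` call works with any character vector of words.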
2009 Mar 30
Help with tm association analysis and Rgraphviz installation.
...I got tons of associations with the value ‘1’.
I tried other terms, and no association value is less than 1, which
obviously is wrong.
Could any expert tell me where I went wrong?
My R-code is:
R>my.path<-'C:\\textfile'
R>library(tm)
R>my.corpus <- Corpus(DirSource(my.path), readerControl = list
(reader=readPlain))
R>tdmO <- TermDocMatrix(my.corpus)
R>tdmO
An object of class “TermDocMatrix”
Slot "Data":
2 x 1426 sparse Matrix of class "dgCMatrix"
[[ suppressing 1426 column names ‘000’, ‘0092’, ‘0093’ ... ]]
1 3 1 12 1 1 1 8 1 1 2 1 9 . 2...
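The printout above shows a 2 x 1426 matrix, i.e. only two documents. Assuming `findAssocs()` reports Pearson correlations between term counts across documents (my understanding of how it works), each term contributes just two numbers, and any two non-constant pairs of numbers lie on a straight line, so their correlation is exactly +1 or -1. A small base-R check with made-up counts:

```r
# Made-up term counts for a corpus of two documents:
# each row is a term, each column a document.
counts <- rbind(term_a = c(1, 3),
                term_b = c(2, 5),
                term_c = c(4, 0))

# Correlation between terms (cor() works on columns, hence the transpose).
r <- cor(t(counts))
print(round(r, 6))
```

A term with the same count in both documents has zero variance and gives `NA` instead. Association values that vary meaningfully need more than two documents.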
2010 Jan 22
Invalid input error in tm package
Hello,
I am working with the "tm" package.
I have 2 PDF files saved in the directory D:/Files
I issued the following commands (marked in red bold) for which I got some
errors and warnings (marked in bold)
*surgj <- Corpus(DirSource("D:/Files"), readerControl = list(language =
"ansi"))*
*Warning messages:
1: In readLines(y, encoding = x$Encoding) :
incomplete final line found on 'D:/Files/provmedsurgj00978-0005b.pdf'
2: In readLines(y, encoding = x$Encoding) :
incomplete final line found on 'D:/Files/provmedsurgj00978-0007.pdf...
2011 Apr 06
Curious treatment of entities in xmlTreeParse
...Document())
rss2Source <- function(x, encoding = "UTF-8")
XMLSource(x, function(tree)
XML::getNodeSet(XML::xmlRoot(tree),"/rss/channel/item"), rss2Reader,
encoding)
feed.rss2 <- rss2Source(url("http://scottbw.wordpress.com/feed/"))
corp1<-Corpus(feed.rss2, readerControl=list(language="en"))
I've googled around for this problem but got nowhere. Have I missed
something?
Any help will be received gratefully; this was supposed to be the easy
part!
Cheers, Adam
2011 Sep 05
Stemming functions only work on the last word of plain text documents
...orpus using the tm_map function, it only stems the last word of each document (the problem is the same for wordStem, and stemDocument does not work at all). An example:
> path <- c("c:\path\to\directory") # collection of plain text documents
> corp <- Corpus(DirSource(path), readerControl = list(reader = readPlain, language = "en_US" , load = T))
> inspect(corp)
A corpus with 2 text documents
The metadata consists of 2 tag-value pairs and a data frame
Available tags are:
create_date creator
Available variables in the data frame are:
MetaID
$`1.txt`
running runs...
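To check whether stemming reaches every word of a document in a current tm install, a quick test like the following may help. It assumes tm plus the SnowballC package, which `stemDocument()` relies on; the sample text is made up:

```r
library(tm)

corp <- VCorpus(VectorSource("running runs runner"))
corp <- tm_map(corp, stemDocument, language = "english")
stemmed <- content(corp[[1]])
print(stemmed)

# The underlying stemmer applied word by word, for comparison.
print(SnowballC::wordStem(c("running", "runs", "runner"), language = "english"))
```

If the first two words come back as "run", the whole document was stemmed, not just its last word.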
2009 Jan 09
[R] how to build TermDocMatrix in tm text mining package of R
Howdy Gurus
I'd like to ask how to build a TermDocMatrix with the tm text
mining package.
It is not clear to me how to import a plain text file and then convert that
text file into a TermDocMatrix.
How can I build a TermDocMatrix from a plain text document file for text
association analysis?
Or are there any good manuals?
Thank you in advance,
--
Kum-Hoe Hwang, Ph.D.
2009 Apr 17
question about the Text Mining package tm
...new lines in the original file but I inserted a new
line before every occurrence of http.
I ran the following code:
library("tm")
my.path <- 'C:\\dataForR\\textsTweet1\\'
my.path.csv<-'C:\\dataForR\\textsTweet1\\myTextFile.csv'
(ovid <- Corpus(DirSource(my.path), readerControl = list(reader = readPlain,
language = "la")))
Response from R:
A text document collection with 3 text documents
Warning message:
In readLines(filename, encoding = encoding) :
incomplete final line found on 'C:\dataForR\textsTweet1\/short.txt'
Then I ran the TermDocMatrix funct...
2009 Aug 17
reading in MS Word files
I am familiar with packages that read and write Excel files on both Windows
and Linux platforms.
Do any packages provide similar functionality for MS Word files? I have a
lot of text processing to do and the text is embedded in ~200 different Word
files (.doc format Office 2003). All I need to do is read, not write.
Thanks,
Mark
2009 Dec 22
Reading PDF files (using xpdf)
...+'.
(7) Navigate to the folder which contains the files: C:/../xpdf-3.02pl4-win32
(8) Add it and click Ok.
Then you can do something like:
> library(tm)
> my.path <- 'C:\\Documents and Settings\\tony\\Desktop\\pdfs\\' #put your pdfs in here
> Corpus(DirSource(my.path), readerControl = list(reader=readPDF))
There are some limitations to how well the conversions work depending
on the pdf file, but it was so long ago now that I'm afraid I don't
remember the details.
HTH.
Tony Breyal
2009/12/22 <zeusufza at lmu.edu>:
> Hi:
>
> I am very new to R. I ju...
2010 Feb 16
tm package
Hi,
I'm using version 0.5.1 of the tm package with R 2.10.1. It looks to me
as if after the following
reuters21578 <- Corpus(DirSource(corpusDir), readerControl =
list(reader = readReut21578XMLasPlain))
reuters21578 <- tm_map(reuters21578, stripWhitespace)
reuters21578 <- tm_map(reuters21578, tolower)
reuters21578 <- tm_map(reuters21578, removePunctuation)
reuters21578 <- tm_map(reuters21578, removeNumbers)
reuters21578.dtm...
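The `tm_map(reuters21578, tolower)` line above matches tm 0.5.1. In later releases (0.6 onward, as far as I know), passing a base function such as `tolower` directly to `tm_map()` turns the documents into bare character vectors, and building a document-term matrix from them typically fails; wrapping the function in `content_transformer()` keeps the document class. A sketch on two made-up documents, assuming a current tm install:

```r
library(tm)

# Two made-up documents standing in for the Reuters-21578 files.
docs <- c("Oil PRICES rose 3%  today.", "Crude   output fell, 12 wells shut.")
corpus <- VCorpus(VectorSource(docs))

corpus <- tm_map(corpus, stripWhitespace)
corpus <- tm_map(corpus, content_transformer(tolower))  # instead of tm_map(corpus, tolower)
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeNumbers)

dtm <- DocumentTermMatrix(corpus)
inspect(dtm)
```

tm's own transformations (`stripWhitespace`, `removePunctuation`, `removeNumbers`) already return documents, so only base functions need the wrapper.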