search for: readplain

Displaying 20 results from an estimated 21 matches for "readplain".

2009 Oct 02
1
text mining
...us. I have searched the above document as well as related documentation. Any leads or help would be appreciated. Thanks, everyone. From the document: txt <- system.file("texts", "txt", package = "tm") (ovid <- Corpus(DirSource(txt), readerControl = list(reader = readPlain, language = "la", load = TRUE))) My attempt: txt <- system.file("Speeches/speech", "txt", package = "tm") (ovid <- Corpus(DirSource(txt), readerControl = list(reader = readPlain, language = "la", load = TRUE)))
2011 Jan 24
1
Extracting information from text data
...data stored in different files, where n = number of words (say w1, w2, …, wn) and m = number of documents (say d1, d2, …, dm). A. Using package tm: I am using package tm to do the job. I have provided the code below: > my.corpus <- Corpus(DirSource(my.path), readerControl = list (reader=readPlain)) In readLines(y, encoding = x$Encoding) : incomplete final line found on 'M:\textmine/slr.txt' > x <- TermDocMatrix(my.corpus) Error: could not find function "TermDocMatrix" B. Using package(s) other than tm: Once again, thank you very much for the time you have...
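For option B, a term-document matrix can also be built in base R without tm. This is a sketch, not tm's own method: it assumes the documents are already in a named character vector, uses a simple split on non-letter characters as the tokenizer, and `term_doc_matrix()` is a hypothetical helper name. (Separately, later tm releases spell the tm function `TermDocumentMatrix()`, as in several of the other threads listed here.)

```r
# Sketch of a base-R term-document matrix (no tm required).
# Rows are terms (w1..wn), columns are documents (d1..dm).
# `docs` is assumed to be a named character vector, one element per document.
term_doc_matrix <- function(docs) {
  # Lower-case each document and split it on runs of non-letter characters
  tokens <- lapply(tolower(docs), function(d) {
    words <- unlist(strsplit(d, "[^[:alpha:]]+"))
    words[nzchar(words)]                      # drop empty strings from leading separators
  })
  vocab <- sort(unique(unlist(tokens)))
  # For each document, count how often each vocabulary term occurs
  counts <- vapply(tokens,
                   function(words) tabulate(match(words, vocab), nbins = length(vocab)),
                   integer(length(vocab)))
  counts <- matrix(counts, nrow = length(vocab))  # keep a matrix even for a single term
  dimnames(counts) <- list(Terms = vocab, Docs = names(docs))
  counts
}

docs <- c(d1 = "The cat sat.", d2 = "The cat and the dog.")
term_doc_matrix(docs)
```

To read a directory of plain-text files into such a vector, something like `vapply(files, function(f) paste(readLines(f, warn = FALSE), collapse = " "), character(1))` keeps the file paths as document names.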
2013 Jan 08
1
tm: custom reader for readPlain
Hello: I have a series of newspaper articles from a Canadian newspaper database (Canadian Newsstand) that look just like the one below. I've read through this vignette (http://cran.r-project.org/web/packages/tm/vignettes/extensions.pdf) about creating a custom reader to extract metadata, but I can't understand how to apply this in the context of a text document rather than in the tabular format
2019 Feb 12
7
Reading a txt file in chunks
...what I would like to tell R is "go to where it says time and bring me X lines" or "go to where it says time and bring me lines until you reach end". It must actually be quite easy: all the tables start with time, end with end, and have the same number of rows. I have been looking at readPlain(), scan(), readfile()... but you can tell them how many lines to read, not where to start... I think. Any hints about where I could start looking? Many thanks. -- Jaume Tormo. https://es.linkedin.com/in/jaumetormo https://acercad.wordpress.com/
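One possible approach in base R: read all lines with `readLines()`, locate the marker lines with `grep()`, and parse each block with `read.table(text = ...)`. This is a sketch assuming each table's header line starts with "time" and the table is followed by a line starting with "end"; `read_blocks()` and the file name `datos.txt` are hypothetical. (`scan()` and `read.table()` also accept `skip` plus `nlines`/`nrows`, which is enough when the starting line number is already known.)

```r
# Sketch: return every block that starts at a line matching `start_pattern`
# and stops just before the next line matching `end_pattern`.
read_blocks <- function(path, start_pattern = "^time", end_pattern = "^end") {
  lines  <- readLines(path, warn = FALSE)
  starts <- grep(start_pattern, lines)
  ends   <- grep(end_pattern, lines)
  lapply(starts, function(s) {
    e <- ends[ends > s][1]                   # first "end" after this "time"
    if (is.na(e)) e <- length(lines) + 1L    # no "end": read to end of file
    lines[seq.int(s, e - 1L)]                # keep the "time" header, drop "end"
  })
}

# Each block can then be parsed as a table, using the "time" line as the header:
# tables <- lapply(read_blocks("datos.txt"), function(b) read.table(text = b, header = TRUE))
```

Searching for the markers avoids relying on fixed line offsets, so the same code works even if the amount of text between tables varies.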
2010 Feb 04
1
How to read HTML or TEXT file with tm package
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20100204/a3069c99/attachment.pl>
2009 Jan 15
1
How to Solve the Error (error: cannot allocate vector of size 1.1 Gb)
...ong increasing a physical RAM, or doing other recipes, etc? ############################### ###### my R Script's Outputs ###### ############################### > memory.limit(size = 2000) NULL > corpus.ko <- Corpus(DirSource("test_konews/"), + readerControl = list(reader = readPlain, + language = "UTF-8", load = FALSE)) > corpus.ko.nowhite <- tmMap(corpus.ko, stripWhitespace) > corpus <- tmMap(corpus.ko.nowhite, tmTolower) > tdm <- TermDocMatrix(corpus) > findAssocs(tdm, "city", 0.97) error:cannot allocate vector of size 1.1 Gb ------...
2012 Feb 29
1
TM reader with text
..."<U+FB01>nancement" "<U+FB01>nancier" "<U+FB01>nanci?re" "<U+FB01>nanci?res" "<U+FB01>nanciers" "<U+FB01>xe" Some French words are not read correctly by tm with the readPlain reader. I tried to use reader = readPDF, but it doesn't work, so I have to convert the PDF text to plain text. Some words are not recognized, so when I use TermDocumentMatrix a word like "inflation" disappears. It's a big problem for me. I have spent a lot of time on this problem; any ideas? Thanks for you...
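The `<U+FB01>` in that output is the Unicode "fi" ligature (U+FB01), which PDF-to-text conversion often emits instead of the separate letters f and i; "inflation" plausibly disappears for the same reason if it was extracted with the "fl" ligature (U+FB02). A minimal sketch, assuming the text is UTF-8, that expands the common Latin ligatures before building the matrix (`expand_ligatures()` is a hypothetical name):

```r
# Expand Unicode Latin ligatures (U+FB00..U+FB04), which PDF-to-text tools often
# emit instead of separate letters, back into plain letters.
expand_ligatures <- function(x) {
  ligatures <- c("\uFB00", "\uFB01", "\uFB02", "\uFB03", "\uFB04")
  plain     <- c("ff", "fi", "fl", "ffi", "ffl")
  for (i in seq_along(ligatures)) {
    x <- gsub(ligatures[i], plain[i], x, fixed = TRUE)
  }
  x
}

expand_ligatures("\uFB01nancement")   # "financement"
```

In more recent tm versions this could be applied to a corpus with `tm_map(corpus, content_transformer(expand_ligatures))`. NFKC Unicode normalization (for example `stringi::stri_trans_nfkc()`) also maps these ligatures to plain letters.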
2009 Jan 10
1
Help needed for Loading "tm" package
...eka.jar", "RWeka.jar"), package = pkgname, : Cannot create Java virtual machine (-1) Error : .onLoad failed in 'loadNamespace' for 'RWeka' Error: package 'RWeka' could not be loaded > my.corpurs <-Corpus(DirSource(my.path), readerControl = list(reader=readPlain)) Error: could not find function "Corpus" > my.tdm <- TermDocMatrix(my.corpus) Error: could not find function "TermDocMatrix" > my.tdm[1,] Error: object "my.tdm" not found -- Kum-Hoe Hwang, Ph.D. Phone : 82-31-250-3516 Email : phdhwang@gmail.com
2009 Oct 15
1
Problems with rJava and tm packages
...onLoad failed in 'loadNamespace' for 'rJava' Error: package/namespace load failed for 'rJava' > > #Set documents directory > DIR <- "G:/TextSearch/Speeches" > > #Load corpus > speech <- Corpus(DirSource(DIR), readerControl = list(reader = readPlain, + language = "en_US", load = TRUE)) > > #Remove stopwords > speech <- tmMap(speech, stripWhitespace) > speech A corpus with 2 text documents > tdm<-TermDocumentMatrix(speech) Error in if (!nchar(javahome)) stop("JAVA_HOME is not set and could not be determine...
2009 Mar 30
1
Help with tm assocation analysis and Rgraphviz installation.
...1’. I tried other terms, and no association value is less than 1, which is obviously wrong. Could any expert tell me where I went wrong? My R code is: R>my.path<-'C:\\textfile' R>library(tm) R>my.corpus <- Corpus(DirSource(my.path), readerControl = list (reader=readPlain)) R>tdmO <- TermDocMatrix(my.corpus) R>tdmO An object of class “TermDocMatrix” Slot "Data": 2 x 1426 sparse Matrix of class "dgCMatrix" [[ suppressing 1426 column names ‘000’, ‘0092’, ‘0093’ ... ]] 1 3 1 12 1 1 1 8 1 1 2 1 9 . 2 2 1 518 1 1 1 2 1 1 2 6 1...
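If the two rows of that 2 x 1426 matrix are the two documents, and assuming the association values are correlations between term columns, an association of 1 for every term is expected rather than a bug: each term is then a vector of just two counts, and the Pearson correlation between any two non-constant length-2 vectors is exactly +1 or -1 (terms with equal counts in both documents have no defined correlation). A small base-R check with made-up counts:

```r
# Made-up counts: 2 documents (rows) x 6 terms (columns)
counts <- rbind(doc1 = c(2, 5, 7, 1, 4, 3),
                doc2 = c(6, 1, 2, 3, 9, 1))
colnames(counts) <- paste0("term", 1:6)

r <- cor(counts)   # 6 x 6 matrix of term-by-term Pearson correlations
round(r, 10)       # every entry is +1 or -1
```

Meaningful association values need many more documents than two, so that each term's count vector has enough points to vary.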
2009 Dec 11
0
readHTML within tm package
...t routine I get an error. When I run getReaders (below) readHTML isn't listed. > getReaders() [1] "readDOC" "readGmane" [3] "readPDF" "readReut21578XML" [5] "readReut21578XMLasPlain" "readPlain" [7] "readRCV1" "readTabular" Am I missing something? Is there an extra install I'm missing, or has the routine been removed or replaced? Thanks, Peter Oh, yes, running the latest R release on Mac OS 10.6.2
2011 Sep 05
0
Stemming functions only work on the last word of plain text documents
...n it only stems the last word of each document (The problem is the for wordStem and stemDocument does not work at all).  An example: > path <- c("c:\path\to\directory")       # collection of plain text documents > corp <- Corpus(DirSource(path), readerControl = list(reader = readPlain, language = "en_US" , load = T)) > inspect(corp) A corpus with 2 text documents The metadata consists of 2 tag-value pairs and a data frame Available tags are:   create_date creator Available variables in the data frame are:   MetaID $`1.txt` running runs runners $`2.txt` happyne...
2009 Jan 09
1
[R] How to build a TermDocMatrix in the tm text mining package of R
Howdy Gurus, I'd like to ask a question about how to build a TermDocMatrix in the tm text mining package. It is not clear to me how to import a plain text file and then convert that text file into a TermDocMatrix, etc. How can I build a TermDocMatrix of a plain text document file for text association? Or are there any good manuals? Thank you in advance, -- Kum-Hoe Hwang, Ph.D.
2009 Apr 17
0
question about the Text Mining package tm
...but I inserted a new line before every occurrence of http. I ran the following code: library("tm") my.path <- 'C:\\dataForR\\textsTweet1\\' my.path.csv<-'C:\\dataForR\\textsTweet1\\myTextFile.csv' (ovid <- Corpus(DirSource(my.path), readerControl = list(reader = readPlain, language = "la"))) Response from R: A text document collection with 3 text documents Warning message: In readLines(filename, encoding = encoding) : incomplete final line found on 'C:\dataForR\textsTweet1\/short.txt' Then I ran the TermDocMatrix function. It is supposed to tak...
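The "incomplete final line" warning (also seen in the 2011 thread above) only means the file does not end with a newline; `readLines()` still accepts the last line. It can be silenced with `readLines(..., warn = FALSE)`, or the files can be fixed once by appending a newline. A base-R sketch of the latter (`ensure_final_newline()` is a hypothetical helper):

```r
# Append a trailing newline to a file that lacks one, so readLines() stops warning
# about an "incomplete final line". Returns TRUE (invisibly) if the file was changed.
ensure_final_newline <- function(path) {
  size <- file.size(path)
  if (is.na(size) || size == 0) return(invisible(FALSE))
  bytes <- readBin(path, what = "raw", n = size)
  if (bytes[size] == as.raw(0x0a)) return(invisible(FALSE))  # already ends in "\n"
  cat("\n", file = path, append = TRUE)
  invisible(TRUE)
}

# Alternatively, keep the files as they are and silence the warning when reading:
# lines <- readLines("short.txt", warn = FALSE)
```

To fix every file in the poster's directory: `invisible(lapply(list.files(my.path, full.names = TRUE), ensure_final_newline))`.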
2011 Feb 10
2
Help using "tm" text mining package - preprocessing
Thanks all for your help. I fear text mining is an abstract little corner of "R". I have imported 3228 text (.txt) files, each a news story, into R using [tm]: textd <- Corpus(DirSource("other/docs"), readerControl = list(reader =readPlain)) I can pre-process each individual document using tolower(textd[[1]]) however, when I try to run tmTolower() I get a no such command error, and then the Term Document Matrix command gives me a peculiar error: > other.TDM <- TermDocumentMatrix(textd, control = list(stopwords = TRUE)) Er...
2013 Sep 26
0
R hangs at NGramTokenizer
...ary(tm))) > invisible(clusterEvalQ(cl, library(RWeka))) > invisible(clusterEvalQ(cl, library(topicmodels))) > invisible(clusterEvalQ(cl, library(RTextTools))) > myCorpus <- Corpus(DirSource("/home/neeph/Test/DMOZ_Business"), encoding="UTF-8", readerControl=list(reader=readPlain)) > removeURL <- function(x) gsub("http[[:alnum:]]*", "", x) > myCorpus <- tm_map(myCorpus, removeURL) > removeAmp <- function(x) gsub("&amp;", "", x) > myCorpus <- tm_map(myCorpus, removeAmp) > removeWWW <- function(x) gsub("...
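As written, `removeURL()` only deletes "http" plus any directly following letters and digits, because ":" and "/" are not in `[:alnum:]`; "http://www.r-project.org" becomes "://www.r-project.org". A sketch of a broader pattern that removes each URL up to the next whitespace (the regular expression is an assumption about what should count as a URL in this data, and `remove_url()` is a hypothetical name):

```r
# Remove http://, https://, ftp:// and bare "www." URLs up to the next whitespace
remove_url <- function(x) {
  gsub("(https?|ftp)://[^[:space:]]+|www\\.[^[:space:]]+", "", x)
}

remove_url("see http://www.r-project.org now")                   # "see  now"
gsub("http[[:alnum:]]*", "", "see http://www.r-project.org now")  # "see ://www.r-project.org now"
```

In more recent tm versions, a plain function like this is usually wrapped as `tm_map(myCorpus, content_transformer(remove_url))`.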
2009 Oct 13
0
tm: Why does adding local metadata take so long?
...# Use that vector to create a DirSource object Dir_3compounds <- DirSource(dirName, pattern = "_.*\\.txt", ignore.case = TRUE, encoding = "latin1") # Read the .txt files into a volatile corpus Corpus_3compounds <- Corpus(Dir_3compounds, readerControl = list(reader = readPlain, language = "en", load = TRUE)) I have the metadata for these text documents in an Excel table, which I have read into Metadata_3compounds as follows: # Read the metadata into a data frame Metadata_3compounds <- read.xls("/Volumes/RDR Test Documents/ 3Compounds/3compounds...
2009 Jul 17
3
Help with the text mining package (TM)
Dear all, I am writing to ask the following: I am doing a text mining project and I need to import a set of texts to preprocess them, that is, remove stopwords, do stemming, remove punctuation, etc. The latter I can do with the datasets that come with the TM library. What I cannot manage is to import text from any source, even though there are functions
2008 Jan 07
1
glibc detected *** /usr/lib64/R/bin/exec/R: double free or corruption, tm package
...t;6.1" $year [1] "2007" $month [1] "11" $day [1] "26" $`svn rev` [1] "43537" $language [1] "R" $version.string [1] "R version 2.6.1 (2007-11-26)" > test <- TextDocCol(DirSource(getwd()), readerControl = list(reader = readPlain, load = TRUE, language = "nl_BE")) *** glibc detected *** /usr/lib64/R/bin/exec/R: double free or corruption (!prev): 0x0000000022e20680 *** ======= Backtrace: ========= /lib64/libc.so.6[0x359946f4f4] /lib64/libc.so.6(cfree+0x8c)[0x3599472b1c] /usr/lib64/R/lib/libR.so[0x305b670a3d] /usr/l...
2009 Jan 15
2
Interface to open source Reporting tools
...pkgname, : > > Cannot create Java virtual machine (-1) > > Error : .onLoad failed in 'loadNamespace' for 'RWeka' > > Error: package 'RWeka' could not be loaded > >> my.corpurs <-Corpus(DirSource(my.path), readerControl = > > list(reader=readPlain)) > > Error: could not find function "Corpus" > >> my.tdm <- TermDocMatrix(my.corpus) > > Error: could not find function "TermDocMatrix" > >> my.tdm[1,] > > Error: object "my.tdm" not found > > > > > > -- > &g...