thr3ads.net - search: "readplain"

Displaying 20 results from an estimated 21 matches for "readplain".

2009 Oct 02

text mining

...us. I have searched the above document as well as related documentation. Any leads or help would be appreciated. Thanks everyone. From the document: txt <- system.file("texts", "txt", package = "tm") (ovid <- Corpus(DirSource(txt), readerControl = list(reader = readPlain, language = "la", load = TRUE))) My attempt: txt <- system.file("Speeches/speech", "txt", package = "tm") (ovid <- Corpus(DirSource(txt), readerControl = list(reader = readPlain, language = "la", load = TRUE))) -- View this message in c...

Extracting information from text data

2011 Jan 24

...data stored in different files. Where n = number of words (say w1, w2, …, wn). M is the number of documents (say d1, d2, …, dm) A. Using package tm I am using package tm to do the job. I have provided the code below: > my.corpus <- Corpus(DirSource(my.path), readerControl = list (reader=readPlain)) In readLines(y, encoding = x$Encoding) : incomplete final line found on 'M:\textmine/slr.txt' > x <- TermDocMatrix(my.corpus) Error: could not find function "TermDocMatrix" B. Using package(s) other than tm Once again, thank you very much for the time you have...
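
The "could not find function" error in this snippet is consistent with the rename of TermDocMatrix() to TermDocumentMatrix() in later tm releases. A minimal sketch, assuming a recent tm version; the directory, file names, and sample sentences are hypothetical stand-ins for the poster's data:

```r
library(tm)

# Hypothetical stand-in for the poster's directory: two small files in a temp folder.
doc_dir <- file.path(tempdir(), "tm_demo")
dir.create(doc_dir, showWarnings = FALSE)
writeLines("the city council met in the city hall", file.path(doc_dir, "d1.txt"))
writeLines("the council approved the budget", file.path(doc_dir, "d2.txt"))

# Build a corpus from every file in the directory, one document per file.
corpus <- VCorpus(DirSource(doc_dir, encoding = "UTF-8"),
                  readerControl = list(reader = readPlain, language = "en"))

# TermDocumentMatrix() is the current name for what older code calls TermDocMatrix().
tdm <- TermDocumentMatrix(corpus)
m <- as.matrix(tdm)  # rows are terms, columns are documents
```

The "incomplete final line" message from readLines() is a warning, not an error: it only says the file does not end with a newline, and the text is still read.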

tm: custom reader for readPlain

2013 Jan 08

Hello: I have a series of newspaper articles from a Canadian newspaper database (Canadian Newsstand) that look just like below. I've read through this vignette (http://cran.r-project.org/web/packages/tm/vignettes/extensions.pdf) about creating a custom reader to extract meta-data, but I can't understand how to apply this in the context of a text document, rather than in the tabular format

Reading a txt file in chunks

2019 Feb 12

...what I would like to tell R is "go to where it says time and bring back X lines" or "go to where it says time and bring back lines until you reach end". It should actually be fairly easy: all the tables start with time, end with end, and have the same number of rows. I have been looking at readPlain(), scan(), readfile()... but you can tell them how many lines to read, not where to start... I think. Any hint about where I could start looking? Many thanks. -- Jaume Tormo. https://es.linkedin.com/in/jaumetormo https://acercad.wordpress.com/ [[alternative HTML version deleted]]
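
One possible approach to this question, sketched in base R: read all lines once, locate the marker lines with grep(), and slice out what lies between them. It assumes "time" and "end" each sit at the start of their own line and that the markers are not themselves table rows; the function name read_chunks and the sample file are hypothetical.

```r
# Return the lines found between each "time" marker and the next "end" marker.
read_chunks <- function(path, start = "^time", end = "^end") {
  lines  <- readLines(path)
  starts <- grep(start, lines)
  ends   <- grep(end, lines)
  # Every start marker must be paired with a later end marker.
  stopifnot(length(starts) == length(ends), all(starts < ends))
  # The rows of each table sit strictly between its two markers.
  Map(function(s, e) lines[seq_len(e - s - 1) + s], starts, ends)
}

# Hypothetical sample file containing two tables and some unrelated lines.
demo <- tempfile(fileext = ".txt")
writeLines(c("header", "time", "1 10", "2 20", "end",
             "junk", "time", "1 30", "2 40", "end"), demo)

chunks <- read_chunks(demo)
# Parse each chunk into a data frame.
tables <- lapply(chunks, function(x) read.table(text = x))
```

For very large files where reading everything into memory is not an option, scan() accepts skip and nlines arguments, so the marker positions found in a first pass could be reused to read each table separately.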

How to read HTML or TEXT file with tm package

2010 Feb 04

URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20100204/a3069c99/attachment.pl>

How to Solve the Error( error:cannot allocate vector of size 1.1 Gb)

2009 Jan 15

...ong increasing a physical RAM, or doing other recipes, etc? ############################### ###### my R Script's Outputs ###### ############################### > memory.limit(size = 2000) NULL > corpus.ko <- Corpus(DirSource("test_konews/"), + readerControl = list(reader = readPlain, + language = "UTF-8", load = FALSE)) > corpus.ko.nowhite <- tmMap(corpus.ko, stripWhitespace) > corpus <- tmMap(corpus.ko.nowhite, tmTolower) > tdm <- TermDocMatrix(corpus) > findAssocs(tdm, "city", 0.97) error:cannot allocate vector of size 1.1 Gb ------...

TM reader with text

2012 Feb 29

..."<U+FB01>nancement" "<U+FB01>nancier" "<U+FB01>nanci?re" "<U+FB01>nanci?res" "<U+FB01>nanciers" "<U+FB01>xe" Some French words are not read correctly by tm with the readPlain reader. I tried to use reader = readPDF, but it doesn't work, so I had to convert the PDF to plain text. Some words are still not recognized, so when I use TermDocumentMatrix a word like inflation disappears. It's a big problem for me. I have spent a lot of time on this; any ideas? Thanks for you...

Help needed for Loading "tm" package

2009 Jan 10

...eka.jar", "RWeka.jar"), package = pkgname, : Cannot create Java virtual machine (-1) Error : .onLoad failed in 'loadNamespace' for 'RWeka' Error: package 'RWeka' could not be loaded > my.corpurs <-Corpus(DirSource(my.path), readerControl = list(reader=readPlain)) Error: could not find function "Corpus" > my.tdm <- TermDocMatrix(my.corpus) Error: could not find function "TermDocMatrix" > my.tdm[1,] Error: object "my.tdm" not found -- Kum-Hoe Hwang, Ph.D. Phone : 82-31-250-3516 Email : phdhwang@gmail.com [[altern...

Problems with rJava and tm packages

2009 Oct 15

...onLoad failed in 'loadNamespace' for 'rJava' Error: package/namespace load failed for 'rJava' > > #Set documents directory > DIR <- "G:/TextSearch/Speeches" > > #Load corpus > speech <- Corpus(DirSource(DIR), readerControl = list(reader = readPlain, + language = "en_US", load = TRUE)) > > #Remove stopwords > speech <- tmMap(speech, stripWhitespace) > speech A corpus with 2 text documents > tdm<-TermDocumentMatrix(speech) Error in if (!nchar(javahome)) stop("JAVA_HOME is not set and could not be determine...

Help with tm association analysis and Rgraphviz installation.

2009 Mar 30

...1’ . I tried other terms, and no association value is less than 1, which obviously is wrong. Could any expert tell me what I did wrong? My R-code is: R>my.path<-'C:\\textfile' R>library(tm) R>my.corpus <- Corpus(DirSource(my.path), readerControl = list (reader=readPlain)) R>tdmO <- TermDocMatrix(my.corpus) R>tdmO An object of class “TermDocMatrix” Slot "Data": 2 x 1426 sparse Matrix of class "dgCMatrix" [[ suppressing 1426 column names ‘000’, ‘0092’, ‘0093’ ... ]] 1 3 1 12 1 1 1 8 1 1 2 1 9 . 2 2 1 518 1 1 1 2 1 1 2 6 1...

readHTML within tm package

2009 Dec 11

...t routine I get an error. When I run getReaders (below) readHTML isn't listed. > getReaders() [1] "readDOC" "readGmane" [3] "readPDF" "readReut21578XML" [5] "readReut21578XMLasPlain" "readPlain" [7] "readRCV1" "readTabular" Am I missing something? Is there an extra install I'm missing, or has the routine been removed or replaced? Thanks, Peter Oh, yes, running the latest R release on Mac OS 10.6.2 -- View this message in...

Stemming functions only work on the last word of plain text documents

2011 Sep 05

...n it only stems the last word of each document (The problem is the for wordStem and stemDocument does not work at all). An example: > path <- c("c:\path\to\directory") # collection of plain text documents > corp <- Corpus(DirSource(path), readerControl = list(reader = readPlain, language = "en_US" , load = T)) > inspect(corp) A corpus with 2 text documents The metadata consists of 2 tag-value pairs and a data frame Available tags are: create_date creator Available variables in the data frame are: MetaID $`1.txt` running runs runners $`2.txt` happyne...

[R] how to build TermDocMatrix in tm text mining package of R

2009 Jan 09

Howdy Gurus, I'd like to ask a question about how to build a TermDocMatrix in the tm text mining package. It is not clear to me how to import a plain text file and then convert that text file into a TermDocMatrix. How can I build a TermDocMatrix of a plain text document file for text association? Or are there any good manuals? Thank you in advance, -- Kum-Hoe Hwang, Ph.D.

question about the Text Mining package tm

2009 Apr 17

...but I inserted a new line before every occurrence of http. I ran the following code: library("tm") my.path <- 'C:\\dataForR\\textsTweet1\\' my.path.csv<-'C:\\dataForR\\textsTweet1\\myTextFile.csv' (ovid <- Corpus(DirSource(my.path), readerControl = list(reader = readPlain, language = "la"))) Response from R: A text document collection with 3 text documents Warning message: In readLines(filename, encoding = encoding) : incomplete final line found on 'C:\dataForR\textsTweet1\/short.txt' Then I ran the TermDocMatrix function. It is supposed to tak...

Help using "tm" text mining package - preprocessing

2011 Feb 10

Thanks all for your help. I fear text mining is an abstract little corner of "R". I have imported 3228 text (.txt) files, each a news story, into R using [tm]: textd <- Corpus(DirSource("other/docs"), readerControl = list(reader =readPlain)) I can pre-process each individual document using tolower(textd[[1]]) however, when I try to run tmTolower() I get a no such command error, and then the Term Document Matrix command gives me a peculiar error: > other.TDM <- TermDocumentMatrix(textd, control = list(stopwords = TRUE)) Er...
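
The "no such command" error for tmTolower() fits the fact that later tm releases dropped the tm-prefixed helpers in favour of tm_map(). A minimal sketch, assuming tm 0.6 or later (where content_transformer() is available); the three sample sentences are hypothetical and stand in for the imported news stories:

```r
library(tm)

# Hypothetical three-document corpus standing in for the imported .txt files.
textd <- VCorpus(VectorSource(c("The Mayor spoke.",
                                "The mayor left the CITY.",
                                "Rain fell on the city.")))

# Wrap base tolower() so that each result is still a tm document.
textd <- tm_map(textd, content_transformer(tolower))
# removePunctuation() is a tm transformation and can be passed directly.
textd <- tm_map(textd, removePunctuation)

# stopwords = TRUE drops the stopwords for each document's language (English by default).
tdm <- TermDocumentMatrix(textd, control = list(stopwords = TRUE))
terms <- Terms(tdm)
```

Applying tolower() directly to textd[[1]] returns a plain character vector rather than a document, which is one reason to route transformations through tm_map().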

R hangs at NGramTokenizer

2013 Sep 26

...ary(tm))) > invisible(clusterEvalQ(cl, library(RWeka))) > invisible(clusterEvalQ(cl, library(topicmodels))) > invisible(clusterEvalQ(cl, library(RTextTools))) > myCorpus <- Corpus(DirSource("/home/neeph/Test/DMOZ_Business"), encoding="UTF-8", readerControl=list(reader=readPlain)) > removeURL <- function(x) gsub("http[[:alnum:]]*", "", x) > myCorpus <- tm_map(myCorpus, removeURL) > removeAmp <- function(x) gsub("&", "", x) > myCorpus <- tm_map(myCorpus, removeAmp) > removeWWW <- function(x) gsub("...

tm: Why does adding local metadata take so long?

2009 Oct 13

...# Use that vector to create a DirSource object Dir_3compounds <- DirSource(dirName, pattern = "_.*\\.txt", ignore.case = TRUE, encoding = "latin1") # Read the .txt files into a volatile corpus Corpus_3compounds <- Corpus(Dir_3compounds, readerControl = list(reader = readPlain, language = "en", load = TRUE)) I have the metadata for these text documents in an Excel table, which I have read into Metadata_3compounds as follows: # Read the metadata into a data frame Metadata_3compounds <- read.xls("/Volumes/RDR Test Documents/ 3Compounds/3compounds...

Help with the text mining package (TM)

2009 Jul 17

Dear all, I am writing to ask the following: I am working on a text mining project and need to import a series of texts in order to preprocess them, that is, remove stopwords, apply stemming, remove punctuation marks, etc. I can do the latter with the datasets that ship with the TM library. What I cannot manage is to import text from any source, even though there are functions

glibc detected *** /usr/lib64/R/bin/exec/R: double free or corruption ???? tm package

2008 Jan 07

..."6.1" $year [1] "2007" $month [1] "11" $day [1] "26" $`svn rev` [1] "43537" $language [1] "R" $version.string [1] "R version 2.6.1 (2007-11-26)" > test <- TextDocCol(DirSource(getwd()), readerControl = list(reader = readPlain, load = TRUE, language = "nl_BE")) *** glibc detected *** /usr/lib64/R/bin/exec/R: double free or corruption (!prev): 0x0000000022e20680 *** ======= Backtrace: ========= /lib64/libc.so.6[0x359946f4f4] /lib64/libc.so.6(cfree+0x8c)[0x3599472b1c] /usr/lib64/R/lib/libR.so[0x305b670a3d] /usr/l...

Interface to open source Reporting tools

2009 Jan 15

...pkgname, : > > Cannot create Java virtual machine (-1) > > Error : .onLoad failed in 'loadNamespace' for 'RWeka' > > Error: package 'RWeka' could not be loaded > >> my.corpurs <-Corpus(DirSource(my.path), readerControl = > > list(reader=readPlain)) > > Error: could not find function "Corpus" > >> my.tdm <- TermDocMatrix(my.corpus) > > Error: could not find function "TermDocMatrix" > >> my.tdm[1,] > > Error: object "my.tdm" not found > > > > > > -- > >...
