thr3ads.net - search: "stripwhitespace"

Displaying 20 results from an estimated 22 matches for "stripwhitespace".

Strange (non-deterministic) problem with strsplit

2004 Jul 16

...chr "a*[square box]" (square box not reproduced here because copying and pasting it seems to break my webmail). Can anyone reproduce the problem and/or suggest any solutions?
parseFormula <- function(formula) {
  splitvars <- function(x) {
    strsplit(x, "\\+|\\*")[[1]]
  }
  stripwhitespace <- function(x) {
    gsub("\\s", "", x, perl=T)
  }
  vars <- stripwhitespace(as.character(formula)[3])
  varsplit <- strsplit(vars, "|", fixed=TRUE)[[1]]
  parts <- list(
    y = stripwhitespace(as.character(formula)[2]),
    x = varsplit[1],
    g = varsplit[2]
  )...
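A minimal base-R sketch of the two steps this snippet combines: stripping whitespace from the right-hand side of a formula, then splitting on a literal "|" with fixed = TRUE. The formula `y ~ a * b | g` and the helper name `strip_ws` are made up for illustration, and the sketch does not attempt to reproduce the non-deterministic behaviour reported in the thread.

```r
# Hypothetical helper: remove every whitespace character from a string.
strip_ws <- function(x) gsub("\\s", "", x, perl = TRUE)

# Made-up example formula; as.character() on a formula returns
# c("~", <left-hand side>, <right-hand side>).
f <- y ~ a * b | g

lhs <- strip_ws(as.character(f)[2])
rhs <- strip_ws(as.character(f)[3])

# fixed = TRUE treats "|" as a literal character, not as regex alternation.
parts <- strsplit(rhs, "|", fixed = TRUE)[[1]]

# Split the first part on "+" or "*" (both escaped, since they are regex operators).
vars <- strsplit(parts[1], "\\+|\\*")[[1]]

print(list(y = lhs, x = vars, g = parts[2]))
```

Without fixed = TRUE, the pattern "|" would be read as an empty regular-expression alternation rather than a pipe character, which is why the original code passes that argument.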

Problems with rJava and tm packages

2009 Oct 15

...>
> #Set documents directory
> DIR <- "G:/TextSearch/Speeches"
>
> #Load corpus
> speech <- Corpus(DirSource(DIR), readerControl = list(reader = readPlain,
+     language = "en_US", load = TRUE))
>
> #Remove stopwords
> speech <- tmMap(speech, stripWhitespace)
> speech
A corpus with 2 text documents
> tdm<-TermDocumentMatrix(speech)
Error in if (!nchar(javahome)) stop("JAVA_HOME is not set and could not be determined from the registry") :
  argument is of length zero
Error: .onLoad failed in 'loadNamespace' for 'rJava...

package "tm" fails to remove "the" with remove stopwords

2009 Nov 12

...s! Mark
require(tm)
myDocument <- c("the rain in Spain", "falls mainly on the plain", "jack and jill ran up the hill", "to fetch a pail of water")
text.corp <- Corpus(VectorSource(myDocument))
#########################
text.corp <- tm_map(text.corp, stripWhitespace)
text.corp <- tm_map(text.corp, removeNumbers)
text.corp <- tm_map(text.corp, removePunctuation)
## text.corp <- tm_map(text.corp, stemDocument)
text.corp <- tm_map(text.corp, removeWords, c("the", stopwords("english")))
dtm <- DocumentTermMatrix(text.corp)
dtm
dt...
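For orientation only, here is a base-R sketch of whole-word removal followed by whitespace collapsing, using two of the sentences from the snippet. It is not how tm implements removeWords or stripWhitespace and it does not diagnose the problem reported in the thread; the helper names `remove_words` and `collapse_ws` and the short word list are assumptions made for illustration.

```r
docs <- c("the rain in Spain", "falls mainly on the plain")

# Hypothetical helper: delete each listed word where it appears as a whole word.
remove_words <- function(x, words) {
  pattern <- paste0("\\b(", paste(words, collapse = "|"), ")\\b")
  gsub(pattern, "", x, perl = TRUE)
}

# Hypothetical helper: collapse runs of whitespace to one space and trim the ends.
collapse_ws <- function(x) trimws(gsub("\\s+", " ", x))

# Removing words leaves gaps behind, so whitespace is collapsed afterwards.
cleaned <- collapse_ws(remove_words(docs, c("the", "in", "on")))
print(cleaned)
```

In this sketch the whitespace step runs last, because deleting words first would otherwise leave double spaces in the text.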

How to Solve the Error( error:cannot allocate vector of size 1.1 Gb)

2009 Jan 15

...Script's Outputs ######
###############################
> memory.limit(size = 2000)
NULL
> corpus.ko <- Corpus(DirSource("test_konews/"),
+     readerControl = list(reader = readPlain,
+     language = "UTF-8", load = FALSE))
> corpus.ko.nowhite <- tmMap(corpus.ko, stripWhitespace)
> corpus <- tmMap(corpus.ko.nowhite, tmTolower)
> tdm <- TermDocMatrix(corpus)
> findAssocs(tdm, "city", 0.97)
error:cannot allocate vector of size 1.1 Gb
-------------------------------------------------------------
> ################################
Thanks for your p...

Function failure in tm

2013 Jan 15

...and class(mycorp[[1]]) returns
"PlainTextDocument" "TextDocument" "character"
But now that I've got a corpus, none of the transformation functions work at all. They all return the following error (with the respective function name):
Error in UseMethod("stripWhitespace", x) : no applicable method for 'stripWhitespace' applied to an object of class "NULL"
I haven't seen this error reported anywhere in the R-list archives. Does anyone have any suggestions? Yours, Simon Kiss
P.S. The results of sessionInfo() are R version 2.15.0 (201...
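The error text quoted above is the standard message R produces when an S3 generic is called on an object whose class has no matching method. The base-R sketch below reproduces the same kind of message with a made-up generic, `strip_ws`, which is not part of tm; it only illustrates what the message means and does not diagnose why the poster's transformation received NULL.

```r
# Hypothetical S3 generic with a method for character vectors only.
strip_ws <- function(x) UseMethod("strip_ws")
strip_ws.character <- function(x) gsub("\\s+", " ", x)

# Dispatch succeeds for a character vector.
ok <- strip_ws("a   b")

# Dispatch fails for NULL because there is no strip_ws.NULL or strip_ws.default.
msg <- tryCatch(strip_ws(NULL), error = function(e) conditionMessage(e))

print(ok)
print(msg)
```

Seeing class "NULL" in such a message means the function was handed NULL instead of a document, so the place to look is whatever produced the argument rather than the method itself.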

Help: Error in `colnames<-`(`*tmp*`, value = c(

2014 Jul 22

...<-iconv(enc2utf8(d1), sub = "byte")
> d2<-readLines(txt2, encoding="UTF-8")
> d2<-iconv(enc2utf8(d2), sub = "byte")
> df<-c(d1,d2)
> corpus<-Corpus(VectorSource(df))
> d<-tm_map(corpus, content_transformer(tolower))
> d<-tm_map(d, stripWhitespace)
> d<-tm_map(d, removePunctuation)
> sw<-readLines("./StopWords.txt", encoding="UTF-8")
> sw<-iconv(enc2utf8(sw), sub="byte")
> d<-tm_map(d, removeWords, sw)
> d<-tm_map(d, removeWords, stopwords("spanish"))
> tdm<-TermDocu...

wordcloud and word table [Making progress]

2014 Jul 29

...informe 2013"), row.names=c("2005", "2013"))
ds<- DataframeSource(tmpText)
ds<- DataframeSource(tmpinformes)
corp = Corpus(ds)
corp = tm_map(corp,removePunctuation)
corp = tm_map(corp,content_transformer(tolower))
corp = tm_map(corp,removeNumbers)
corp = tm_map(corp, stripWhitespace)
corp = tm_map(corp, removeWords, sw)
corp = tm_map(corp, removeWords, stopwords("spanish"))
term.matrix<- TermDocumentMatrix(corp)
term.matrix<- as.matrix(term.matrix)
colnames(term.matrix) <- c("Año2005","Año2013")
png(file="Org2005vs2013.png",heig...

Text mining

2012 Oct 25

...e"))
tw.corpus = tm_map(tw.corpus, tolower)
tw.corpus = tm_map(tw.corpus, removePunctuation)
tw.corpus = tm_map(tw.corpus, function(x) removeWords(x, c(stopwords("spanish"),"rt")))
tw.corpus = tm_map(tw.corpus, removeWords, my.stopwords)
tw.corpus = tm_map(tw.corpus, stripWhitespace)
sw <- readLines("stopwords.es.txt",encoding="UTF-8")
sw = iconv(sw, to="ASCII//TRANSLIT")
tw.corpus = tm_map(tw.corpus, removeWords, sw)
doc.m = TermDocumentMatrix(tw.corpus, control = list(minWordLength = 2))
dm = as.matrix(doc.m)
# calculate the frequency o...

Size of the term matrix and memory. tm package

2012 Dec 13

...F-8")
txt = iconv(txt, to="ASCII//TRANSLIT")
# build a corpus
corpus <- Corpus(VectorSource(txt))
# convert to lowercase
corpus <- tm_map(corpus, tolower)
# strip whitespace
corpus <- tm_map(corpus, stripWhitespace)
# remove punctuation
corpus <- tm_map(corpus, removePunctuation)
# load the custom Spanish stopword file and convert it to ASCII
sw <- readLines("D:/Publico/Documents/TextMinigSpanishResources/Stopwords.es.txt",encoding="...
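Since this thread concerns the size of the term matrix in memory, a rough back-of-the-envelope estimate may help. The sketch below assumes the matrix is held as a dense matrix of doubles (8 bytes per cell), which is what converting a term-document matrix with as.matrix() produces; the term and document counts are invented for illustration and are not taken from the thread.

```r
# Hypothetical corpus dimensions, chosen only to illustrate the arithmetic.
n_terms <- 50000
n_docs  <- 3000

# A dense numeric (double) matrix stores 8 bytes per cell.
dense_bytes <- n_terms * n_docs * 8
dense_gib   <- dense_bytes / 2^30

cat(sprintf("Dense matrix: about %.2f GiB\n", dense_gib))
```

Most cells of a term-document matrix are zero, so keeping the sparse representation and avoiding the dense conversion is usually the cheaper route when the estimate approaches the available memory.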

wordcloud and word table

2014 Jul 28

...rmes, pathname) {
> > info.dir<-sprintf("%s/%s", pathname, informes)
> > info.cor<-Corpus(DirSource(directory=info.dir, encoding="UTF-8"))
> > info.cor.cl<-tm_map(info.cor, content_transformer(tolower))
> > info.cor.cl<-tm_map(info.cor.cl, stripWhitespace)
> > info.cor.cl<-tm_map(info.cor.cl,removePunctuation)
> > sw<-readLines("C:/Users/d_2/Documents/StopWords.txt", encoding="UTF-8")
> > sw<-iconv(enc2utf8(sw), sub = "byte")
> > info.cor.cl<-tm_map(info.cor.cl, removeWords, stopw...

Not a tm problem: your doc.corpus is empty

2014 Jun 17

...tolower)
> corpus <- tm_map(corpus, removePunctuation)
> corpus <- tm_map(corpus, removeNumbers)
> corpus <- tm_map(corpus, removeWords, stopwords("english"))
> inspect(doc.corpus[1:2])
> library(SnowballC)
> corpus <- tm_map(corpus, stemDocument)
> corpus <- tm_map(corpus, stripWhitespace)
> inspect(doc.corpus[1:8])
> TDM <- TermDocumentMatrix(corpus)
> TDM*
>
> Thanks very much in advance!!!
>
> ruben!
> ------------ next part ------------
> An HTML attachment was scrubbed...
> URL: <https://stat.ethz.ch/pipermail/r-help-es/attachments/2014061...

wordcloud and word table

2014 Jul 25

...formes/"
>TDM<-function(informes, pathname) {
  info.dir<-sprintf("%s/%s", pathname, informes)
  info.cor<-Corpus(DirSource(directory=info.dir, encoding="UTF-8"))
  info.cor.cl<-tm_map(info.cor, content_transformer(tolower))
  info.cor.cl<-tm_map(info.cor.cl, stripWhitespace)
  info.cor.cl<-tm_map(info.cor.cl,removePunctuation)
  sw<-readLines("C:/Users/d_2/Documents/StopWords.txt", encoding="UTF-8")
  sw<-iconv(enc2utf8(sw), sub = "byte")
  info.cor.cl<-tm_map(info.cor.cl, removeWords, stopwords("spanish"))
  info.tdm<...

tm package

2010 Feb 16

Hi, I'm using version 0.5.1 of tm package with R 2.10.1. It looks to me as if after the following
reuters21578 <- Corpus(DirSource(corpusDir), readerControl = list(reader = readReut21578XMLasPlain))
reuters21578 <- tm_map(reuters21578, stripWhitespace)
reuters21578 <- tm_map(reuters21578, tolower)
reuters21578 <- tm_map(reuters21578, removePunctuation)
reuters21578 <- tm_map(reuters21578, removeNumbers)
reuters21578.dtm <- DocumentTermMatrix(reuters21578)
that reuters21578.dtm does not include terms from the Heading...

Help with cleaning a corpus

2011 Apr 18

Hi! I created a corpus and started to clean it with this piece of code:
txt <-tm_map(txt,removeWords, stopwords("spanish"))
txt <-tm_map(txt,stripWhitespace)
txt <-tm_map(txt,tolower)
txt <-tm_map(txt,removeNumbers)
txt <-tm_map(txt,removePunctuation)
But something happened: some of the documents in the corpus became empty, which is a problem when I try to make a document-term matrix with tfidf. Is there any way to eliminate automatically...
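The question above is how to drop documents that end up empty after cleaning. A minimal base-R sketch of the idea follows, working on a plain character vector rather than a tm corpus; the sample documents, the word list, and the variable names are made up for illustration, and applying the resulting logical index to a real corpus object is left as an assumption.

```r
# Made-up documents; the first and third contain nothing but removable text.
docs <- c("the of and", "text mining with R", "   ")

# Remove a few stopwords as whole words, then trim surrounding whitespace.
cleaned <- trimws(gsub("\\b(the|of|and)\\b", "", docs, perl = TRUE))

# nzchar() is TRUE for non-empty strings, so it flags the documents worth keeping.
keep <- nzchar(cleaned)
non_empty <- cleaned[keep]

print(keep)
print(non_empty)
```

Computing the index once and reusing it keeps any parallel metadata (labels, file names) aligned with the documents that survive the filter.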

R hangs at NGramTokenizer

2013 Sep 26

...; myCorpus <- tm_map(myCorpus, removeNumbers)
> myCorpus <- tm_map(myCorpus, removePunctuation)
> myCorpus <- tm_map(myCorpus, removeWords, stopwords("english"))
> myCorpus <- tm_map(myCorpus, removeWords, stopwords("SMART"))
> myCorpus <- tm_map(myCorpus, stripWhitespace)
> myDtm <- DocumentTermMatrix(myCorpus, control = list(wordLengths = c(1,Inf)))
Everything works fine up to this stage, if I do not include tokenizing. However, when I run the code with the following alteration:
> dictCorpus <- myCorpus
> myDtm <- DocumentTermMatrix(myCorpus, control...

Not a tm problem: your doc.corpus is empty

2014 Jun 18

...vePunctuation)
> >> corpus <- tm_map(corpus, removeNumbers)
> >> corpus <- tm_map(corpus, removeWords, stopwords("english"))
> >> inspect(doc.corpus[1:2])
> >> library(SnowballC)
> >> corpus <- tm_map(corpus, stemDocument)
> >> corpus <- tm_map(corpus, stripWhitespace)
> >> inspect(doc.corpus[1:8])
> >> TDM <- TermDocumentMatrix(corpus)
> >> TDM*
> >>
> >> Thanks very much in advance!!!
> >>
> >> ruben!
> >> ------------ next part ------------
> >> An HTML attachment was scrubbed...
> >...

Problem with Snowball & RWeka

2011 Mar 24

Dear Forum, when I try to use SnowballStemmer() I get the following error message: "Could not initialize the GenericPropertiesCreator. This exception was produced: java.lang.NullPointerException" It seems to have something to do with either Snowball or RWeka, however I can't figure out, what to do myself. If you could spend 5 minutes of your valuable time, to help me or give me a

[LLVMdev] teaching FileCheck to handle variations in order

2012 Sep 07

On 9/7/2012 12:12 PM, Krzysztof Parzyszek wrote:
> On 9/7/2012 7:20 AM, Matthew Curtis wrote:
>>
>> The attached patch implements one possible solution. It introduces a
>> position stack and a couple of directives:
>>
>> * 'CHECK-PUSH:' pushes the current match position onto the stack.
>> * 'CHECK-POP:' pops the top value off of the stack

Not a tm problem: your doc.corpus is empty

2014 Jun 18

...mbers)
> > > >> corpus <- tm_map(corpus, removeWords, stopwords("english"))
> > > >> inspect(doc.corpus[1:2])
> > > >> library(SnowballC)
> > > >> corpus <- tm_map(corpus, stemDocument)
> > > >> corpus <- tm_map(corpus, stripWhitespace)
> > > >> inspect(doc.corpus[1:8])
> > > >> TDM <- TermDocumentMatrix(corpus)
> > > >> TDM*
> > > >>
> > > >> Thanks very much in advance!!!
> > > >>
> > > >> ruben!
> > > >> ------------ next part ------------
> > > >> Se ha bo...
