thr3ads.net - search: "tmmap"

Displaying 4 results from an estimated 4 matches for "tmmap".

Problems with rJava and tm packages

2009 Oct 15

    ...'rJava'
    >
    > #Set documents directory
    > DIR <- "G:/TextSearch/Speeches"
    >
    > #Load corpus
    > speech <- Corpus(DirSource(DIR), readerControl = list(reader = readPlain,
    + language = "en_US", load = TRUE))
    >
    > #Remove stopwords
    > speech <- tmMap(speech, stripWhitespace)
    > speech
    A corpus with 2 text documents
    > tdm<-TermDocumentMatrix(speech)
    Error in if (!nchar(javahome)) stop("JAVA_HOME is not set and could not be determined from the registry") :
      argument is of length zero
    Error: .onLoad failed in 'loadNamespa...

How to Solve the Error (error: cannot allocate vector of size 1.1 Gb)

2009 Jan 15

    ...####
    ###### my R Script's Outputs ######
    ###############################
    > memory.limit(size = 2000)
    NULL
    > corpus.ko <- Corpus(DirSource("test_konews/"),
    + readerControl = list(reader = readPlain,
    + language = "UTF-8", load = FALSE))
    > corpus.ko.nowhite <- tmMap(corpus.ko, stripWhitespace)
    > corpus <- tmMap(corpus.ko.nowhite, tmTolower)
    > tdm <- TermDocMatrix(corpus)
    > findAssocs(tdm, "city", 0.97)
    error:cannot allocate vector of size 1.1 Gb
    -------------------------------------------------------------
    > #######################...

Help with the text mining (TM) package

2009 Jul 17

Dear all, I am writing to ask about the following: I am working on a text mining project and need to import a set of texts in order to preprocess them, that is, remove stopwords, apply stemming, remove punctuation, etc. I can do the latter with the datasets that come with the TM package. What I cannot manage is importing text from any source, even though there are functions...

Efficiently Extracting Meta Data from TM Corpora

2009 Aug 13

I'm using text miner (the "tm" package) to process large numbers of blog and message board postings (about 245,000). Does anyone have any advice for how to efficiently extract the meta data from a corpus of this size? TM does a great job of using MPI for many functions (e.g. tmMap) which greatly speed up the processing. However, the "meta" function that I need does not take advantage of MPI. I have two ideas: 1) Find a way of running the meta function in parallel mode. Specifically, the code that I'm running is: urllist <- lapply(workingcorpus, meta, tag...