thr3ads.net - search: "documenttermmatrix"

Displaying 13 results from an estimated 13 matches for "documenttermmatrix".

2010 Oct 11

topicmodels error

...[1] "TermDocumentMatrix" "simple_triplet_matrix" I tried to use a matrix, but it doesn't work: > MAT <- as.matrix(TDM) > Error in LDA(MAT, k = k, method = "Gibbs", control = list(seed = SEED, : > x is of class "matrix" The help says the correct input is a DocumentTermMatrix: > Arguments > x Object of class "DocumentTermMatrix" Can anyone help me? Thanks
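One possible way around this, sketched under the assumption that the tm package is installed: transposing a TermDocumentMatrix with t() should give a DocumentTermMatrix, which is the class the help text asks for. The sample texts are made up, and the LDA() call is left commented out because it needs the topicmodels package.

```r
# Sketch only: the sample texts are hypothetical.
texts <- c("oil prices rise", "rain falls on the plain", "oil and rain")

if (requireNamespace("tm", quietly = TRUE)) {
  corpus <- tm::VCorpus(tm::VectorSource(texts))
  tdm <- tm::TermDocumentMatrix(corpus)   # terms in rows, documents in columns
  dtm <- t(tdm)                           # documents in rows, terms in columns
  print(class(dtm))
  # topicmodels::LDA(dtm, k = 2, method = "Gibbs")  # needs topicmodels
}
```

Passing as.matrix(TDM) fails because it drops the class information that LDA() checks; the transpose keeps the sparse representation.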

2011 May 21

DocumentTermMatrix error

Hi all, I tried to create a DocumentTermMatrix with the tm package, but I get this error: Error in tolower(txt) : invalid input 'PROD Z LAHKO GNETNO MELJNO GLINO, ... in 'utf8towcs' I tried doing this as shown in http://www.r-project.org/doc/Rnews/Rnews_2008-2.pdf (An Introduction to Text Mining), with this...
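Errors of this kind usually point to text that is not valid in the encoding R expects. A minimal sketch of one common workaround, assuming (this is a guess, since the snippet does not say) that the input is in a single-byte encoding rather than UTF-8: convert it with iconv() before building the corpus. The sample string is made up, and Latin-1 is used purely for illustration; the real source encoding may differ (for example Latin-2 or Windows-1250).

```r
# Hypothetical input: a Latin-1 byte (\xe9) that is invalid as UTF-8.
txt <- "caf\xe9 GLINO"

# Declare the assumed source encoding and convert to UTF-8.
clean <- iconv(txt, from = "latin1", to = "UTF-8")

validUTF8(txt)    # FALSE: the raw bytes are not valid UTF-8
validUTF8(clean)  # TRUE after conversion
tolower(clean)    # no longer hits the invalid-input error
```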

2011 May 20

DocumentTermMatrix - text mining

Hi All, I have a data.frame that looks like the one below. I would like to do some text mining on it to find possible patterns between Opis, ACKlasifikacija and Vodja. I looked at the tm package, which looks promising, more specifically DocumentTermMatrix or TermDocumentMatrix. But I cannot figure out how to convert my data from a data.frame to a Corpus or VCorpus. Globina ACKlasifikacija Opis GlobinaOd GlobinaDo Vodja 3671 8...
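A minimal sketch of one way to do this, assuming the tm package is installed: pull the text column out as a character vector and wrap it in VectorSource(), so that each row becomes one document. The data frame below is hypothetical and only borrows two column names from the post.

```r
# Hypothetical data frame reusing two column names from the post.
df <- data.frame(
  Opis  = c("lahko gnetna glina", "pesek z glino", "trda glina"),
  Vodja = c("A", "B", "A"),
  stringsAsFactors = FALSE
)

texts <- as.character(df$Opis)   # one document per row

if (requireNamespace("tm", quietly = TRUE)) {
  corpus <- tm::VCorpus(tm::VectorSource(texts))
  dtm <- tm::DocumentTermMatrix(corpus)
  print(dim(dtm))   # documents x terms
}
```

Because the documents keep the row order of the data frame, the other columns (such as Vodja) can be matched back to rows of the matrix by position.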

2009 Nov 12

package "tm" fails to remove "the" with remove stopwords

...<- tm_map(text.corp, stripWhitespace) text.corp <- tm_map(text.corp, removeNumbers) text.corp <- tm_map(text.corp, removePunctuation) ## text.corp <- tm_map(text.corp, stemDocument) text.corp <- tm_map(text.corp, removeWords, c("the", stopwords("english"))) dtm <- DocumentTermMatrix(text.corp) dtm dtm.mat <- as.matrix(dtm) dtm.mat > dtm.mat Terms Docs falls fetch hill jack jill mainly pail plain rain ran spain the water 1 0 0 0 0 0 0 0 0 1 0 1 1 0 2 1 0 0 0 0 1 0 1 0 0 0 0...

2010 Mar 31

tm package - remove stopwords failing

Hi, I just noticed by inspecting the term matrix that not all stopwords are removed. Does someone know how to fix that? library(tm) data("crude") d <- tm_map(crude, removeWords, stopwords(language = 'english')) dt <- DocumentTermMatrix(d, control = list(minWordLength = 3, minDocFreq = 2)) inspect(dt) I am using R version 2.10, tm package 0.5-3. Cheers, Welma

2013 Sep 26

R hangs at NGramTokenizer

...removeNumbers) > myCorpus <- tm_map(myCorpus, removePunctuation) > myCorpus <- tm_map(myCorpus, removeWords, stopwords("english")) > myCorpus <- tm_map(myCorpus, removeWords, stopwords("SMART")) > myCorpus <- tm_map(myCorpus, stripWhitespace) > myDtm <- DocumentTermMatrix(myCorpus, control = list(wordLengths = c(1,Inf))) Everything works fine up to this stage if I do not include tokenizing. However, when I run the code with the following alteration: > dictCorpus <- myCorpus > myDtm <- DocumentTermMatrix(myCorpus, control = list(wordlengths=c(1,Inf),tokeniz...

2013 Oct 08

how to check the accuracy for maxent ?

...ran.r-project.org/web/packages/maxent/maxent.pdf # LOAD LIBRARY library(maxent) # READ THE DATA, PREPARE THE CORPUS, and CREATE THE MATRIX data <- read.csv(system.file("data/NYTimes.csv.gz",package="maxent")) corpus <- Corpus(VectorSource(data$Title[1:150])) matrix <- DocumentTermMatrix(corpus) # TRAIN/PREDICT USING SPARSEM REPRESENTATION sparse <- as.compressed.matrix(matrix) model <- maxent(sparse[1:100,],data$Topic.Code[1:100]) results <- predict(model,sparse[101:150,]) Any idea how I can check the accuracy against the classification in data$Topic.Code? I see...
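A generic way to check accuracy, sketched in base R: compare the predicted labels with the true ones, take the share that match, and cross-tabulate them. The two vectors below are hypothetical stand-ins; in the post they would be the label column extracted from results and data$Topic.Code[101:150], and how the labels are stored in results is not shown in the snippet.

```r
# Hypothetical labels standing in for the predicted and true topic codes.
actual    <- c("12", "3", "12", "19", "3", "12")
predicted <- c("12", "3", "19", "19", "3", "3")

accuracy  <- mean(predicted == actual)                 # share of correct predictions
confusion <- table(Predicted = predicted, Actual = actual)

accuracy
confusion
```

The confusion table shows which topic codes get mistaken for which, something a single accuracy figure hides.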

2012 Dec 13

Term matrix size and memory. tm package

...<- tm_map(corpus, removeWords, stopwords("spanish")) # stemming corpus <- tm_map(corpus, stemDocument, language = "spanish") # create the term matrix # a) terms as rows and documents as columns dtm <- DocumentTermMatrix(corpus) inspect(dtm[1000:1005,1000:1005]) # Terms with a minimum frequency of 30: findFreqTerms(dtm, lowfreq=30) # remove low-frequency terms inspect(removeSparseTerms(dtm, 0.4)) # word cloud m <-...

2010 Feb 16

tm package

...t(reader = readReut21578XMLasPlain)) reuters21578 <- tm_map(reuters21578, stripWhitespace) reuters21578 <- tm_map(reuters21578, tolower) reuters21578 <- tm_map(reuters21578, removePunctuation) reuters21578 <- tm_map(reuters21578, removeNumbers) reuters21578.dtm <- DocumentTermMatrix(reuters21578) that reuters21578.dtm does not include terms from the Heading (e.g. the Title). I'm wondering if anyone can confirm this and if so, is there an option to have the terms from the Heading included? Many thanks! Cheers, David

2011 Sep 26

findAssocs()

I am trying to find the math behind the "tm" package's findAssocs(). ?findAssocs does not say anything besides "association" and "correlate". Usually, entering "findAssocs" at the CLI gives the code for an R function, but in this case I obtain: function (x, term, corlimit) UseMethod("findAssocs", x) <environment: namespace:tm> Any ideas?
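The output above only shows the S3 generic; the actual code lives in a class-specific method, which can usually be displayed with getAnywhere("findAssocs.DocumentTermMatrix") once tm is loaded. As far as I understand it (an assumption, since the help page does not spell it out), the reported association is the Pearson correlation between the chosen term's column and every other term column of the document-term matrix, kept when it reaches corlimit. A base R sketch of that idea on a made-up count matrix:

```r
# Hypothetical document-term counts: 4 documents, 3 terms (filled by column).
m <- matrix(
  c(1, 0, 2, 1,    # oil
    2, 0, 4, 2,    # price
    0, 3, 0, 1),   # rain
  nrow = 4,
  dimnames = list(Docs = 1:4, Terms = c("oil", "price", "rain"))
)

corlimit <- 0.5
r <- cor(m)["oil", colnames(m) != "oil"]   # correlation of "oil" with the other terms
assoc <- r[r >= corlimit]                  # keep terms at or above the limit
assoc
```

Here "price" always moves together with "oil" and is kept, while "rain" is negatively correlated and is dropped.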

2018 Jan 05

Document Term Matrix

Hi, does anyone know what "Maximal term length" means in a Document Term Matrix? <<DocumentTermMatrix (documents: 255, terms: 858)>> Non-/sparse entries: 8081/210709 Sparsity : 96% Maximal term length: 12 Weighting : term frequency (tf) Thanks for any help! Elahe
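My understanding (not confirmed in the snippet) is that this figure is simply the number of characters in the longest term of the matrix. A base R sketch with a made-up term list; with tm loaded, the same calculation on a real matrix would be max(nchar(Terms(dtm))).

```r
# Hypothetical terms; in tm they would come from Terms(dtm).
terms <- c("oil", "price", "documentation", "rain")

max_len <- max(nchar(terms))   # length of the longest term, in characters
max_len
terms[which.max(nchar(terms))] # the term that produces it
```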

2017 Jun 12

count number of stop words in R

You can use regular expressions. ?regex and/or the stringr package are good places to start. Of course, you have to define "stop words." Cheers, Bert Bert Gunter "The trouble with having an open mind is that people keep coming along and sticking things into it." -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip ) On Mon, Jun 12, 2017 at 5:40

2017 Jun 12

count number of stop words in R

Hi all, is there a way in R to count the number of (English) stop words in a string using the tm package? str="Mhm . Alright . There's um a young boy that's getting a cookie jar . And it he's uh in bad shape because uh the thing is falling over . And in the picture the mother is washing dishes and doesn't see it . And so is the the water is overflowing in the sink . And the
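A minimal base R sketch of one way to count them: split the string into lower-case tokens and count how many appear in a stop list. The stop list below is a short hypothetical one; with tm installed, stopwords("english") could be used in its place. The input is a shortened copy of the string from the post.

```r
# Shortened sample from the post.
txt <- "Mhm . Alright . There's um a young boy that's getting a cookie jar ."

# Hypothetical short stop list; tm::stopwords("english") could replace it.
stops <- c("a", "the", "is", "and", "that's", "there's")

# Split on anything that is not a letter or an apostrophe, then lower-case.
tokens <- tolower(unlist(strsplit(txt, "[^[:alpha:]']+")))
tokens <- tokens[nzchar(tokens)]

n_stop <- sum(tokens %in% stops)   # every occurrence counts, not just distinct words
n_stop
table(tokens[tokens %in% stops])   # counts per stop word
```

Keeping the apostrophe inside tokens matters because English stop lists typically contain contractions such as "that's".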