search for: documenttermmatrix

Displaying 13 results from an estimated 13 matches for "documenttermmatrix".

2010 Oct 11
2
topicmodels error
...[1] "TermDocumentMatrix" "simple_triplet_matrix" I tried to use a matrix, but it doesn't work: > MAT <- as.matrix(TDM) > Error in LDA(MAT, k = k, method = "Gibbs", control = list(seed = SEED, : > x is of class "matrix" The help says it is correct to use a DocumentTermMatrix: > Arguments > x Object of class "DocumentTermMatrix" Can anyone help me? Thanks
2011 May 21
1
DocumentTermMatrix error
Hi all, I have tried to create a DocumentTermMatrix with the tm package, but I get this error: Error in tolower(txt) : invalid input 'PROD Z LAHKO GNETNO MELJNO GLINO, ... in 'utf8towcs' I tried doing this as shown in: http://www.r-project.org/doc/Rnews/Rnews_2008-2.pdf (An Introduction to Text Mining), with this...
2011 May 20
1
DocumentTermMatrix - text mining
Hi All, I have a data.frame that looks like the one below. I would like to do some text mining on it to possibly find some patterns between Opis, ACklasifikacija and Vodja. I looked over the tm package, which looks promising, more specifically DocumentTermMatrix or TermDocumentMatrix. But I cannot figure out how to change my data from data.frame to Corpus or VCorpus. Globina ACKlasifikacija Opis GlobinaOd GlobinaDo Vodja 3671 8...
2009 Nov 12
2
package "tm" fails to remove "the" with remove stopwords
...<- tm_map(text.corp, stripWhitespace) text.corp <- tm_map(text.corp, removeNumbers) text.corp <- tm_map(text.corp, removePunctuation) ## text.corp <- tm_map(text.corp, stemDocument) text.corp <- tm_map(text.corp, removeWords, c("the", stopwords("english"))) dtm <- DocumentTermMatrix(text.corp) dtm dtm.mat <- as.matrix(dtm) dtm.mat > dtm.mat Terms Docs falls fetch hill jack jill mainly pail plain rain ran spain the water 1 0 0 0 0 0 0 0 0 1 0 1 1 0 2 1 0 0 0 0 1 0 1 0 0 0 0...
2010 Mar 31
1
tm package - remove stopwords failing
Hi, I just noticed, by inspecting the term matrix, that not all stopwords are removed. Does anyone know how to fix that? library(tm) data("crude") d<-tm_map(crude, removeWords, stopwords(language='english')) dt<-DocumentTermMatrix(d,control=list(minWordLength=3, minDocFreq=2)) inspect( dt) I am using R version 2.10, tm package 0.5-3 cheers Welma
2013 Sep 26
0
R hangs at NGramTokenizer
...removeNumbers) > myCorpus <- tm_map(myCorpus, removePunctuation) > myCorpus <- tm_map(myCorpus, removeWords, stopwords("english")) > myCorpus <- tm_map(myCorpus, removeWords, stopwords("SMART")) > myCorpus <- tm_map(myCorpus, stripWhitespace) > myDtm <- DocumentTermMatrix(myCorpus, control = list(wordLengths = c(1,Inf))) Everything works fine up to this stage, if I do not include tokenizing. However, when I run the code with the following alteration: > dictCorpus <- myCorpus > myDtm <- DocumentTermMatrix(myCorpus, control = list(wordlengths=c(1,Inf),tokeniz...
2013 Oct 08
1
How to check the accuracy for maxent?
...ran.r-project.org/web/packages/maxent/maxent.pdf # LOAD LIBRARY library(maxent) # READ THE DATA, PREPARE THE CORPUS, and CREATE THE MATRIX data <- read.csv(system.file("data/NYTimes.csv.gz",package="maxent")) corpus <- Corpus(VectorSource(data$Title[1:150])) matrix <- DocumentTermMatrix(corpus) # TRAIN/PREDICT USING SPARSEM REPRESENTATION sparse <- as.compressed.matrix(matrix) model <- maxent(sparse[1:100,],data$Topic.Code[1:100]) results <- predict(model,sparse[101:150,]) Any idea how I can check the accuracy wrt the classification present in : data$Topic.Code ? I see...
2012 Dec 13
2
Size of the term matrix and memory. tm package
...<- tm_map(corpus, removeWords, stopwords("spanish")) # stemming corpus <- tm_map(corpus, stemDocument, language = "spanish") # create the term matrix #a) terms as rows and documents as columns dtm <- DocumentTermMatrix(corpus) inspect(dtm[1000:1005,1000:1005]) # Terms with a minimum frequency of 30: findFreqTerms(dtm, lowfreq=30) # remove low-frequency terms inspect(removeSparseTerms(dtm, 0.4)) # word cloud m <-...
2010 Feb 16
0
tm package
...t(reader = readReut21578XMLasPlain)) reuters21578 <- tm_map(reuters21578, stripWhitespace) reuters21578 <- tm_map(reuters21578, tolower) reuters21578 <- tm_map(reuters21578, removePunctuation) reuters21578 <- tm_map(reuters21578, removeNumbers) reuters21578.dtm <- DocumentTermMatrix(reuters21578) that reuters21578.dtm does not include terms from the Heading (e.g. the Title). I'm wondering if anyone can confirm this and if so, is there an option to have the terms from the Heading included? Many thanks! Cheers, David
2011 Sep 26
2
findAssocs()
I am trying to find the math behind the "tm" package's findAssocs(). ?findAssocs does not say anything besides "association" and "correlate". Usually entering "findAssocs" at the CLI gives the code for an R function, but in this case I obtain: function (x, term, corlimit) UseMethod("findAssocs", x) <environment: namespace:tm> Any ideas?
2018 Jan 05
0
Document Term Matrix
Hi, Does anyone know what the maximal term length in a Document Term Matrix is? <<DocumentTermMatrix (documents: 255, terms: 858)>> Non-/sparse entries: 8081/210709 Sparsity : 96% Maximal term length: 12 Weighting : term frequency (tf) Thanks for any help! Elahe
2017 Jun 12
0
count number of stop words in R
You can use regular expressions. ?regex and/or the stringr package are good places to start. Of course, you have to define "stop words." Cheers, Bert Bert Gunter "The trouble with having an open mind is that people keep coming along and sticking things into it." -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip ) On Mon, Jun 12, 2017 at 5:40
2017 Jun 12
3
count number of stop words in R
Hi all, Is there a way in R to count the number of stop words (English) of a string using tm package? str="Mhm . Alright . There's um a young boy that's getting a cookie jar . And it he's uh in bad shape because uh the thing is falling over . And in the picture the mother is washing dishes and doesn't see it . And so is the the water is overflowing in the sink . And the