Displaying 13 results from an estimated 13 matches for "documenttermmatrix".
2010 Oct 11
2
topicmodels error
...[1] "TermDocumentMatrix" "simple_triplet_matrix"
I tried to use a matrix... but it doesn't work:
> MAT <- as.matrix(TDM)
> Error in LDA(MAT, k = k, method = "Gibbs", control = list(seed = SEED, :
> x is of class 'matrix'
The help page says that a DocumentTermMatrix is the correct input:
> Arguments
> x Object of class "DocumentTermMatrix"
Can anyone help me?
Thanks
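The error says LDA() received a plain matrix. Its help page asks for a DocumentTermMatrix, which puts documents in rows and terms in columns. The object shown first is a TermDocumentMatrix, which is the transpose. With tm loaded, t(TDM) on a TermDocumentMatrix should return a DocumentTermMatrix, and as.DocumentTermMatrix(TDM) is another route. Neither has been run against this data here. The base-R sketch below uses invented toy counts and only illustrates the orientation swap. A plain matrix like this is still not the tm class that LDA() checks for.

```r
# Invented toy counts: 3 terms (rows) x 2 documents (columns),
# i.e. the TermDocumentMatrix orientation
tdm <- matrix(c(2L, 0L, 1L,
                0L, 3L, 1L),
              nrow = 3,
              dimnames = list(Terms = c("oil", "opec", "price"),
                              Docs = c("d1", "d2")))

# Transposing gives documents as rows and terms as columns,
# the orientation a DocumentTermMatrix uses; t() also swaps the dimnames
dtm <- t(tdm)
print(dtm)
```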
2011 May 21
1
DocumentTermMatrix error
Hi all,
I have tried to create a DocumentTermMatrix with the tm package, but I get this error:
Error in tolower(txt) :
invalid input 'PROD Z LAHKO GNETNO MELJNO GLINO, ... in 'utf8towcs'
I followed the steps shown in:
http://www.r-project.org/doc/Rnews/Rnews_2008-2.pdf (An Introduction to Text Mining),
with this...
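An error from tolower() mentioning 'utf8towcs' usually means the text contains bytes that are not valid UTF-8, for example text saved in a legacy 8-bit encoding. The sketch below is a minimal base-R workaround. It assumes the input is meant to be UTF-8 and that silently dropping invalid bytes is acceptable. If the real source encoding is known, converting with iconv(x, from = <that encoding>, to = "UTF-8") would keep the accented letters instead of dropping them. Windows-1250 is a common one for Slovenian text, but that is only a guess here.

```r
# Drop bytes that are not valid UTF-8 (sub = ""), then lowercase.
# Assumption: the text should be UTF-8 and losing invalid bytes is acceptable.
safe_tolower <- function(x) {
  fixed <- iconv(x, from = "UTF-8", to = "UTF-8", sub = "")
  tolower(fixed)
}

# "\xff" is never valid in UTF-8, so it stands in for a badly encoded byte
raw_text <- "PROD Z LAHKO\xff GLINO"
safe_tolower(raw_text)
```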
2011 May 20
1
DocumentTermMatrix - text mining
Hi All,
I have a data.frame that looks like the one below. I would like to do some text mining on it, possibly to find patterns between Opis, ACKlasifikacija and Vodja. I looked at the tm package, which looks promising, more specifically DocumentTermMatrix or TermDocumentMatrix, but I cannot figure out how to convert my data from a data.frame to a Corpus or VCorpus.
Globina ACKlasifikacija Opis GlobinaOd GlobinaDo Vodja
3671 8...
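With tm, a character column can usually be turned into a corpus with Corpus(VectorSource(df$Opis)). This is the same pattern the maxent example on this page uses for data$Title. The dependency-free base-R sketch below shows what DocumentTermMatrix() then computes: a documents x terms table of counts. Only the column name Opis comes from the post. The data values and the helper name make_dtm are invented for illustration.

```r
# Invented data; only the column name Opis comes from the post
df <- data.frame(Opis = c("Meljna glina, glina", "Pesek in glina"),
                 stringsAsFactors = FALSE)

# Hypothetical helper: documents x terms count matrix from a character vector
make_dtm <- function(docs) {
  # lowercase and split on anything that is not a letter or digit
  tokens <- strsplit(tolower(docs), "[^[:alnum:]]+")
  tokens <- lapply(tokens, function(tk) tk[nzchar(tk)])
  terms <- sort(unique(unlist(tokens)))
  m <- matrix(0L, nrow = length(docs), ncol = length(terms),
              dimnames = list(Docs = as.character(seq_along(docs)),
                              Terms = terms))
  for (i in seq_along(tokens)) {
    if (length(tokens[[i]]) > 0) {
      counts <- table(tokens[[i]])          # term counts for document i
      m[i, names(counts)] <- as.integer(counts)
    }
  }
  m
}

dtm <- make_dtm(df$Opis)
print(dtm)
```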
2009 Nov 12
2
package "tm" fails to remove "the" with remove stopwords
... <- tm_map(text.corp, stripWhitespace)
text.corp <- tm_map(text.corp, removeNumbers)
text.corp <- tm_map(text.corp, removePunctuation)
## text.corp <- tm_map(text.corp, stemDocument)
text.corp <- tm_map(text.corp, removeWords, c("the", stopwords("english")))
dtm <- DocumentTermMatrix(text.corp)
dtm
dtm.mat <- as.matrix(dtm)
dtm.mat
> dtm.mat
Terms
Docs falls fetch hill jack jill mainly pail plain rain ran spain the water
1 0 0 0 0 0 0 0 0 1 0 1 1 0
2 1 0 0 0 0 1 0 1 0 0 0 0...
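A likely cause, though it is not confirmed in the thread, is that removeWords matches case-sensitively. A sentence-initial "The" therefore survives, and DocumentTermMatrix later lowercases terms by default, so the word reappears as the term the. The sketch below is a rough base-R stand-in for removeWords, not tm's own code. It shows the effect and the usual remedy of lowercasing first.

```r
# Rough base-R stand-in for tm's removeWords(): delete whole-word matches.
# Assumption: like removeWords, the regex is case-sensitive.
remove_words <- function(x, words) {
  pattern <- sprintf("\\b(%s)\\b", paste(words, collapse = "|"))
  gsub(pattern, "", x, perl = TRUE)
}

text <- "The rain in Spain falls mainly on the plain"
stops <- c("the", "in", "on")

# Case-sensitive match: the capitalised "The" survives
kept <- remove_words(text, stops)

# Lowercasing first lets every occurrence match
cleaned <- remove_words(tolower(text), stops)
print(kept)
print(cleaned)
```

In tm the lowercasing step would be tm_map(text.corp, tolower) before removeWords. Newer tm releases expect tm_map(text.corp, content_transformer(tolower)).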
2010 Mar 31
1
tm package - remove stopwords failing
Hi,
By inspecting the term matrix, I just noticed that not all stopwords are
removed. Does anyone know how to fix that?
library(tm)
data("crude")
d<-tm_map(crude, removeWords, stopwords(language='english'))
dt<-DocumentTermMatrix(d,control=list(minWordLength=3, minDocFreq=2))
inspect( dt)
I am using R version 2.10, tm package 0.5-3
cheers
Welma
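As I understand the tm 0.5 options used here, minWordLength drops terms shorter than the given number of characters, and minDocFreq drops terms found in fewer than that many documents. Later tm releases appear to have replaced them with wordLengths = c(3, Inf) and bounds = list(global = c(2, Inf)); check ?termFreq for your version. The base-R sketch below applies the same two filters to a plain documents x terms count matrix with invented values. It does not touch stop words. For those, lowercasing before removeWords is worth trying, because that match is case-sensitive.

```r
# Invented documents x terms counts (3 documents, 3 terms)
m <- matrix(c(1L, 0L, 2L,   # "oil": 3 characters, in 2 documents
              1L, 1L, 0L,   # "at": 2 characters, in 2 documents
              0L, 0L, 1L),  # "opec": 4 characters, in 1 document
            nrow = 3,
            dimnames = list(Docs = c("1", "2", "3"),
                            Terms = c("oil", "at", "opec")))

# Keep terms with at least min_len characters that occur in
# at least min_doc_freq documents
filter_terms <- function(m, min_len = 3, min_doc_freq = 2) {
  keep <- nchar(colnames(m)) >= min_len & colSums(m > 0) >= min_doc_freq
  m[, keep, drop = FALSE]
}

print(filter_terms(m))
```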
2013 Sep 26
0
R hangs at NGramTokenizer
...removeNumbers)
> myCorpus <- tm_map(myCorpus, removePunctuation)
> myCorpus <- tm_map(myCorpus, removeWords, stopwords("english"))
> myCorpus <- tm_map(myCorpus, removeWords, stopwords("SMART"))
> myCorpus <- tm_map(myCorpus, stripWhitespace)
> myDtm <- DocumentTermMatrix(myCorpus, control = list(wordLengths = c(1,Inf)))
Everything works fine up to this stage if I do not include tokenizing. However, when I run the code with the following alteration:
> dictCorpus <- myCorpus
> myDtm <- DocumentTermMatrix(myCorpus, control = list(wordlengths=c(1,Inf),tokeniz...
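One way to narrow down the hang is to swap out the tokenizer. NGramTokenizer comes from RWeka, which runs on Java. The tokenize entry of the control list takes a function that maps a document's text to a character vector of tokens. The pure-R bigram tokenizer below is hypothetical (the name bigram_tokenizer is invented). It could be swapped in to check whether the problem is RWeka-specific; it is a diagnostic sketch, not a verified fix. Separately, the second call spells the option wordlengths while the first uses wordLengths, and R list names are case-sensitive.

```r
# Hypothetical base-R bigram tokenizer: text in, "word word" tokens out
bigram_tokenizer <- function(x) {
  words <- unlist(strsplit(as.character(x), "[[:space:]]+"))
  words <- words[nzchar(words)]
  if (length(words) < 2) return(character(0))
  # pair every word with the one that follows it
  paste(head(words, -1), tail(words, -1))
}

bigram_tokenizer("the water is overflowing")
```

With tm loaded this would be passed as control = list(wordLengths = c(1, Inf), tokenize = bigram_tokenizer). Depending on the tm version, the function may receive a document object rather than a plain string, so it may need adapting.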
2013 Oct 08
1
how to check the accuracy for maxent?
...ran.r-project.org/web/packages/maxent/maxent.pdf
# LOAD LIBRARY
library(maxent)
# READ THE DATA, PREPARE THE CORPUS, and CREATE THE MATRIX
data <- read.csv(system.file("data/NYTimes.csv.gz",package="maxent"))
corpus <- Corpus(VectorSource(data$Title[1:150]))
matrix <- DocumentTermMatrix(corpus)
# TRAIN/PREDICT USING SPARSEM REPRESENTATION
sparse <- as.compressed.matrix(matrix)
model <- maxent(sparse[1:100,],data$Topic.Code[1:100])
results <- predict(model,sparse[101:150,])
Any idea how I can check the accuracy with respect to the classification in
data$Topic.Code?
I see...
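Accuracy here is just the fraction of test documents whose predicted label equals the reference label, and a confusion table shows where the errors fall. The base-R sketch below uses invented labels and a hypothetical helper name. It assumes the predicted labels can be pulled out as a vector. For maxent I believe predict() puts them in the first column of its result, but check this for your version.

```r
# Fraction of predictions that match the reference labels
classification_accuracy <- function(predicted, actual) {
  stopifnot(length(predicted) == length(actual))
  mean(as.character(predicted) == as.character(actual))
}

# Invented labels standing in for predictions and data$Topic.Code
predicted <- c("16", "20", "16", "3")
actual    <- c("16", "20", "19", "3")

classification_accuracy(predicted, actual)
# Confusion table: rows are predictions, columns the reference codes
table(predicted = predicted, actual = actual)
```

Under that assumption, the call for the code above would be classification_accuracy(results[, 1], data$Topic.Code[101:150]).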
2012 Dec 13
2
Term matrix size and memory. TM package
... <- tm_map(corpus, removeWords, stopwords("spanish"))
# stemming
corpus <- tm_map(corpus, stemDocument, language = "spanish")
# create the term matrix
# a) documents as rows and terms as columns (DocumentTermMatrix orientation)
dtm <- DocumentTermMatrix(corpus)
inspect(dtm[1000:1005,1000:1005])
# Terms with a minimum frequency of 30:
findFreqTerms(dtm, lowfreq=30)
# remove sparse (low document frequency) terms
inspect(removeSparseTerms(dtm, 0.4))
# word cloud
m <-...
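On the memory side: as.matrix() on a DocumentTermMatrix, or printing a large one in full, builds a dense matrix with n_docs x n_terms cells. If the cells are 8-byte doubles, 10,000 documents x 50,000 terms already needs 4 x 10^9 bytes, about 4 GB. These are illustrative numbers, not from the post. Pruning terms first keeps that manageable. As I understand removeSparseTerms(x, sparse), it keeps a term only if the term appears in more than (1 - sparse) x n_docs documents. With sparse = 0.4 that means more than 60% of documents, which is quite aggressive. The base-R sketch below reproduces that rule on an invented matrix; it is not tm's own code.

```r
# Keep a term only if it occurs in more than (1 - sparse) * n_docs documents
remove_sparse_terms <- function(m, sparse) {
  stopifnot(sparse > 0, sparse < 1)
  doc_freq <- colSums(m > 0)   # number of documents containing each term
  m[, doc_freq > nrow(m) * (1 - sparse), drop = FALSE]
}

# Invented 5 x 3 count matrix: "a" in 5 docs, "b" in 4, "c" in 2
m <- cbind(a = c(1, 2, 1, 1, 3),
           b = c(0, 1, 1, 2, 1),
           c = c(0, 0, 1, 0, 1))

remove_sparse_terms(m, 0.4)   # threshold: more than 5 * 0.6 = 3 documents
```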
2010 Feb 16
0
tm package
...t(reader = readReut21578XMLasPlain))
reuters21578 <- tm_map(reuters21578, stripWhitespace)
reuters21578 <- tm_map(reuters21578, tolower)
reuters21578 <- tm_map(reuters21578, removePunctuation)
reuters21578 <- tm_map(reuters21578, removeNumbers)
reuters21578.dtm <- DocumentTermMatrix(reuters21578)
that reuters21578.dtm does not include terms from the Heading (e.g. the Title).
I'm wondering if anyone can confirm this and if so, is there an option
to have the terms from the Heading included?
Many thanks!
Cheers,
David
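DocumentTermMatrix() builds its terms from each document's content. In tm the heading is stored as document metadata, accessible through meta(), although the exact tag name has changed between versions. One workaround is to prepend the heading to the text before building the corpus. The sketch below uses a hypothetical list of documents with invented text rather than tm objects.

```r
# Invented documents: in tm the heading is metadata, not content
docs <- list(
  list(heading = "OIL PRICES FALL", content = "Crude prices fell on Monday."),
  list(heading = "OPEC MEETING",    content = "Ministers met in Vienna.")
)

# Prepend each heading to its body so the heading words become terms
texts <- vapply(docs,
                function(d) paste(d$heading, d$content, sep = "\n"),
                character(1))
print(texts)
```

The combined strings could then go through Corpus(VectorSource(texts)) and the same tm_map() steps before DocumentTermMatrix().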
2011 Sep 26
2
findAssocs()
I am trying to find the math behind the "tm" package findAssocs()
?findAssocs does not say anything besides "association" and "correlate"
Usually, entering "findAssocs" at the console prints the code of an R
function, but in this case I get:
function (x, term, corlimit)
UseMethod("findAssocs", x)
<environment: namespace:tm>
Any ideas?
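findAssocs() is an S3 generic, so printing it only shows the UseMethod() dispatch. methods(findAssocs) lists the available methods, and getAnywhere("findAssocs.DocumentTermMatrix") or tm:::findAssocs.DocumentTermMatrix prints the one for a DocumentTermMatrix. As I understand it, that method computes the correlation between the given term's column of counts and every other term's column, and keeps terms at or above corlimit. The base-R sketch below (hypothetical name find_assocs, invented data) does that with Pearson correlation. It is not tm's exact code.

```r
# Correlate one term's counts with every other term's counts across documents
# and keep the terms whose correlation is at least corlimit
find_assocs <- function(m, term, corlimit) {
  others <- setdiff(colnames(m), term)
  r <- vapply(others,
              function(other) suppressWarnings(cor(m[, term], m[, other])),
              numeric(1))
  r <- r[!is.na(r) & r >= corlimit]   # zero-variance columns give NA
  sort(round(r, 2), decreasing = TRUE)
}

# Invented 4 x 3 count matrix (documents x terms)
m <- cbind(oil   = c(1, 2, 3, 0),
           opec  = c(1, 2, 3, 0),
           price = c(0, 0, 1, 5))

find_assocs(m, "oil", 0.5)
```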
2018 Jan 05
0
Document Term Matrix
Hi,
Does anyone know what "Maximal term length" means in a DocumentTermMatrix summary?
<<DocumentTermMatrix (documents: 255, terms: 858)>>
Non-/sparse entries: 8081/210709
Sparsity : 96%
Maximal term length: 12
Weighting : term frequency (tf)
Thanks for any help!
Elahe
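As far as I can tell from tm's print method, "Maximal term length" is the number of characters in the longest term, i.e. max(nchar(Terms(dtm))). Here, the longest of the 858 terms has 12 characters. The other lines are also plain counts. The matrix has 255 x 858 = 218,790 cells, which equals 8,081 non-zero plus 210,709 zero entries, and 210,709 / 218,790 is about 96.3%, printed as 96%. The base-R sketch below recomputes these values for a plain documents x terms matrix; the helper name dtm_summary is hypothetical.

```r
# Recompute the summary values tm prints, for a plain documents x terms matrix
dtm_summary <- function(m) {
  total <- length(m)
  non_sparse <- sum(m != 0)
  list(non_sparse      = non_sparse,
       sparse          = total - non_sparse,
       sparsity_pct    = round(100 * (total - non_sparse) / total),
       max_term_length = max(nchar(colnames(m)), 0))
}

# Invented 2 x 3 matrix
m <- matrix(c(1, 0, 0, 2, 0, 0), nrow = 2,
            dimnames = list(NULL, c("oil", "at", "petroleum")))
str(dtm_summary(m))
```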
2017 Jun 12
0
count number of stop words in R
You can use regular expressions.
?regex and/or the stringr package are good places to start. Of
course, you have to define "stop words."
Cheers,
Bert
Bert Gunter
On Mon, Jun 12, 2017 at 5:40
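Bert's regular-expression suggestion can be sketched in base R: lowercase the string, split it into word tokens, and count the tokens that appear in a stop word list. The list below is a tiny hypothetical one, and the sample sentence is adapted from the transcript in the question. With tm installed, stopwords("english") could be passed as the list instead.

```r
# Count how many tokens of a string appear in a stop word list
count_stop_words <- function(text, stop_words) {
  # split on anything that is not a letter or an apostrophe
  tokens <- unlist(strsplit(tolower(text), "[^[:alpha:]']+"))
  tokens <- tokens[nzchar(tokens)]
  sum(tokens %in% tolower(stop_words))
}

sample_text <- "And the mother is washing dishes and doesn't see it."
count_stop_words(sample_text, c("and", "the", "is", "it"))
```

Keeping the apostrophe inside tokens means contractions such as "doesn't" stay whole, which matters if the stop word list contains them.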
2017 Jun 12
3
count number of stop words in R
Hi all,
Is there a way in R to count the number of (English) stop words in a string using the tm package?
str="Mhm . Alright . There's um a young boy that's getting a cookie jar . And it he's uh in bad shape because uh the thing is falling over . And in the picture the mother is washing dishes and doesn't see it . And so is the the water is overflowing in the sink . And the