What is in docs?

What does

inspect(docs)

say?

--Ista

On Tue, Dec 6, 2016 at 9:29 AM, Patrick Casimir <patrcasi at nova.edu> wrote:
> Thanks Ista. See codes below. I am not sure why the DTM is showing 0 term. I
> have 4 documents in the corpus. And I was able to make transformations
> to the documents inside the corpus.
>
> > cname <- file.path("C:\\Users\\Desktop\\Text Mining\\Cases\\MyCorpus")
> > dir(cname)
> [1] "case1.txt" "case2.txt" "case3.txt" "case4.txt"
> > library(tm)
> > docs <- Corpus(DirSource(cname))
> > install.packages("magrittr", dependencies=TRUE)
> > viewDocs <- function(d, n) {d %>% extract2(n) %>% as.character() %>% writeLines()}
> > viewDocs(docs, 1)
> > toSpace <- content_transformer(function(x, pattern) gsub(pattern, " ", x))
> > docs <- tm_map(docs, toSpace, "/|@|nn|")
> > inspect(docs[1])
> > docs <- tm_map(docs, removePunctuation)
> > docs <- tm_map(docs, removeWords, stopwords("english"))
> > inspect(docs[1])
> > docs <- tm_map(docs, stripWhitespace)
> > docs <- tm_map(docs, stemDocument)
> > dtm <- DocumentTermMatrix(docs)
> > dtm
> <<DocumentTermMatrix (documents: 4, terms: 0)>>
> Non-/sparse entries: 0/0
> Sparsity           : 100%
> Maximal term length: 0
> Weighting          : term frequency (tf)
>
> ________________________________
> From: Ista Zahn <istazahn at gmail.com>
> Sent: Tuesday, December 6, 2016 9:09:37 AM
> To: Patrick Casimir
> Cc: r-help at r-project.org
> Subject: Re: [R] Why is DocumentTermMatrix showing 0 term?
>
> Hi Patrick,
>
> How could anyone possibly answer this question with only the information
> you've provided? It's like showing me an empty cup and asking why it's
> empty. Maybe you didn't put anything in it. Maybe you did and then your dog
> drank it or your cat knocked it over or your girlfriend drank it. How would
> I possibly know?
>
> Bottom line, you need to show exactly what you did to produce that result,
> preferably in the form of a few lines of code that we can run to reproduce
> your problem.
>
> Finally, you may find it helpful to take some time to learn how to ask
> questions the smart way. http://catb.org/~esr/faqs/smart-questions.html is a
> good place to learn this important skill.
>
> Best,
> Ista
>
> On Dec 6, 2016 7:58 AM, "Patrick Casimir" <patrcasi at nova.edu> wrote:
>
> > <<DocumentTermMatrix (documents: 4, terms: 0)>>
> > Non-/sparse entries: 0/0
> > Sparsity           : 100%
> > Maximal term length: 0
> > Weighting          : term frequency (tf)
> >
> > ______________________________________________
> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
docs has 4 documents and inspect(docs) shows 4 PlainTextDocument objects.

> summary(docs)
          Length Class             Mode
case1.txt 2      PlainTextDocument list
case2.txt 2      PlainTextDocument list
case3.txt 2      PlainTextDocument list
case4.txt 2      PlainTextDocument list

> inspect(docs)
<<VCorpus>>
Metadata:  corpus specific: 0, document level (indexed): 0
Content:  documents: 4

[[1]]
<<PlainTextDocument>>
Metadata:  7
Content:  chars: 4564

[[2]]
<<PlainTextDocument>>
Metadata:  7
Content:  chars: 9312

[[3]]
<<PlainTextDocument>>
Metadata:  7
Content:  chars: 1388

[[4]]
<<PlainTextDocument>>
Metadata:  7
Content:  chars: 2366

________________________________
From: Ista Zahn <istazahn at gmail.com>
Sent: Tuesday, December 6, 2016 10:08:28 AM
To: Patrick Casimir
Cc: r-help at r-project.org
Subject: Re: [R] Why is DocumentTermMatrix showing 0 term?

What is in docs? What does inspect(docs) say?

--Ista
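The inspect(docs) output in this thread reports only character counts per document, not the text itself. A minimal sketch of how one might print a short preview of each document's text, assuming the tm package is installed; the two-document in-memory corpus and the `previews` variable are hypothetical stand-ins for the four case files:

```r
library(tm)

# Hypothetical in-memory corpus standing in for the four case files
docs <- VCorpus(VectorSource(c("alpha beta gamma", "delta epsilon")))

previews <- character(length(docs))
for (i in seq_len(length(docs))) {
  # as.character() returns the document's text lines; join them and keep a prefix
  txt <- paste(as.character(docs[[i]]), collapse = " ")
  previews[i] <- substr(txt, 1, 60)
  cat(meta(docs[[i]], "id"), ":", previews[i], "\n")
}
```

Running the same loop after each tm_map call shows what the text looks like at that stage, which is often more telling than the character counts alone.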
Does

library(tm)
cname <- file.path("C:\\Users\\Desktop\\Text Mining\\Cases\\MyCorpus")
docs <- Corpus(DirSource(cname))
dtm <- DocumentTermMatrix(docs)
dtm

work?

If so, start adding back your tm_map calls one at a time until you find the
one that breaks it.
Best,
Ista
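The step-by-step approach suggested above can be sketched as a loop that rebuilds the DocumentTermMatrix after every transformation and reports the term count. This is only an illustration under stated assumptions: the in-memory `texts` vector is a hypothetical stand-in for the case files, the `steps` list and `n_terms` vector are names introduced here, and stemDocument is left out so the sketch does not need the SnowballC package (it can be added back as one more entry).

```r
library(tm)

# Hypothetical stand-in texts; replace with Corpus(DirSource(cname)) for real files
texts <- c("The court dismissed the appeal / the claim was withdrawn.",
           "Contact the clerk @ the registry about pending filings.")
docs <- VCorpus(VectorSource(texts))

toSpace <- content_transformer(function(x, pattern) gsub(pattern, " ", x))

# Each step takes a corpus and returns a corpus, in the order used in the thread
steps <- list(
  baseline          = function(d) d,
  toSpace           = function(d) tm_map(d, toSpace, "/|@|nn|"),
  removePunctuation = function(d) tm_map(d, removePunctuation),
  removeWords       = function(d) tm_map(d, removeWords, stopwords("english")),
  stripWhitespace   = function(d) tm_map(d, stripWhitespace)
)

n_terms <- integer(0)
for (name in names(steps)) {
  docs <- steps[[name]](docs)
  # Number of columns in the DTM is the number of distinct terms kept
  n_terms[name] <- ncol(DocumentTermMatrix(docs))
  cat(sprintf("%-18s %d terms\n", name, n_terms[name]))
}
```

The first step after which the count drops to 0 is the one to look at. Two things may be worth keeping in mind while reading the output: by default DocumentTermMatrix only keeps terms of at least three characters, and the toSpace pattern "/|@|nn|" ends in an empty alternative, so it could plausibly match more than the three intended separators.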