Does

cname <- file.path("C:\\Users\\Desktop\\Text Mining\\Cases\\MyCorpus")
docs <- Corpus(DirSource(cname))
dtm <- DocumentTermMatrix(docs)
dtm

work?

If so, start adding back your tm_map calls until you find the one that breaks it.

Best,
Ista

On Tue, Dec 6, 2016 at 10:25 AM, Patrick Casimir <patrcasi at nova.edu> wrote:
>
> docs has 4 documents and inspect(docs) shows 4 PlainTextDocuments.
>
>> summary(docs)
>           Length Class             Mode
> case1.txt 2      PlainTextDocument list
> case2.txt 2      PlainTextDocument list
> case3.txt 2      PlainTextDocument list
> case4.txt 2      PlainTextDocument list
>
>> inspect(docs)
> <<VCorpus>>
> Metadata: corpus specific: 0, document level (indexed): 0
> Content: documents: 4
>
> [[1]]
> <<PlainTextDocument>>
> Metadata: 7
> Content: chars: 4564
>
> [[2]]
> <<PlainTextDocument>>
> Metadata: 7
> Content: chars: 9312
>
> [[3]]
> <<PlainTextDocument>>
> Metadata: 7
> Content: chars: 1388
>
> [[4]]
> <<PlainTextDocument>>
> Metadata: 7
> Content: chars: 2366
>
> ________________________________
> From: Ista Zahn <istazahn at gmail.com>
> Sent: Tuesday, December 6, 2016 10:08:28 AM
> To: Patrick Casimir
> Cc: r-help at r-project.org
> Subject: Re: [R] Why is DocumentTermMatrix showing 0 term?
>
> What is in docs?
>
> What does
>
> inspect(docs)
>
> say?
>
> --Ista
>
> On Tue, Dec 6, 2016 at 9:29 AM, Patrick Casimir <patrcasi at nova.edu> wrote:
>> Thanks Ista. See the code below. I am not sure why the DTM is showing 0
>> terms. I have 4 documents in the corpus, and I was able to make
>> transformations to the documents inside the corpus.
>>
>>> cname <- file.path("C:\\Users\\Desktop\\Text Mining\\Cases\\MyCorpus")
>>> dir(cname)
>> [1] "case1.txt" "case2.txt" "case3.txt" "case4.txt"
>>> library(tm)
>>> docs <- Corpus(DirSource(cname))
>>> install.packages("magrittr" ,dependencies=TRUE)
>>> viewDocs <- function(d, n) {d %>% extract2(n) %>% as.character() %>% writeLines()}
>>> viewDocs(docs, 1)
>>> toSpace <- content_transformer(function(x, pattern) gsub(pattern, " ", x))
>>> docs <- tm_map(docs, toSpace, "/|@|nn|")
>>> inspect(docs[1])
>>> docs <- tm_map(docs, removePunctuation)
>>> docs <- tm_map(docs, removeWords, stopwords("english"))
>>> inspect(docs[1])
>>> docs <- tm_map(docs, stripWhitespace)
>>> docs <- tm_map(docs, stemDocument)
>>> dtm <- DocumentTermMatrix(docs)
>>> dtm
>> <<DocumentTermMatrix (documents: 4, terms: 0)>>
>> Non-/sparse entries: 0/0
>> Sparsity           : 100%
>> Maximal term length: 0
>> Weighting          : term frequency (tf)
>>
>> ________________________________
>> From: Ista Zahn <istazahn at gmail.com>
>> Sent: Tuesday, December 6, 2016 9:09:37 AM
>> To: Patrick Casimir
>> Cc: r-help at r-project.org
>> Subject: Re: [R] Why is DocumentTermMatrix showing 0 term?
>>
>> Hi Patrick,
>>
>> How could anyone possibly answer this question with only the information
>> you've provided? It's like showing me an empty cup and asking why it's
>> empty. Maybe you didn't put anything in it. Maybe you did and then your
>> dog drank it or your cat knocked it over or your girlfriend drank it. How
>> would I possibly know?
>>
>> Bottom line, you need to show exactly what you did to produce that result,
>> preferably in the form of a few lines of code that we can run to reproduce
>> your problem.
>>
>> Finally, you may find it helpful to take some time to learn how to ask
>> questions the smart way. http://catb.org/~esr/faqs/smart-questions.html is
>> a good place to learn this important skill.
>>
>> Best,
>> Ista
>>
>> On Dec 6, 2016 7:58 AM, "Patrick Casimir" <patrcasi at nova.edu> wrote:
>>
>> <<DocumentTermMatrix (documents: 4, terms: 0)>>
>> Non-/sparse entries: 0/0
>> Sparsity           : 100%
>> Maximal term length: 0
>> Weighting          : term frequency (tf)
>>
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
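Patrick's inspect() output shows thousands of characters per document, yet the DTM reports 0 terms. That combination is possible because, by default, tm's term counting discards words shorter than three characters (the `wordLengths = c(3, Inf)` control setting of termFreq()). The base-R sketch below imitates only that length filter; `count_terms()` is a hypothetical helper, not part of tm, and its plain whitespace tokenizer is a simplification of tm's own.

```r
# Hypothetical helper: a rough imitation of tm's term counting.
# Lowercases the text (tm's default), splits on whitespace, and keeps only
# tokens whose length lies in [min_len, max_len] (tm's default is c(3, Inf)).
count_terms <- function(text, min_len = 3, max_len = Inf) {
  tokens <- unlist(strsplit(tolower(text), "[[:space:]]+"))
  tokens <- tokens[nzchar(tokens)]  # drop empty strings left by leading spaces
  keep <- nchar(tokens) >= min_len & nchar(tokens) <= max_len
  table(tokens[keep])
}

normal <- "Text mining turns documents into a term matrix"
spaced <- "T e x t m i n i n g t u r n s d o c u m e n t s"

length(count_terms(normal))  # 7 terms; "a" is too short to count
length(count_terms(spaced))  # 0 terms, because every token is one character long
nchar(spaced) > 0            # TRUE: the text itself is not empty
```

In other words, a character count in inspect() says nothing about how many tokens will pass the length filter, so "chars: 4564" and "terms: 0" can both be true.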
Actually, the DTM works now. This is amazing. Million thanks. Why wasn't it
working before?

See below:

> cname <- file.path("C:\\Users\\Desktop\\Text Mining\\Cases\\MyCorpus")
> docs <- Corpus(DirSource(cname))
> dtm <- DocumentTermMatrix(docs)
> dtm
<<DocumentTermMatrix (documents: 4, terms: 766)>>
Non-/sparse entries: 920/2144
Sparsity           : 70%
Maximal term length: 29
Weighting          : term frequency (tf)

________________________________
From: Ista Zahn <istazahn at gmail.com>
Sent: Tuesday, December 6, 2016 12:20:57 PM
To: Patrick Casimir
Cc: r-help at r-project.org
Subject: Re: [R] Why is DocumentTermMatrix showing 0 term?
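One plausible culprit, not confirmed anywhere in this thread: the pattern passed to toSpace, "/|@|nn|", ends with `|`, which adds an empty alternative to the regular expression. An empty alternative matches at every position, so gsub() puts a space before every character; the later steps then leave only one-letter tokens, which the default minimum word length of 3 discards. The sketch below compares that pattern with a variant that treats the last `|` as a literal pipe (`\\|`); that variant is an assumption about the intent, and the sample string is made up for illustration.

```r
# Pattern from the thread: the trailing "|" adds an empty alternative,
# which matches at every position in the string.
original_pattern <- "/|@|nn|"
# Assumed intent (not confirmed in the thread): the last "|" as a literal pipe.
escaped_pattern <- "/|@|nn|\\|"

sample_text <- "text/mining@r|help"  # made-up example input

spaced <- gsub(original_pattern, " ", sample_text)
clean  <- gsub(escaped_pattern, " ", sample_text)

# Length of the longest whitespace-separated token in a string.
longest_token <- function(x) {
  tokens <- unlist(strsplit(trimws(x), "[[:space:]]+"))
  max(nchar(tokens))
}

print(clean)            # "text mining r help"
longest_token(spaced)   # below 3: no token survives tm's default minimum length
longest_token(clean)    # 6, for "mining"
```

If this is the cause, it should be easy to confirm on the real corpus: apply only the toSpace step with the original pattern, build the DTM, and compare the term count with the same run using the corrected pattern.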
On Tue, Dec 6, 2016 at 2:28 PM, Patrick Casimir <patrcasi at nova.edu> wrote:
> Actually, the DTM works now. This is amazing. Million thanks. Why wasn't it
> working before?

Do as I suggested and start adding back your tm_map calls until you find the
one that breaks it.

--Ista
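Ista's advice amounts to replaying the preprocessing one step at a time and checking the result after each step. A base-R sketch of that search follows. The steps are hypothetical plain-function stand-ins for the tm_map() calls in the thread (they are not tm's own implementations), and a "usable" token is assumed to mean one at least 3 characters long, matching tm's default minimum word length.

```r
# Hypothetical stand-ins for the tm_map() steps, applied to a character vector.
steps <- list(
  toSpace           = function(x) gsub("/|@|nn|", " ", x),
  removePunctuation = function(x) gsub("[[:punct:]]+", "", x),
  stripWhitespace   = function(x) gsub("[[:space:]]+", " ", x)
)

# Count tokens that would survive a minimum word length of 3.
usable_tokens <- function(x) {
  tokens <- unlist(strsplit(x, "[[:space:]]+"))
  sum(nchar(tokens) >= 3)
}

# Apply the steps cumulatively and return the name of the first step after
# which no usable tokens remain, or NA if no step empties the text.
first_breaking_step <- function(x, steps) {
  for (name in names(steps)) {
    x <- steps[[name]](x)
    if (usable_tokens(x) == 0) return(name)
  }
  NA_character_
}

docs_text <- c("Text mining with the tm package", "Document/term matrices @ scale")
first_breaking_step(docs_text, steps)
```

On a real corpus the same loop would rebuild the DTM after each tm_map() call and check its term count (tm's nTerms() reports it) instead of counting tokens by hand.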