Hi Boris,

In that case, if I have a lot of free text data (let us assume part of an election speech) in one single text document, and I want to find the association of the top 3 most frequently occurring words with the other words in the speech, what method do I adopt?

On Wed, Nov 15, 2017 at 7:08 PM, Boris Steipe <boris.steipe at utoronto.ca> wrote:

> If you consider the definition of a DTM, and that findAssocs() computes
> associations between words as correlations across documents(!), you will
> realize that you can't get what you want from a single document. Indeed, what
> kind of an "association" would you even be looking for?
>
> B.
>
> > On Nov 15, 2017, at 12:40 AM, Rahul singh <rahulutube69 at gmail.com> wrote:
> >
> > I have free text data in a single text document. I create a corpus, and
> > then a document term matrix out of it. I can create a word cloud too.
> >
> > But when I do word association for the same, using findAssocs(), it always
> > returns numeric(0).
> >
> > EX : findAssocs(dtm, "king", 0.1)
> >
> > I read on Stack Overflow that it is because I have a single document.
> >
> > What is the workaround for the same?
> >
> > ______________________________________________
> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
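To make the point about "correlations across documents" concrete, here is a minimal base-R sketch. It assumes one possible workaround: treating each sentence of the single text as its own "document", so that there are several rows to correlate across. The toy text and the helper assoc_with() are hypothetical, and the helper only mimics the idea of correlating term counts; it is not the tm implementation of findAssocs().

```r
# Hypothetical toy text; each sentence is treated as one "document".
speech <- paste("The king spoke to the people.",
                "The people cheered the king.",
                "The queen left early.",
                "The people went home.")

# Split into sentences, then lowercase and tokenize on non-letters.
sentences <- unlist(strsplit(speech, "[.!?]+\\s*"))
tokens <- lapply(strsplit(tolower(sentences), "[^a-z]+"),
                 function(w) w[nzchar(w)])

# Build a sentence-by-term count matrix by hand (rows = sentences).
vocab <- sort(unique(unlist(tokens)))
dtm <- t(vapply(tokens,
                function(w) tabulate(match(w, vocab), nbins = length(vocab)),
                integer(length(vocab))))
colnames(dtm) <- vocab

# Hypothetical helper: correlate one term's counts with every other term's
# counts across rows, and keep correlations at or above corlimit.
assoc_with <- function(dtm, term, corlimit = 0) {
  if (nrow(dtm) < 2 || sd(dtm[, term]) == 0) {
    stop("correlation is undefined: need at least 2 rows and a non-constant term")
  }
  others <- setdiff(colnames(dtm), term)
  # Drop constant columns, whose correlation is undefined.
  others <- others[apply(dtm[, others, drop = FALSE], 2, sd) > 0]
  r <- setNames(as.numeric(cor(dtm[, term], dtm[, others, drop = FALSE])),
                others)
  sort(r[r >= corlimit], decreasing = TRUE)
}

res <- assoc_with(dtm, "king", corlimit = 0.1)
print(res)
```

With a single row there is no variation to correlate, which is why the helper stops in that case. Whether sentences are a sensible unit depends entirely on what "association" is supposed to mean for the text at hand.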
In general, statistical methodology queries, which seem to be your concern, are off-topic here. This list is about R programming. Consider stats.stackexchange.com for statistical queries.

However, the CRAN task view on natural language processing might be useful, so you may wish to check it:

https://cran.r-project.org/web/views/NaturalLanguageProcessing.html

Cheers,
Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)
No - the CRAN task view is not going to help you at all, since you need to think more about the question that you are trying to ask before you can start worrying about which packages to pursue it with. In your case this hinges on the question of what you mean by "association". In the same phrase? In the same sentence? Adjacent? Or separated by k words? For what k?

Once you are clear on that, we can probably show you ways to translate your procedure into R code. But - as Bert mentioned - we are not well positioned to define the procedure for you.

Boris
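As an illustration of one of the definitions listed above ("separated by k words"), the following base-R sketch counts which words occur within k positions of a target word in a single text. The toy text, the function name cooccur_within_k(), and the default k = 2 are all assumptions made for this example; it shows one possible reading of "association", not a recommended procedure.

```r
# Hypothetical toy text standing in for a single speech document.
speech <- paste("The king spoke to the people.",
                "The people cheered the king.",
                "The queen left early.",
                "The people went home.")

# Lowercase and tokenize on non-letters (sentence boundaries are ignored here).
words <- unlist(strsplit(tolower(speech), "[^a-z]+"))
words <- words[nzchar(words)]

# The three most frequent words. In real text these are usually stopwords,
# so some filtering would normally come first.
freq <- sort(table(words), decreasing = TRUE)
top3 <- head(names(freq), 3)

# Hypothetical helper: count the words found within k positions of any
# occurrence of `target`. Each position is counted once, even if it lies
# within k of two occurrences; the target's own positions are excluded.
cooccur_within_k <- function(words, target, k = 2) {
  hits <- which(words == target)
  if (length(hits) == 0) return(integer(0))
  idx <- unique(unlist(lapply(hits, function(i) {
    seq(max(1, i - k), min(length(words), i + k))
  })))
  idx <- setdiff(idx, hits)
  counts <- table(words[idx])
  sort(setNames(as.integer(counts), names(counts)), decreasing = TRUE)
}

near_king <- cooccur_within_k(words, "king", k = 2)
print(top3)
print(near_king)
```

Changing k, or resetting the window at sentence boundaries, gives different answers, which is exactly why the definition has to be settled before the code.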