I am new to R and busy with text analysis.

I need a script to find, e.g., whale, whales, whale's, whaler, whalers, whaling, ... in Moby Dick.

Riaan
You need a stemming algorithm. See here:
https://cran.r-project.org/web/views/NaturalLanguageProcessing.html

Myself, I've had good experience with Rstem.

B.

> On Jul 31, 2017, at 4:47 PM, Riaan Van Der Walt <Riaan.VanDerWalt at nwu.ac.za> wrote:
>
> I am new to R.
> Busy with Text Analysis.
>
> Need a script to find e.g
>
> whale, whales, whale's, whaler, whalers, whaling,... in Moby Dick
>
> Riaan
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
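A minimal sketch of what a stemmer does with these words. It assumes the SnowballC package from CRAN is installed; that package is swapped in here for illustration and is not the Rstem package mentioned above. The word vector is made up.

```r
# Hypothetical sample of word forms; in practice this would be the
# word vector extracted from the novel.
words <- c("whale", "whales", "whaler", "whalers", "whaling")

# SnowballC is an assumption (not named in the thread); skip gracefully
# if it is not installed.
if (requireNamespace("SnowballC", quietly = TRUE)) {
  # Reduce each word to its stem with the English Snowball stemmer.
  stems <- SnowballC::wordStem(words, language = "english")

  # Show each word next to its stem, then count how many words share a stem.
  print(data.frame(word = words, stem = stems))
  print(table(stems))
} else {
  message("Package 'SnowballC' is not installed; skipping the stemming sketch.")
}
```

Note that a stemmer merges words by rule, so which forms end up sharing a stem is decided by the algorithm and may not match the grouping you have in mind.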
**Before** posting:

1. Search: e.g. "text processing R"

2. Check CRAN Task Views: e.g. "Natural Language Processing"
   https://cran.r-project.org/web/views/NaturalLanguageProcessing.html

3. Use R's search facility: e.g. help.search("character")

which would lead you to ?grep among others, which might suggest something like

    grep("whal", unlist(strsplit(yourtext, split = " ", fixed = TRUE)), fixed = TRUE)

(strsplit() returns a list, so it has to be unlist()ed before the words are passed to grep()), although this is likely too simple-minded for a text as large as Moby Dick. But it depends on what you want to do.

Cheers,
Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)
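To make the grep() suggestion concrete, here is a small base-R sketch. The sample sentence is invented, and splitting on every run of non-letter characters (rather than on a single space) is an assumption about what a slightly more forgiving tokenizer could look like, not something taken from the thread.

```r
# Hypothetical stand-in for the full text of the novel.
yourtext <- "The whale and the whalers went whaling; whales everywhere."

# Lower-case the text and split on anything that is not a letter,
# apostrophe or hyphen, so punctuation does not stick to the words.
words <- unlist(strsplit(tolower(yourtext), "[^a-z'-]+"))
words <- words[nzchar(words)]   # drop empty strings, if any

# value = TRUE returns the matching words instead of their positions.
hits <- grep("^whal", words, value = TRUE)
print(hits)
```

Anchoring the pattern with ^ keeps words that merely contain "whal" somewhere in the middle out of the result.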
Please keep messages on the list so others can pitch in.

_Which_ words do you want to consider identical for the purpose of a frequency count?
_What_ do you want to plot?

B.

> On Aug 3, 2017, at 4:36 PM, Riaan Van Der Walt <Riaan.VanDerWalt at nwu.ac.za> wrote:
>
> Hello Boris,
> I've loaded Rstem and Snowball.
> But I am clueless about how to get a list of, e.g., whal* (whale, whales, whaling, whaler, whalers, whaleman, whalemen, whale-ship, whale-boat, whale's)
> in the book Moby Dick and the frequency of each of the different words.
> I am using this script:
>
> whales1.v <- grep("^whal.*", moby.word.v)
> whales1.v
>
> The total number of occurrences for whal* is 1699.
> But I can't display it or plot it.
>
> I am new to R and the learning curve is steep!!
>
> Thx!
> Riaan
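One possible way to get from the grep() call in the quoted script to a frequency list that can be displayed and plotted. By default grep() returns the positions of the matches; with value = TRUE it returns the matching words themselves, which table() can then count. The vector moby.word.v below is a tiny made-up stand-in for the real word vector, and the names whale.hits.v and whale.freqs.t are hypothetical.

```r
# Tiny made-up stand-in for the word vector built from the novel.
moby.word.v <- c("call", "me", "ishmael", "whale", "whales", "whale",
                 "whaling", "whaler", "whale's", "ship", "whale-boat",
                 "whales")

# Return the matching words themselves rather than their positions.
whale.hits.v <- grep("^whal", moby.word.v, value = TRUE)

# Count each distinct form and sort from most to least frequent.
whale.freqs.t <- sort(table(whale.hits.v), decreasing = TRUE)

print(whale.freqs.t)          # frequency of each different word
print(length(whale.hits.v))   # total number of whal* occurrences

# Simple bar chart of the counts; las = 2 turns the labels sideways.
barplot(whale.freqs.t, las = 2, ylab = "Occurrences")
```

Whether forms such as "whale" and "whale's" should be counted together is exactly the question asked above; this sketch keeps every distinct form separate.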
Use the tm package: create a corpus, build a term-document matrix (TDM) from it, and then apply as.matrix() to the TDM to display the terms' occurrences. Go to CRAN and read about the tm package.

________________________________
From: R-help <r-help-bounces at r-project.org> on behalf of Boris Steipe <boris.steipe at utoronto.ca>
Sent: Thursday, August 3, 2017 6:40:09 PM
To: Riaan Van Der Walt
Cc: R lists
Subject: Re: [R] find similar words in text

Please keep messages on the list so others can pitch in.

_Which_ words do you want to consider identical for the purpose of frequency count?
_What_ do you want to plot?

B.
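The tm suggestion could be sketched roughly as follows, assuming the tm package is installed. The two short "documents" are invented, and removing punctuation before counting is an assumption made for this example rather than something stated in the thread.

```r
# Two made-up "documents" standing in for chapters of the novel.
txt <- c("The whale and the whalers went whaling.",
         "Whales everywhere; a whaler saw the whale.")

if (requireNamespace("tm", quietly = TRUE)) {
  # Build a corpus from the character vector, one document per element.
  corpus <- tm::VCorpus(tm::VectorSource(txt))

  # Term-document matrix: one row per term, one column per document.
  tdm <- tm::TermDocumentMatrix(corpus,
                                control = list(removePunctuation = TRUE))

  # Convert to an ordinary matrix and keep only the whal* terms.
  m <- as.matrix(tdm)
  whale.m <- m[grepl("^whal", rownames(m)), , drop = FALSE]

  print(whale.m)                                      # counts per document
  print(sort(rowSums(whale.m), decreasing = TRUE))    # counts overall
} else {
  message("Package 'tm' is not installed; skipping the tm sketch.")
}
```

For a whole book the dense matrix from as.matrix() can become large, so filtering the terms before converting may be preferable.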