You can define stop words as below.

data <- tm_map(data, removeWords, stopwords("english"))

Patrick Casimir, PhD
Health Analytics, Data Science, Big Data Expert & Independent Consultant
C: 954.614.1178

________________________________
From: R-help <r-help-bounces at r-project.org> on behalf of Bert Gunter <bgunter.4567 at gmail.com>
Sent: Monday, June 12, 2017 10:12:33 AM
To: Elahe chalabi
Cc: R-help Mailing List
Subject: Re: [R] count number of stop words in R

You can use regular expressions.

?regex and/or the stringr package are good places to start. Of
course, you have to define "stop words."

Cheers,
Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )

On Mon, Jun 12, 2017 at 5:40 AM, Elahe chalabi via R-help
<r-help at r-project.org> wrote:
> Hi all,
>
> Is there a way in R to count the number of stop words (English) of a string using tm package?
>
> str="Mhm . Alright . There's um a young boy that's getting a cookie jar . And it he's uh in bad shape because uh the thing is falling over . And in the picture the mother is washing dishes and doesn't see it . And so is the the water is overflowing in the sink . And the dishes might get falled over if you don't fell fall over there there if you don't get it . And it there it's a picture of a kitchen window . And the curtains are very uh distinct . But the water is still flowing .
>
> 255 Levels: A boy's on the uh falling off the stool picking up cookies . The girl's reaching up for it . The girl the lady is is drying dishes . The water is uh running over uh from the sink into the floor . The window's opened . Dishes on the on the counter . She's outside ."
>
> Thanks for any help!
> Elahe
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
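For the question as posed, a minimal base-R sketch: split the string into tokens and count those that appear in a stop-word list. The function name `count_stopwords` and the short stop list in the example are hypothetical; with the tm package installed, `tm::stopwords("english")` could be passed as the list instead, which would give a larger count because that list is much longer.

```r
# Count how many whitespace-separated tokens of `text` are stop words.
# `stop_list` is supplied by the caller; the example list below is hypothetical.
count_stopwords <- function(text, stop_list) {
  tokens <- strsplit(text, "\\s+")[[1]]          # split on runs of whitespace
  tokens <- tokens[nzchar(tokens)]                # drop empty strings (leading spaces)
  sum(tolower(tokens) %in% tolower(stop_list))    # case-insensitive whole-token matches
}

x <- "And the water is overflowing in the sink ."
count_stopwords(x, c("and", "the", "is", "in"))
```

Matching with `%in%` compares whole tokens, so "the" is not counted inside "mother" or "there".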
Thanks for your reply. I know the command

data <- tm_map(data, removeWords, stopwords("english"))

removes English stop words, but I don't know how to count the stop words in my string:

str="Mhm . Alright . There's um a young boy that's getting a cookie jar . And it he's uh in bad shape because uh the thing is falling over . And in the picture the mother is washing dishes and doesn't see it . And so is the the water is overflowing in the sink . And the dishes might get falled over if you don't fell fall over there there if you don't get it . And it there it's a picture of a kitchen window . And the curtains are very uh distinct . But the water is still flowing ."

On Monday, June 12, 2017 7:24 AM, Patrick Casimir <patrcasi at nova.edu> wrote:

You can define stop words as below.
data <- tm_map(data, removeWords, stopwords("english"))
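Building on the removal command discussed here, one way to turn removal into a count is to compare the number of tokens before and after deleting the stop words. This base-R sketch is an assumption-laden stand-in: `gsub()` with whole-word matching plays the role of tm's `removeWords()` but is not tm's implementation, and the helper names and stop list are hypothetical.

```r
# Number of whitespace-separated tokens in a single string.
n_tokens <- function(text) {
  tokens <- strsplit(trimws(text), "\\s+")[[1]]
  sum(nzchar(tokens))
}

# Delete whole-word, case-insensitive occurrences of the stop words
# (a stand-in for removeWords(); regex metacharacters are not escaped).
remove_stopwords <- function(text, stop_list) {
  pattern <- paste0("\\b(", paste(stop_list, collapse = "|"), ")\\b")
  gsub(pattern, "", text, ignore.case = TRUE, perl = TRUE)
}

x <- "And the water is overflowing in the sink ."
stops <- c("and", "the", "is", "in")
n_tokens(x) - n_tokens(remove_stopwords(x, stops))  # stop words removed
```

Because `\b` treats an apostrophe as a word boundary, a list containing "that" would also delete the "that" inside "that's", leaving the fragment "'s" in the cleaned text.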
Define your string as whatever object you want:

data <- "Mhm . Alright . There's um a young boy that's getting a cookie jar . And it he's uh in bad shape because uh the thing is falling over . And in the picture the mother is washing dishes and doesn't see it . And so is the the water is overflowing in the sink . And the dishes might get falled over if you don't fell fall over there there if you don't get it . And it there it's a picture of a kitchen window . And the curtains are very uh distinct . But the water is still flowing."

Patrick Casimir, PhD
Health Analytics, Data Science, Big Data Expert & Independent Consultant
C: 954.614.1178

________________________________
From: Elahe chalabi <chalabi.elahe at yahoo.de>
Sent: Monday, June 12, 2017 11:23:42 AM
To: Patrick Casimir; Bert Gunter
Cc: R-help Mailing List
Subject: Re: [R] count number of stop words in R

Thanks for your reply. I know the command
data <- tm_map(data, removeWords, stopwords("english"))
removes English stop words, but I don't know how to count the stop words in my string.
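The original post's output ends with "255 Levels:", which suggests the text actually sits in a factor, so `data` may hold many transcripts rather than one string. Under that assumption, a base-R sketch that converts to character and returns one count per element; the function name and stop list are hypothetical.

```r
# Per-element stop-word counts for a character vector or factor.
count_stopwords_each <- function(x, stop_list) {
  x <- as.character(x)                  # factors must be converted to text first
  tokens <- strsplit(x, "\\s+")         # one token vector per element
  vapply(tokens,
         function(tk) sum(tolower(tk) %in% tolower(stop_list)),
         integer(1))
}

docs <- factor(c("The water is still flowing .",
                 "The window's opened ."))
count_stopwords_each(docs, c("the", "is"))
```

`as.character()` returns the values in their original order, not the sorted level order, so each count lines up with its transcript.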
I am unfamiliar with the tm package, but using basic regex tools, is this what you want:

test <- "Mhm . Alright . There's um a young boy that's getting a cookie jar . And it he's uh in bad shape because uh the thing is falling over . And in the picture the mother is washing dishes and doesn't see it . And so is the the water is overflowing in the sink . And the dishes might get falled over if you don't fell fall over there there if you don't get it . And it there it's a picture of a kitchen window . And the curtains are very uh distinct . But the water is still flowing ."

out <- strsplit(test, " ")  ## creates a list whose only component is a vector of the words
stopw <- c("a", "the")  ## or whatever they are
## ^(...)$ anchors the alternation so only whole words match, not substrings
sum(grepl(paste0("^(", paste(stopw, collapse = "|"), ")$"), out[[1]]))

## If you want to include ".", a regex special character, add:
sum(grepl(".", out[[1]], fixed = TRUE))

If this is all nonsense, just ignore -- and sorry I couldn't help.

-- Bert

Cheers,
Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )

On Mon, Jun 12, 2017 at 8:23 AM, Elahe chalabi <chalabi.elahe at yahoo.de> wrote:
> Thanks for your reply. I know the command
> data <- tm_map(data, removeWords, stopwords("english"))
> removes English stop words, but I don't know how to count the stop words in my string.
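To see why whole-token matching matters for a regex count like the one above: an alternation such as "a|the" passed to `grepl()` matches anywhere inside a token. A small sketch on one sentence of the transcript compares the unanchored pattern with one anchored by `^` and `$`:

```r
words <- strsplit("There's um a young boy that's getting a cookie jar .", " ")[[1]]
stopw <- c("a", "the")

unanchored <- paste(stopw, collapse = "|")                     # "a|the"
anchored   <- paste0("^(", paste(stopw, collapse = "|"), ")$")  # "^(a|the)$"

sum(grepl(unanchored, words))  # substring matches
sum(grepl(anchored, words))    # whole-token matches
```

Here the unanchored pattern also counts "that's" and "jar", giving 4 instead of 2. Both patterns are case-sensitive, so "The" at the start of a sentence would need `ignore.case = TRUE` or a `tolower()` step.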