Sven E. Templer
2015-Apr-20 14:10 UTC
[R] regexpr - ignore all special characters and punctuation in a string
Hi Dimitri,

str_replace_all is not in the base libraries; you could use 'gsub' as well, for example:

    a = "What a nice day today! - Story of happiness: Part 2."
    b = "What a nice day today: Story of happiness (Part 2)"
    sa = gsub("[^A-Za-z0-9]", "", a)
    sb = gsub("[^A-Za-z0-9]", "", b)
    a==b
    # [1] FALSE
    sa==sb
    # [1] TRUE

Take care of the extra space in a after the '-', so also replace spaces. The pattern above already does this, since it drops everything that is not a letter or a digit.

Best,
Sven.

On 20 April 2015 at 16:05, Dimitri Liakhovitski <dimitri.liakhovitski at gmail.com> wrote:

> I think I found a partial answer:
>
> str_replace_all(x, "[[:punct:]]", " ")
>
> On Mon, Apr 20, 2015 at 9:59 AM, Dimitri Liakhovitski
> <dimitri.liakhovitski at gmail.com> wrote:
> > Hello!
> >
> > Please point me in the right direction.
> > I need to match 2 strings, but focusing ONLY on characters, ignoring
> > all special characters and punctuation signs, including (), "", etc.
> >
> > For example:
> > I want the following to return: TRUE
> >
> > "What a nice day today! - Story of happiness: Part 2." ==
> > "What a nice day today: Story of happiness (Part 2)"
> >
> > --
> > Thank you!
> > Dimitri Liakhovitski
>
> --
> Dimitri Liakhovitski
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
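To illustrate the point about spaces: replacing punctuation with a space, as in the partial answer quoted above, leaves runs of spaces of different lengths, so the two strings still differ. The sketch below shows that, and then one possible alternative that keeps word boundaries instead of deleting every space. The helper name normalize_words is hypothetical and not from the thread; this is only a sketch in base R, assuming plain ASCII input.

```r
a <- "What a nice day today! - Story of happiness: Part 2."
b <- "What a nice day today: Story of happiness (Part 2)"

# Replacing punctuation with a space leaves space runs of different
# lengths in a and b, so the comparison still fails.
pa <- gsub("[[:punct:]]", " ", a)
pb <- gsub("[[:punct:]]", " ", b)
pa == pb
# [1] FALSE

# Hypothetical helper (not from the thread): turn punctuation into spaces,
# collapse each whitespace run to a single space, then trim both ends.
normalize_words <- function(x) {
  x <- gsub("[[:punct:]]", " ", x)
  x <- gsub("[[:space:]]+", " ", x)
  trimws(x)
}

normalize_words(a) == normalize_words(b)
# [1] TRUE
normalize_words(a)
# [1] "What a nice day today Story of happiness Part 2"
```

The trade-off: deleting all non-alphanumerics treats "ab c" and "a bc" as equal, while this variant keeps them distinct. Which behaviour is wanted depends on the data.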
Charles Determan
2015-Apr-20 14:15 UTC
[R] regexpr - ignore all special characters and punctuation in a string
You can use the [:alnum:] regex class with gsub.

    str1 <- "What a nice day today! - Story of happiness: Part 2."
    str2 <- "What a nice day today: Story of happiness (Part 2)"

    gsub("[^[:alnum:]]", "", str1) == gsub("[^[:alnum:]]", "", str2)
    # [1] TRUE

The same can be done with the stringr package if you really are partial to it.

    library(stringr)
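The archived message breaks off after library(stringr), so the stringr code itself is not available. As a rough sketch of what such a version might look like (this is not Charles's original code), the snippet below uses stringr::str_replace_all with the same negated [:alnum:] class. The helper name strip_non_alnum is hypothetical, and the fallback to gsub when stringr is not installed is an assumption added for convenience.

```r
str1 <- "What a nice day today! - Story of happiness: Part 2."
str2 <- "What a nice day today: Story of happiness (Part 2)"

# Hypothetical helper (not from the thread): remove every character that
# is not alphanumeric, using stringr when installed and base gsub otherwise.
strip_non_alnum <- function(x) {
  if (requireNamespace("stringr", quietly = TRUE)) {
    stringr::str_replace_all(x, "[^[:alnum:]]", "")
  } else {
    gsub("[^[:alnum:]]", "", x)
  }
}

strip_non_alnum(str1) == strip_non_alnum(str2)
# [1] TRUE
```

Both functions are vectorised over x, so the same helper should also work on a whole character column rather than one string at a time.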
Dimitri Liakhovitski
2015-Apr-20 15:20 UTC
[R] regexpr - ignore all special characters and punctuation in a string
Thanks a lot, everybody, for the excellent suggestions!

--
Dimitri Liakhovitski