Sven E. Templer
2015-Apr-20 14:10 UTC
[R] regexpr - ignore all special characters and punctuation in a string
Hi Dimitri,
str_replace_all is not in base R (it comes from the stringr package), so
you could use 'gsub' instead,
for example:
a = "What a nice day today! - Story of happiness: Part 2."
b = "What a nice day today: Story of happiness (Part 2)"
sa = gsub("[^A-Za-z0-9]", "", a)
sb = gsub("[^A-Za-z0-9]", "", b)
a==b
# [1] FALSE
sa==sb
# [1] TRUE
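Note the extra space in a after the '-': removing only punctuation would
leave it behind, so spaces have to be removed as well (the pattern above
does this, since it drops everything that is not a letter or digit).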
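To make the point about the extra space concrete, here is a small sketch in base R. The variable names pa, pb, ca and cb are only illustrative and are not from the original post. It compares stripping punctuation alone with stripping punctuation and whitespace together:

```r
a <- "What a nice day today! - Story of happiness: Part 2."
b <- "What a nice day today: Story of happiness (Part 2)"

# Punctuation only: the spaces on both sides of the '-' in a survive,
# so a ends up with two spaces where b has one
pa <- gsub("[[:punct:]]", "", a)
pb <- gsub("[[:punct:]]", "", b)
pa == pb
# [1] FALSE

# Punctuation and whitespace together: the strings now agree
ca <- gsub("[[:punct:][:space:]]", "", a)
cb <- gsub("[[:punct:][:space:]]", "", b)
ca == cb
# [1] TRUE
```

If the spaces should be kept for readability, collapsing runs of whitespace to a single space after removing punctuation would be another option.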
Best,
Sven.
On 20 April 2015 at 16:05, Dimitri Liakhovitski <
dimitri.liakhovitski at gmail.com> wrote:
> I think I found a partial answer:
>
> str_replace_all(x, "[[:punct:]]", " ")
>
> On Mon, Apr 20, 2015 at 9:59 AM, Dimitri Liakhovitski
> <dimitri.liakhovitski at gmail.com> wrote:
> > Hello!
> >
> > Please point me in the right direction.
> > I need to match 2 strings, but focusing ONLY on characters, ignoring
> > all special characters and punctuation signs, including (),
> > "", etc.
> >
> > For example:
> > I want the following to return: TRUE
> >
> > "What a nice day today! - Story of happiness: Part 2." =>
> > "What a nice day today: Story of happiness (Part 2)"
> >
> >
> > --
> > Thank you!
> > Dimitri Liakhovitski
>
>
>
> --
> Dimitri Liakhovitski
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
Charles Determan
2015-Apr-20 14:15 UTC
[R] regexpr - ignore all special characters and punctuation in a string
You can use the [:alnum:] regex class with gsub.
str1 <- "What a nice day today! - Story of happiness: Part 2."
str2 <- "What a nice day today: Story of happiness (Part 2)"
gsub("[^[:alnum:]]", "", str1) ==
gsub("[^[:alnum:]]", "", str2)
# [1] TRUE
The same can be done with the stringr package if you really are partial to
it.
library(stringr)
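The stringr code itself did not survive in the archived message. As a rough sketch of what an equivalent might look like, assuming stringr is installed and using its str_replace_all (the function from the original question) with the same pattern as the gsub call; the names clean1 and clean2 are only illustrative:

```r
# Sketch only: assumes the stringr package is installed
library(stringr)

str1 <- "What a nice day today! - Story of happiness: Part 2."
str2 <- "What a nice day today: Story of happiness (Part 2)"

# Drop every character that is not a letter or digit, then compare
clean1 <- str_replace_all(str1, "[^[:alnum:]]", "")
clean2 <- str_replace_all(str2, "[^[:alnum:]]", "")
clean1 == clean2
```

For this task the base gsub version and the stringr version should behave the same; the choice is mostly a matter of which style the rest of your code uses.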
Dimitri Liakhovitski
2015-Apr-20 15:20 UTC
[R] regexpr - ignore all special characters and punctuation in a string
Thanks a lot, everybody, for the excellent suggestions!
--
Dimitri Liakhovitski