Dimitri Liakhovitski
2015-Apr-20 13:59 UTC
[R] regexpr - ignore all special characters and punctuation in a string
Hello!

Please point me in the right direction.
I need to match 2 strings, but focusing ONLY on characters, ignoring
all special characters and punctuation signs, including (), "", etc.

For example:
I want the following to return: TRUE

"What a nice day today! - Story of happiness: Part 2." ==
"What a nice day today: Story of happiness (Part 2)"

--
Thank you!
Dimitri Liakhovitski
Dimitri Liakhovitski
2015-Apr-20 14:05 UTC
[R] regexpr - ignore all special characters and punctuation in a string
I think I found a partial answer:

str_replace_all(x, "[[:punct:]]", " ")

On Mon, Apr 20, 2015 at 9:59 AM, Dimitri Liakhovitski
<dimitri.liakhovitski at gmail.com> wrote:
> Hello!
>
> Please point me in the right direction.
> I need to match 2 strings, but focusing ONLY on characters, ignoring
> all special characters and punctuation signs, including (), "", etc.
>
> For example:
> I want the following to return: TRUE
>
> "What a nice day today! - Story of happiness: Part 2." ==
> "What a nice day today: Story of happiness (Part 2)"
>
> --
> Thank you!
> Dimitri Liakhovitski

--
Dimitri Liakhovitski
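The replacement above is only a partial answer because putting a space in place of each punctuation mark leaves runs of spaces that differ between the two strings. A minimal base R sketch of one way to finish the job follows. It assumes gsub() as a stand-in for stringr's str_replace_all(), and normalize_spaces is a hypothetical helper name, not something from the thread.

```r
x1 <- "What a nice day today! - Story of happiness: Part 2."
x2 <- "What a nice day today: Story of happiness (Part 2)"

# Base R stand-in for the str_replace_all() call: punctuation becomes a space.
y1 <- gsub("[[:punct:]]", " ", x1)
y2 <- gsub("[[:punct:]]", " ", x2)

# Still unequal at this point: the runs of spaces differ in length.
partial_equal <- y1 == y2

# Hypothetical helper: collapse whitespace runs and trim both ends.
normalize_spaces <- function(s) trimws(gsub("[[:space:]]+", " ", s))

full_equal <- normalize_spaces(y1) == normalize_spaces(y2)
print(c(partial = partial_equal, full = full_equal))
```

Keeping single spaces between words, rather than deleting all whitespace, means that "today" and "to day" remain distinct.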
Marc Schwartz
2015-Apr-20 14:08 UTC
[R] regexpr - ignore all special characters and punctuation in a string
> On Apr 20, 2015, at 8:59 AM, Dimitri Liakhovitski <dimitri.liakhovitski at gmail.com> wrote:
>
> Hello!
>
> Please point me in the right direction.
> I need to match 2 strings, but focusing ONLY on characters, ignoring
> all special characters and punctuation signs, including (), "", etc.
>
> For example:
> I want the following to return: TRUE
>
> "What a nice day today! - Story of happiness: Part 2." ==
> "What a nice day today: Story of happiness (Part 2)"
>
> --
> Thank you!
> Dimitri Liakhovitski

Look at ?agrep:

Vec1 <- "What a nice day today! - Story of happiness: Part 2."
Vec2 <- "What a nice day today: Story of happiness (Part 2)"

# Match the words, not the punctuation.
# Not fully tested

> agrep("What a nice day today Story of happiness Part 2", c(Vec1, Vec2))
[1] 1 2

> agrep("What a nice day today Story of happiness Part 2", c(Vec1, Vec2),
        value = TRUE)
[1] "What a nice day today! - Story of happiness: Part 2."
[2] "What a nice day today: Story of happiness (Part 2)"

Also, possibly:

http://cran.r-project.org/web/packages/stringdist

Regards,

Marc Schwartz
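If a TRUE/FALSE answer per string is more convenient than indices, agrepl() is the logical counterpart of agrep(). The sketch below reuses the strings from this message and adds one made-up unrelated title (an assumption for illustration) to show a non-match. It relies on the default max.distance of 0.1, which tolerates edits up to roughly 10% of the pattern length.

```r
Vec1 <- "What a nice day today! - Story of happiness: Part 2."
Vec2 <- "What a nice day today: Story of happiness (Part 2)"
other <- "A completely different title"  # made-up string for contrast

pattern <- "What a nice day today Story of happiness Part 2"

# agrepl() returns one logical per candidate instead of the indices from agrep().
hits <- agrepl(pattern, c(Vec1, Vec2, other))
print(hits)
```

Because the match is approximate, strings that differ in a few letters can also match, so this is looser than stripping punctuation and comparing with ==.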
Sven E. Templer
2015-Apr-20 14:10 UTC
[R] regexpr - ignore all special characters and punctuation in a string
Hi Dimitri,
str_replace_all is not in base R (it comes from the stringr package);
you could use 'gsub' as well, for example:
a = "What a nice day today! - Story of happiness: Part 2."
b = "What a nice day today: Story of happiness (Part 2)"
sa = gsub("[^A-Za-z0-9]", "", a)
sb = gsub("[^A-Za-z0-9]", "", b)
a==b
# [1] FALSE
sa==sb
# [1] TRUE
Take care of the extra space in a after the '-': the pattern above
removes spaces as well, which is why sa and sb compare equal.
Best,
Sven.
On 20 April 2015 at 16:05, Dimitri Liakhovitski <
dimitri.liakhovitski at gmail.com> wrote:
> I think I found a partial answer:
>
> str_replace_all(x, "[[:punct:]]", " ")
>
> On Mon, Apr 20, 2015 at 9:59 AM, Dimitri Liakhovitski
> <dimitri.liakhovitski at gmail.com> wrote:
> > Hello!
> >
> > Please point me in the right direction.
> > I need to match 2 strings, but focusing ONLY on characters, ignoring
> > all special characters and punctuation signs, including (), "", etc.
> >
> > For example:
> > I want the following to return: TRUE
> >
> > "What a nice day today! - Story of happiness: Part 2." ==
> > "What a nice day today: Story of happiness (Part 2)"
> >
> >
> > --
> > Thank you!
> > Dimitri Liakhovitski
>
>
>
> --
> Dimitri Liakhovitski
>
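The same gsub() pattern extends naturally to looking a string up in a vector of candidates. The following is a rough sketch only: strip_key is a hypothetical helper name, and the titles vector is made-up data for illustration.

```r
# Hypothetical helper: keep only ASCII letters and digits, as in the gsub() call above.
strip_key <- function(s) gsub("[^A-Za-z0-9]", "", s)

# Made-up candidate titles for illustration.
titles <- c("An unrelated title",
            "What a nice day today: Story of happiness (Part 2)",
            "What a nice day today: Story of happiness (Part 3)")
query <- "What a nice day today! - Story of happiness: Part 2."

# match() gives the position of the first title with the same key, or NA if none.
idx <- match(strip_key(query), strip_key(titles))
print(titles[idx])
```

Since digits are kept, "Part 2" and "Part 3" stay distinct. Note that [^A-Za-z0-9] also drops accented letters, which may or may not be wanted.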
Duncan Murdoch
2015-Apr-20 14:10 UTC
[R] regexpr - ignore all special characters and punctuation in a string
On 20/04/2015 9:59 AM, Dimitri Liakhovitski wrote:
> Hello!
>
> Please point me in the right direction.
> I need to match 2 strings, but focusing ONLY on characters, ignoring
> all special characters and punctuation signs, including (), "", etc.
>
> For example:
> I want the following to return: TRUE
>
> "What a nice day today! - Story of happiness: Part 2." ==
> "What a nice day today: Story of happiness (Part 2)"
>

I would transform both strings using gsub(), then compare. e.g.

clean <- function(s) gsub("[[:punct:][:blank:]]", "", s)

clean("What a nice day today! - Story of happiness: Part 2.") ==
clean("What a nice day today: Story of happiness (Part 2)")

This completely ignores spaces; you might want something more
sophisticated if you consider "today" and "to day" to be different, e.g.

clean <- function(s) {
  s <- gsub("[[:punct:]]", "", s)
  gsub("[[:blank:]]+", " ", s)
}

which converts multiple blanks into single spaces.

Duncan Murdoch
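To make the difference between the two versions of clean() concrete, here is a small side-by-side sketch. The names clean_all and clean_words are hypothetical, introduced only so that both definitions can coexist in one session.

```r
# First variant: drop punctuation and blanks entirely.
clean_all <- function(s) gsub("[[:punct:][:blank:]]", "", s)

# Second variant: drop punctuation, then squeeze runs of blanks to one space.
clean_words <- function(s) {
  s <- gsub("[[:punct:]]", "", s)
  gsub("[[:blank:]]+", " ", s)
}

s1 <- "What a nice day today! - Story of happiness: Part 2."
s2 <- "What a nice day today: Story of happiness (Part 2)"

# Both variants treat the two example strings as equal.
same_all   <- clean_all(s1) == clean_all(s2)
same_words <- clean_words(s1) == clean_words(s2)

# Only the second variant keeps "today" and "to day" apart.
today_all   <- clean_all("today") == clean_all("to day")
today_words <- clean_words("today") == clean_words("to day")

print(c(same_all, same_words, today_all, today_words))
```

The second variant does not trim leading or trailing blanks, so strings that begin or end with punctuation next to a space may need an extra trimws() call.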
John McKown
2015-Apr-20 14:11 UTC
[R] regexpr - ignore all special characters and punctuation in a string
On Mon, Apr 20, 2015 at 8:59 AM, Dimitri Liakhovitski <dimitri.liakhovitski at gmail.com> wrote:
> Hello!
>
> Please point me in the right direction.
> I need to match 2 strings, but focusing ONLY on characters, ignoring
> all special characters and punctuation signs, including (), "", etc.
>
> For example:
> I want the following to return: TRUE
>
> "What a nice day today! - Story of happiness: Part 2." ==
> "What a nice day today: Story of happiness (Part 2)"
>
> --
> Thank you!
> Dimitri Liakhovitski
>

Perhaps a variation on:

> str1 <- "What a nice day today! - Story of happiness: Part 2."
> str2 <- "What a nice day today: Story of happiness (Part 2)"
> gsub('[^[:alpha:]]','',str1)==gsub('[^[:alpha:]]','',str2)
[1] TRUE

The gsub() removes all characters which are not alphabetic from each
string and then compares them.

--
If you sent twitter messages while exploring, are you on a textpedition?

He's about as useful as a wax frying pan.

10 to the 12th power microphones = 1 Megaphone

Maranatha! <><
John McKown
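One consequence of keeping only alphabetic characters is that digits are discarded too, so titles that differ only in a number compare as equal. The sketch below contrasts [:alpha:] with [:alnum:]. The helper names letters_only and letters_digits and the "Part 3" string are assumptions made for illustration.

```r
# Hypothetical helpers: keep letters only, or letters and digits.
letters_only   <- function(s) gsub("[^[:alpha:]]", "", s)
letters_digits <- function(s) gsub("[^[:alnum:]]", "", s)

p2 <- "Story of happiness: Part 2."
p3 <- "Story of happiness (Part 3)"  # made-up variant differing only in the digit

# With [:alpha:] the part number is dropped, so the two titles look identical.
alpha_equal <- letters_only(p2) == letters_only(p3)

# With [:alnum:] the digit survives and the titles are told apart.
alnum_equal <- letters_digits(p2) == letters_digits(p3)

print(c(alpha = alpha_equal, alnum = alnum_equal))
```

Which class to use depends on whether numbers carry meaning in the strings being compared.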