Dimitri Liakhovitski
2015-Apr-20 13:59 UTC
[R] regexpr - ignore all special characters and punctuation in a string
Hello!

Please point me in the right direction. I need to match two strings, focusing ONLY on the characters and ignoring all special characters and punctuation marks, including (), "", etc.

For example, I want the following to return TRUE:

"What a nice day today! - Story of happiness: Part 2." = "What a nice day today: Story of happiness (Part 2)"

--
Thank you!
Dimitri Liakhovitski
Dimitri Liakhovitski
2015-Apr-20 14:05 UTC
I think I found a partial answer:

str_replace_all(x, "[[:punct:]]", " ")

On Mon, Apr 20, 2015 at 9:59 AM, Dimitri Liakhovitski <dimitri.liakhovitski at gmail.com> wrote:
> [...]

--
Dimitri Liakhovitski
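A minimal base-R sketch of the same idea, assuming stringr is not loaded: gsub() stands in for str_replace_all(), and the helper names depunct() and normalize() are hypothetical. Replacing each punctuation mark with a space leaves runs of spaces of different lengths, so the two titles only compare equal once whitespace is collapsed and trimmed.

```r
a <- "What a nice day today! - Story of happiness: Part 2."
b <- "What a nice day today: Story of happiness (Part 2)"

# Hypothetical helper: replace each punctuation character with a space,
# mirroring str_replace_all(x, "[[:punct:]]", " ") in base R.
depunct <- function(x) gsub("[[:punct:]]", " ", x)

depunct(a) == depunct(b)  # FALSE: the runs of spaces differ

# Hypothetical helper: also collapse whitespace runs to one space and
# trim both ends (trimws() needs R >= 3.2.0).
normalize <- function(x) trimws(gsub("[[:space:]]+", " ", depunct(x)))

normalize(a) == normalize(b)  # TRUE
normalize(a)  # "What a nice day today Story of happiness Part 2"
```

This keeps word boundaries, so "today" and "to day" still count as different.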
Marc Schwartz
2015-Apr-20 14:08 UTC
> On Apr 20, 2015, at 8:59 AM, Dimitri Liakhovitski <dimitri.liakhovitski at gmail.com> wrote:
>
> [...]

Look at ?agrep:

Vec1 <- "What a nice day today! - Story of happiness: Part 2."
Vec2 <- "What a nice day today: Story of happiness (Part 2)"

# Match the words, not the punctuation.
# Not fully tested
> agrep("What a nice day today Story of happiness Part 2", c(Vec1, Vec2))
[1] 1 2

> agrep("What a nice day today Story of happiness Part 2", c(Vec1, Vec2), value = TRUE)
[1] "What a nice day today! - Story of happiness: Part 2."
[2] "What a nice day today: Story of happiness (Part 2)"

Also, possibly:

http://cran.r-project.org/web/packages/stringdist

Regards,

Marc Schwartz
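A short sketch of the trade-off with agrep(), assuming its default max.distance = 0.1, which is a fraction of the pattern length (here it allows up to 5 edits). The punctuation differences fit inside that budget, but so does a genuinely different title; near_miss is a hypothetical string, not one from the thread.

```r
pattern <- "What a nice day today Story of happiness Part 2"
Vec1 <- "What a nice day today! - Story of happiness: Part 2."
# Hypothetical near miss: differs from the pattern in one digit only.
near_miss <- "What a nice day today Story of happiness Part 3"

# Default tolerance: both match, including the wrong part number.
agrep(pattern, c(Vec1, near_miss))
# [1] 1 2

# No edits allowed: now the punctuation in Vec1 blocks the match too.
agrep(pattern, c(Vec1, near_miss), max.distance = 0)
# integer(0)
```

So agrep() answers "roughly the same text", which is looser than "identical apart from punctuation"; normalizing both strings with gsub() before comparing gives the strict test.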
Sven E. Templer
2015-Apr-20 14:10 UTC
Hi Dimitri,

str_replace_all() is not in base R (it comes from the stringr package); you could use gsub() instead, for example:

a = "What a nice day today! - Story of happiness: Part 2."
b = "What a nice day today: Story of happiness (Part 2)"
sa = gsub("[^A-Za-z0-9]", "", a)
sb = gsub("[^A-Za-z0-9]", "", b)
a == b
# [1] FALSE
sa == sb
# [1] TRUE

Take care of the extra space in a after the '-': if you only replace punctuation, you also need to replace spaces (the pattern above removes them as well).

Best,
Sven.

On 20 April 2015 at 16:05, Dimitri Liakhovitski <dimitri.liakhovitski at gmail.com> wrote:
> [...]
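Sven's gsub() approach wrapped in a small helper, as a sketch: same_chars() and its ignore.case argument are hypothetical names, not part of any package. Because the pattern keeps only the characters in [A-Za-z0-9], spaces disappear too, so "today" and "to day" compare equal; lowercasing is optional.

```r
# Hypothetical helper: compare strings on letters and digits only,
# optionally ignoring case. Works element-wise on character vectors.
same_chars <- function(x, y, ignore.case = FALSE) {
  norm <- function(s) {
    s <- gsub("[^A-Za-z0-9]", "", s)  # drop everything else, incl. spaces
    if (ignore.case) tolower(s) else s
  }
  norm(x) == norm(y)
}

a <- "What a nice day today! - Story of happiness: Part 2."
b <- "What a nice day today: Story of happiness (Part 2)"

same_chars(a, b)                                    # TRUE
same_chars("today", "to day")                       # TRUE: spaces removed
same_chars("Part 2", "part 2")                      # FALSE: case differs
same_chars("Part 2", "part 2", ignore.case = TRUE)  # TRUE
```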
Duncan Murdoch
2015-Apr-20 14:10 UTC
On 20/04/2015 9:59 AM, Dimitri Liakhovitski wrote:
> [...]

I would transform both strings using gsub(), then compare, e.g.

clean <- function(s) gsub("[[:punct:][:blank:]]", "", s)

clean("What a nice day today! - Story of happiness: Part 2.") ==
  clean("What a nice day today: Story of happiness (Part 2)")

This completely ignores spaces; you might want something more sophisticated if you consider "today" and "to day" to be different, e.g.

clean <- function(s) {
  s <- gsub("[[:punct:]]", "", s)
  gsub("[[:blank:]]+", " ", s)
}

which converts multiple blanks into single spaces.

Duncan Murdoch
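A runnable sketch of Duncan's second clean(), plus one edge case it leaves open: punctuation at the start or end of a string leaves a stray space behind. clean_trim() is a hypothetical variant that also trims both ends; it is not from the thread.

```r
# Duncan's second clean(): drop punctuation, then squeeze runs of
# blanks into one space, so word boundaries still count.
clean <- function(s) {
  s <- gsub("[[:punct:]]", "", s)
  gsub("[[:blank:]]+", " ", s)
}

a <- "What a nice day today! - Story of happiness: Part 2."
b <- "What a nice day today: Story of happiness (Part 2)"

clean(a) == clean(b)                                     # TRUE
clean("a nice day today") == clean("a nice day to day")  # FALSE

# Leading punctuation leaves a leading space: " Part 2" vs "Part 2".
clean("- Part 2") == clean("Part 2")                     # FALSE

# Hypothetical variant that also trims the ends (trimws() needs R >= 3.2.0).
clean_trim <- function(s) trimws(clean(s))
clean_trim("- Part 2") == clean_trim("Part 2")           # TRUE
```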
John McKown
2015-Apr-20 14:11 UTC
On Mon, Apr 20, 2015 at 8:59 AM, Dimitri Liakhovitski <dimitri.liakhovitski at gmail.com> wrote:
> [...]

Perhaps a variation on:

> str1 <- "What a nice day today! - Story of happiness: Part 2."
> str2 <- "What a nice day today: Story of happiness (Part 2)"
> gsub('[^[:alpha:]]', '', str1) == gsub('[^[:alpha:]]', '', str2)
[1] TRUE

The gsub() calls remove every non-alphabetic character from each string, and == then compares the results.

--
If you sent twitter messages while exploring, are you on a textpedition?

He's about as useful as a wax frying pan.

10 to the 12th power microphones = 1 Megaphone

Maranatha! <><
John McKown
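John's pattern keeps only alphabetic characters, which also discards digits. This sketch shows why that can matter for titles like these; str3, alpha_only(), and alnum_only() are hypothetical names. Switching [:alpha:] to [:alnum:] keeps the part number in the comparison.

```r
str1 <- "What a nice day today! - Story of happiness: Part 2."
str2 <- "What a nice day today: Story of happiness (Part 2)"
# Hypothetical third title: same text, different part number.
str3 <- "What a nice day today: Story of happiness (Part 3)"

# [^[:alpha:]] also drops digits, so Part 2 and Part 3 look identical.
alpha_only <- function(s) gsub("[^[:alpha:]]", "", s)
alpha_only(str1) == alpha_only(str3)  # TRUE

# [^[:alnum:]] keeps both letters and digits.
alnum_only <- function(s) gsub("[^[:alnum:]]", "", s)
alnum_only(str1) == alnum_only(str2)  # TRUE
alnum_only(str1) == alnum_only(str3)  # FALSE
```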