Martin Møller Skarbiniks Pedersen
2020-Feb-21 12:17 UTC
[R] RegExpr: Help match quote inside a set
Hi,

I am trying to understand the different functions for working with regular expressions in R. However, I get a strange result from one of my experiments, which I need help to understand.

First: I search for any of the characters .,;"- in the book emma:

> length(grep("[.,;\"-]",janeaustenr::emma))
[1] 13110

And that is probably correct.

Second: I try to add ' to the set to search for:

> length(grep("[.,;\"-']",janeaustenr::emma))
[1] 12816

No warnings or errors, but fewer hits? Why?

Third: I try quoting the ' and now probably get the correct result:

> length(grep("[.,;\"-\\']",janeaustenr::emma))
[1] 13433

But still, what does grep("[.,;\"-']", janeaustenr::emma) do exactly?

Regards
Martin

Sorry for the HTML. It is not possible to remove it completely in Gmail.
On Fri, 21 Feb 2020 13:17:59 +0100
Martin Møller Skarbiniks Pedersen <traxplayer at gmail.com> wrote:

> "[.,;\"-']"

Note that there is an - between " and ', which transforms your regular expression into a range (all characters between " and ') instead of a set. Move the - right in front of the closing bracket ] to make it match a literal - again.

> sorry for the html.

By the way, you managed to switch this e-mail to plain text instead of HTML.

--
Best regards,
Ivan
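Ivan's point can be checked on a few short strings without loading the book. The following is a minimal sketch in base R; the vector x and the names range_pat and set_pat are made up for illustration and do not come from the thread. It assumes the range is resolved by character code, in which case " through ' covers the six characters " # $ % & '.

```r
# Hypothetical test strings, not taken from the book
x <- c("well-known", "it's", "5% off", "plain words")

# Hyphen between " and ': read as the range from " to '
range_pat <- "[.,;\"-']"

# Hyphen moved to just before the closing bracket: a literal hyphen
set_pat <- "[.,;\"'-]"

in_range <- grepl(range_pat, x)
in_set   <- grepl(set_pat, x)

print(in_range)  # FALSE  TRUE  TRUE FALSE: "-" is missed, "%" is caught
print(in_set)    #  TRUE  TRUE FALSE FALSE: "-" and "'" are both literal
```

Lines whose only hit was a hyphen drop out under the range pattern, which is consistent with the second count in the original question being lower than the first.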
Yes. From ?regex:

"A range of characters may be specified by giving the first and last characters, separated by a hyphen. (Because their interpretation is locale- and implementation-dependent, character ranges are best avoided.)"

Although it is terse, if you have not already done so, you should read ?regex carefully. I'm a dummy but have found it very helpful.

Bert Gunter

"The trouble with having an open mind is that people keep coming along and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)

On Fri, Feb 21, 2020 at 4:28 AM Ivan Krylov <krylov.r00t at gmail.com> wrote:
> On Fri, 21 Feb 2020 13:17:59 +0100
> Martin Møller Skarbiniks Pedersen <traxplayer at gmail.com> wrote:
>
> > "[.,;\"-']"
>
> Note that there is an - between " and ', which transforms your
> regular expression into a range (all characters between " and ')
> instead of a set. Move the - right in front of the closing bracket ] to
> make it match a literal - again.
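Since ?regex describes ranges as locale- and implementation-dependent, one way to see what a bracket expression actually covers on a given system is to test every printable ASCII character against it. This is a small sketch in base R; the variable names are illustrative only, and the result may differ where ranges are not resolved by character code.

```r
# All printable ASCII characters (codes 32 to 126), one per element
ascii <- intToUtf8(32:126, multiple = TRUE)

# Keep the characters matched by the pattern from the original question
matched <- ascii[grepl("[.,;\"-']", ascii)]

# Under code-point ordering: " # $ % & ' , . ;  (no hyphen)
print(matched)
```

Listing the wanted characters one by one, with any literal hyphen placed last, avoids depending on how the range is interpreted.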