On 23/08/17 18:33, Stefan Evert wrote:
>
>> On 23 Aug 2017, at 07:45, Rolf Turner <r.turner at auckland.ac.nz> wrote:
>>
>> My reading of ?regex led me to believe that
>>
>> gsub("[:alpha:]","",x)
>>
>> should give the result that I want.
>
> That's looking for any of the characters a, l, p, h, : .

OK. I see that now. I don't think that it's really stated anywhere that to search for (and possibly change) any one of a string of characters you enclose that string of characters in brackets [ ].

The first example from ?grep makes this "clear" (for some value of the word "clear") once you understand what this example is on about.

So it's "obvious" once you've been shown, and totally opaque until then.

> What you meant to say was
>
> gsub("[[:alpha:]]","",x)
>
> i.e. the character class [:alpha:] within a character set.

Yup. Got it. Thanks very much.

cheers,

Rolf

--
Technical Editor ANZJS
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276
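A minimal sketch of the difference Stefan describes, assuming R's default (TRE, extended) regex engine. The input string `x` below is a made-up example, not Rolf's actual data:

```r
# Hypothetical test string mixing letters, digits, a space and punctuation
x <- "abc123 lap:h!"

# Without outer brackets, "[:alpha:]" is an ordinary bracket expression:
# it matches any single one of the characters ':', 'a', 'l', 'p', 'h'
wrong <- gsub("[:alpha:]", "", x)

# With outer brackets, [:alpha:] is a named class inside a bracket
# expression, so every alphabetic character is removed
right <- gsub("[[:alpha:]]", "", x)

print(wrong)  # "bc123 !"
print(right)  # "123 :!"
```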
Inline.

-- Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)

On Wed, Aug 23, 2017 at 2:29 AM, Rolf Turner <r.turner at auckland.ac.nz> wrote:
>
> On 23/08/17 18:33, Stefan Evert wrote:
>
>> That's looking for any of the characters a, l, p, h, : .
>
> OK. I see that now. I don't think that it's really stated anywhere that to
> search for (and possibly change) any one of a string of characters you
> enclose that string of characters in brackets [ ].
>
> The first example from ?grep makes this "clear" (for some value of the word
> "clear") once you understand what this example is on about.
>
> So it's "obvious" once you've been shown, and totally opaque until then.

Well, "obviousness" is in the mind of the beholder, but, from ?regexp:

"A character class is a list of characters enclosed between [ and ] which matches any single character in that list; "... (at the end of the above section)

"For example, [[:alnum:]] means [0-9A-Za-z] "...

Note the doubled brackets. So seems pretty explicit to me.

Cheers,
Bert

> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
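The equivalence Bert quotes can be checked directly. A small sketch, assuming ASCII-only input (the string `y` is hypothetical); ?regex itself cautions that the explicit ranges depend on locale and character encoding, so non-ASCII text is deliberately avoided here:

```r
# Hypothetical ASCII-only input; '-' and '_' are not alphanumeric
y <- "a1-B2_c"

named  <- gsub("[[:alnum:]]", "", y)  # named class inside a bracket expression
ranges <- gsub("[0-9A-Za-z]", "", y)  # explicit ranges quoted from ?regex

print(named)                     # "-_"
print(identical(named, ranges))  # TRUE
```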
On 24/08/17 02:46, Bert Gunter wrote:
> Well, "obviousness" is in the mind of the beholder, but, from ?regexp:
>
> "A character class is a list of characters enclosed between [ and ]
> which matches any single character in that list; "... (at the end of
> the above section)
>
> "For example, [[:alnum:]] means [0-9A-Za-z] "...
>
> Note the doubled brackets. So seems pretty explicit to me.

Well, yes. Once it's pointed out it's "obvious". But it's buried pretty deeply in a large mass of text, and I didn't see it until you pointed it out.

If *I* had written the help file, it would be much more perspicuous.

cheers,

Rolf
> On Aug 23, 2017, at 2:29 AM, Rolf Turner <r.turner at auckland.ac.nz> wrote:
>
> On 23/08/17 18:33, Stefan Evert wrote:
>
>> That's looking for any of the characters a, l, p, h, : .
>
> OK. I see that now. I don't think that it's really stated anywhere that to search for (and possibly change) any one of a string of characters you enclose that string of characters in brackets [ ].

That's explained on the ?regex page in the section on character classes. The source of confusion for you is that within regex character classes there is also a set of reserved constructions that all start and end with "[:" and ":]". It's a bit like needing to double or triple escape characters in regex: a leading "|" changes the parser settings (or "expectations", if one wants to anthropomorphize the process).

> The first example from ?grep makes this "clear" (for some value of the word "clear") once you understand what this example is on about.
>
> So it's "obvious" once you've been shown, and totally opaque until then.

Sometimes we all stumble over syntactic "special" detours. If you wanted to add a warning to the current ?regex text, you could submit a diff for the base package, perhaps with something like:

"Certain named classes of characters are predefined. Their interpretation depends on the locale (see locales); the interpretation below is that of the POSIX locale."

Replaced with:

"Certain named classes of characters are predefined. Their interpretation depends on the locale (see locales); the interpretation below is that of the POSIX locale. Their names do include the "[:" and ":]" characters."

David Winsemius
Alameda, CA, USA

'Any technology distinguishable from magic is insufficiently advanced.'   -Gehm's Corollary to Clarke's Third Law
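David's point is that `[:alpha:]` is one element of a bracket expression, not a pattern on its own. A sketch, again assuming the default engine and a made-up input string, showing that several named classes can share one pair of outer brackets and that a leading `^` negates the whole set:

```r
# Hypothetical input string
x <- "abc123 lap:h!"

# Several named classes can sit inside one pair of outer brackets
no_digits_punct <- gsub("[[:digit:][:punct:]]", "", x)

# A leading ^ inside the outer brackets negates the whole set,
# so everything that is not alphabetic is removed
letters_only <- gsub("[^[:alpha:]]", "", x)

print(no_digits_punct)  # "abc laph"
print(letters_only)     # "abclaph"
```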
> On Aug 24, 2017, at 10:20 AM, David Winsemius <dwinsemius at comcast.net> wrote:
>
>> On Aug 23, 2017, at 2:29 AM, Rolf Turner <r.turner at auckland.ac.nz> wrote:
>>
>> OK. I see that now. I don't think that it's really stated anywhere that to search for (and possibly change) any one of a string of characters you enclose that string of characters in brackets [ ].
>
> That's explained on the ?regex page in the section on character classes. The source of confusion for you is that within regex character classes there is also a set of reserved constructions that all start and end with "[:" and ":]". It's a bit like needing to double or triple escape characters in regex: a leading "|" changes the parser settings (or "expectations", if one wants to anthropomorphize the process).

I meant a leading backslash "\" rather than a vertical bar ("|").

--
David.
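The escaping David mentions can be illustrated the same way. A minimal sketch with a hypothetical input `z`: in an R string literal the regex escape `\.` has to be typed as `"\\."`, because the R parser consumes one backslash before the regex engine sees the pattern:

```r
# Hypothetical input string
z <- "a.b.c"

# Unescaped, "." is a metacharacter that matches any single character
any_char <- gsub(".", "", z)

# "\\." in R source becomes the regex \. which matches a literal dot
escaped <- gsub("\\.", "", z)

# Inside a bracket expression "." is already literal, so no escape is needed
bracketed <- gsub("[.]", "", z)

print(any_char)   # ""
print(escaped)    # "abc"
print(bracketed)  # "abc"
```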