Cyclic Group Z_1
2019-Aug-15 18:31 UTC
[Rd] Feature request: non-dropping regmatches/strextract
I do think keeping the default behavior is desirable for backwards compatibility; my suggestion is not to change the default behavior but to add an optional argument that allows a different behavior. Although this can be implemented in a user-defined function, retaining empty matches facilitates programmatic use and seems to be something that should be available in base R. It is available, for example, in MATLAB, a comparable array language.
Alternatively, perhaps a nomatch (or maybe emptymatch) argument in the spirit of `[.data.table`? That is, an argument nomatch where nomatch = NULL (the default) results in drops for vector outputs and character(0) for list outputs, nomatch = NA results in insertion of NA_character_, and nomatch = '' results in insertion of an empty string.
I can submit proposed patch code if others think this is a good idea.
What are your thoughts on the proposed alteration to the (currently non-exported) strextract? I assume (maybe wrongly) that the plan is to eventually export that function.
Thank you,
CG
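For concreteness, the proposed nomatch semantics can be sketched as user-level wrappers. The names extract_first() and extract_all() are hypothetical and not part of base R; they simply combine regexpr()/gregexpr() with regmatches() to illustrate the behavior described above (NULL keeps today's dropping behavior, NA inserts NA_character_, '' inserts an empty string). This is a sketch under those assumptions, not a proposed patch.

```r
# Hypothetical helpers illustrating the proposed 'nomatch' argument;
# neither function is part of base R.

# First match per element, as in regmatches(x, regexpr(pattern, x)).
extract_first <- function(pattern, x, nomatch = NULL, ...) {
  m <- regexpr(pattern, x, ...)
  if (is.null(nomatch)) {
    return(regmatches(x, m))          # current behavior: non-matches dropped
  }
  stopifnot(length(nomatch) == 1L)
  out <- rep(as.character(nomatch), length(x))  # NA becomes NA_character_
  hit <- !is.na(m) & m != -1L         # the elements regmatches() keeps
  out[hit] <- regmatches(x, m)
  out[is.na(m)] <- NA_character_      # NA input stays NA
  out
}

# All matches per element, as in regmatches(x, gregexpr(pattern, x)).
extract_all <- function(pattern, x, nomatch = NULL, ...) {
  res <- regmatches(x, gregexpr(pattern, x, ...))
  if (is.null(nomatch)) {
    return(res)                       # current behavior: character(0) entries
  }
  stopifnot(length(nomatch) == 1L)
  # Replace each empty (no-match) entry by the single nomatch value.
  lapply(res, function(r) if (length(r)) r else as.character(nomatch))
}

x <- c("apple123", "banana", "cherry7")
extract_first("[0-9]+", x)                # c("123", "7")
extract_first("[0-9]+", x, nomatch = NA)  # c("123", NA, "7")
extract_first("[0-9]+", x, nomatch = "")  # c("123", "", "7")
```

Keeping NULL as the default means existing callers see no change; only code that opts in gets length-preserving output.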
William Dunlap
2019-Aug-15 20:04 UTC
[Rd] Feature request: non-dropping regmatches/strextract
I don't care much for regmatches and haven't tried strextract, but I think replacing the character(0) by NA_character_ is almost always inappropriate if the match information comes from gregexpr.
I think strcapture() does a pretty good job of what I think you are trying to do. Perhaps adding an argument to map no match to NA instead of "" would give you just what you wanted.
> x <- c("Groucho <groucho@marx.com>", "<chico@marx.com>", "Harpo")
> d <- strcapture("([[:alpha:]]+)?( *<([[:alpha:]. ]+@[[:alpha:]. ]+)>)?", x,
+     proto=data.frame(Name=character(), Junk=character(), Address=character(), stringsAsFactors=FALSE))
> d[c("Name", "Address")]
     Name          Address
1 Groucho groucho@marx.com
2           chico@marx.com
3   Harpo
> str(.Last.value)
'data.frame': 3 obs. of  2 variables:
 $ Name   : chr  "Groucho" "" "Harpo"
 $ Address: chr  "groucho@marx.com" "chico@marx.com" ""
Bill Dunlap
TIBCO Software
wdunlap tibco.com
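Along the lines of the argument suggested above, a user-level stand-in might post-process the character columns that utils::strcapture() returns. strcapture_na() is a hypothetical name, not an existing function. One caveat of this sketch: it cannot tell an optional group that did not participate from a group that genuinely matched the empty string, since strcapture() reports both as "".

```r
# Hypothetical wrapper (not in base R or utils): calls utils::strcapture()
# and then turns empty captures ("") in character columns into NA.
strcapture_na <- function(pattern, x, proto, ...) {
  d <- utils::strcapture(pattern, x, proto, ...)
  is_chr <- vapply(d, is.character, logical(1))
  d[is_chr] <- lapply(d[is_chr], function(col) {
    col[!is.na(col) & col == ""] <- NA_character_
    col
  })
  d
}

x <- c("Groucho <groucho@marx.com>", "<chico@marx.com>", "Harpo")
d <- strcapture_na("([[:alpha:]]+)?( *<([[:alpha:]. ]+@[[:alpha:]. ]+)>)?", x,
  proto = data.frame(Name = character(), Junk = character(),
                     Address = character(), stringsAsFactors = FALSE))
d[c("Name", "Address")]   # Name for "<chico@...>" and Address for "Harpo" are NA
```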
William Dunlap
2019-Aug-15 20:39 UTC
[Rd] Feature request: non-dropping regmatches/strextract
Using a non-capturing group, "(?:...)" instead of "(...)", simplifies my example a bit:
> x <- c("Groucho <groucho@marx.com>", "<chico@marx.com>", "Harpo")
> strcapture("([[:alpha:]]+)?(?: *<([[:alpha:]. ]+@[[:alpha:]. ]+)>)?", x,
+     proto=data.frame(Name=character(), Address=character(), stringsAsFactors=FALSE))
     Name          Address
1 Groucho groucho@marx.com
2           chico@marx.com
3   Harpo
Bill Dunlap
TIBCO Software
wdunlap tibco.com
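As a quick consistency check built only from the two patterns in this thread, one can run both and compare the columns of interest: the "(?:...)" version should give the same Name and Address columns without needing a placeholder "Junk" column in proto. This assumes, as the output in the message indicates, that the default (non-perl) regex engine accepts "(?:...)".

```r
x <- c("Groucho <groucho@marx.com>", "<chico@marx.com>", "Harpo")

# Three capture groups: the middle one exists only to group the optional
# " <address>" part, so proto needs a throwaway "Junk" column.
d3 <- utils::strcapture(
  "([[:alpha:]]+)?( *<([[:alpha:]. ]+@[[:alpha:]. ]+)>)?", x,
  proto = data.frame(Name = character(), Junk = character(),
                     Address = character(), stringsAsFactors = FALSE))

# Two capture groups: "(?:...)" groups without capturing.
d2 <- utils::strcapture(
  "([[:alpha:]]+)?(?: *<([[:alpha:]. ]+@[[:alpha:]. ]+)>)?", x,
  proto = data.frame(Name = character(), Address = character(),
                     stringsAsFactors = FALSE))

# The columns of interest should agree.
stopifnot(identical(d2$Name, d3$Name), identical(d2$Address, d3$Address))
```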