thr3ads.net - R devel - [Rd] Feature request: non-dropping regmatches/strextract [Aug 2019]

Cyclic Group Z_1

2019-Aug-29 22:26 UTC

[Rd] Feature request: non-dropping regmatches/strextract

Thank you! I greatly appreciate your consideration, though of course it is up to
you. I think many people switch to stringr/stringi simply because functions in
those packages follow some consistent design choices. For example, they do not
drop empty/missing matches, which facilitates array-based programming. This
matters when one needs to make a new column in a data.frame (data.table, tibble,
etc.) of regex extractions, or in any other case where there needs to be an
element-wise correspondence between input and output. I think inserting
NA_character_ to prevent dropping indices is the natural choice for an array
language (which, I think, motivated the creation of stringr/stringi). While
those are great packages and this behavior can be easily replicated with simple
wrappers, string operations are normally easy to accomplish in base languages,
so this seems like something that would be appropriate to have in base. For
example, the MATLAB and Pandas regex functions both allow empty matches to be
kept rather than dropped (though of course I acknowledge Pandas is not a base
language).
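
For illustration, one such simple wrapper might look like the sketch below. The
name extract_first() is hypothetical (it is not a base R function); the sketch
only assumes the documented behavior of regexpr(), which returns -1 for no match
and NA for NA input, and of regmatches(), which drops both.

```r
# Hypothetical helper, not part of base R: like regmatches(x, regexpr(...)),
# but keeps one element per input, with NA_character_ where nothing matched.
extract_first <- function(x, pattern, ...) {
  m <- regexpr(pattern, x, ...)
  out <- rep(NA_character_, length(x))
  # regexpr() gives -1 for no match and NA for NA input; keep only real hits
  hit <- !is.na(m) & m > 0L
  # regmatches() returns exactly the hits, in order, so lengths line up
  out[hit] <- regmatches(x, m)
  out
}

x <- c("a1", "b", NA, "c22")
regmatches(x, regexpr("[0-9]+", x))  # "1" "22"        (length 2, indices lost)
extract_first(x, "[0-9]+")           # "1" NA NA "22"  (length 4, aligned with x)
```

Because the result has the same length as x, it can be assigned directly as a
new column, e.g. df$num <- extract_first(df$txt, "[0-9]+").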

Best,
CG

Michael Lawrence

2019-Aug-30 03:44 UTC

Just started thinking about this. The name of regmatches() suggests
that it will only extract the matches and not return anything for the
non-matches. We might need another function that returns a value for
non-matches. Perhaps the value should be the empty string for
non-matches and NA when the input itself is NA. The rationale is that we
delegate to regexpr() (at least conceptually), and it returns a
"matching region", which would be empty when there is no match. We
could also allow strcapture() to accept an atomic vector as a prototype,
which would do what you want for regexec() (NA on no match, empty
string on empty capture). Then we could call the regexpr()-based
function strextract().
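
The two sets of semantics described above could be sketched roughly as follows.
Both function names are hypothetical stand-ins (neither is an existing base R
API, and the second only approximates what an atomic prototype for strcapture()
might return for a single capture group); this is an illustration of the
proposal, not an implementation of it.

```r
# Hypothetical regexpr()-based extractor: "" for no match, NA for NA input.
strextract_sketch <- function(x, pattern, ...) {
  m <- regexpr(pattern, x, ...)
  out <- rep("", length(x))        # empty "matching region" by default
  out[is.na(m)] <- NA_character_   # NA input stays NA
  hit <- !is.na(m) & m > 0L
  out[hit] <- regmatches(x, m)
  out
}

# Hypothetical regexec()-based extractor for the first capture group:
# NA on no match (or NA input), "" when the group matched the empty string.
capture_first_sketch <- function(x, pattern, ...) {
  caps <- regmatches(x, regexec(pattern, x, ...))
  # each element is c(full match, group 1, ...) or character(0) for no match
  vapply(caps,
         function(m) if (length(m) >= 2L) m[2L] else NA_character_,
         character(1))
}

strextract_sketch(c("a1", "b", NA, "c22"), "[0-9]+")
# "1" "" NA "22"
capture_first_sketch(c("12x", "x", "y", NA), "([0-9]*)x")
# "12" "" NA NA
```

The difference between the two is that the capture-based version can tell "no
match" (NA) apart from "matched, but the group was empty" (""), whereas the
regexpr()-based version reserves NA for missing input.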

What do you think?

Michael

-- 
Michael Lawrence
Scientist, Bioinformatics and Computational Biology
Genentech, A Member of the Roche Group
Office +1 (650) 225-7760
michafla at gene.com

Cyclic Group Z_1

2019-Sep-02 18:38 UTC

I think that's a good reason for not including this in regmatches;
you're right, its name is somewhat suggestive of yielding matches. Also,
that sounds like a great design for strcapture with an atomic prototype.

Best,
CG
