Cyclic Group Z_1
2019-Aug-29 22:26 UTC
[Rd] Feature request: non-dropping regmatches/strextract
Thank you! I greatly appreciate your consideration, though of course it is up to you.

I think many people switch to stringr/stringi simply because functions in those packages make some consistent design choices. For example, they do not drop empty or missing matches, which facilitates array-based programming: making a new column of regex extractions in a data.frame (data.table, tibble, etc.), or any other case that needs an element-wise correspondence between input and output. Inserting NA_character_ to prevent dropping indices seems like the natural choice for an array language (which, I think, motivated the creation of stringr/stringi).

While those are great packages and this behavior can be easily replicated with simple wrappers, string operations are normally easy to accomplish in base languages, so this seems like something that would be appropriate to have in base. For example, MATLAB and Pandas regex both allow non-dropping empty matches (though of course I acknowledge Pandas is not a base language).

Best,
CG
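To make the dropping behavior concrete, here is a minimal base R sketch of the kind of "simple wrapper" mentioned above. The name extract_first() and the sample vector are hypothetical, chosen only for illustration; the wrapper assumes only the first match per element is wanted, as with regexpr().

```r
x <- c("apple 12", "banana", NA, "cherry 7")
m <- regexpr("[0-9]+", x)

# regmatches() returns only the matches, so the result is shorter than x
# and the positional link to the input is lost.
dropped <- regmatches(x, m)

# Hypothetical wrapper: same extraction, but one output element per input
# element, with NA_character_ where there is no match or the input is NA.
extract_first <- function(pattern, x, ...) {
  m <- regexpr(pattern, x, ...)
  out <- rep(NA_character_, length(x))
  hit <- !is.na(m) & m > -1L        # elements that actually matched
  out[hit] <- regmatches(x, m)      # fill matches back into their slots
  out
}

kept <- extract_first("[0-9]+", x)
```

Because kept has the same length as x, it can be assigned directly as a new column next to the input, which is the element-wise correspondence the request is about.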
Michael Lawrence
2019-Aug-30 03:44 UTC
[Rd] Feature request: non-dropping regmatches/strextract
Just started thinking about this. The name of regmatches() suggests that it will only extract the matches but not return anything for the non-matches. We might need another function that returns a value for non-matches.

Perhaps the value should be the empty string for non-matches and NA for matches to NA. The rationale is that we delegate to regexpr() (at least conceptually), and it returns a "matching region" which would be empty when there is no match.

We could allow strcapture() to accept an atomic vector as a prototype, which would do what you want for regexec() (NA on no match, empty string on empty capture). Then we could call the regexpr()-based function strextract().

What do you think?

Michael

--
Michael Lawrence
Scientist, Bioinformatics and Computational Biology
Genentech, A Member of the Roche Group
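The two behaviors proposed in this message can be sketched in base R as follows. This is only an illustration under stated assumptions: strextract_sketch() is a hypothetical stand-in for the proposed strextract(), not an existing function, and since strcapture() does not accept an atomic prototype as of this thread, the second part uses a one-column data.frame prototype to show the same NA versus empty-string distinction.

```r
# Hypothetical sketch of the regexpr()-based idea: "" when there is no
# match (an empty matching region), NA when the input itself is NA.
strextract_sketch <- function(pattern, x, ...) {
  m <- regexpr(pattern, x, ...)
  out <- character(length(x))       # start with "" everywhere
  out[is.na(m)] <- NA_character_    # NA input gives NA output
  hit <- !is.na(m) & m > -1L
  out[hit] <- regmatches(x, m)
  out
}

x <- c("apple 12", "banana", NA, "cherry 7")
ext <- strextract_sketch("[0-9]+", x)

# Existing utils::strcapture() with a data.frame prototype: NA when the
# whole pattern fails to match, "" when the capture group matches nothing.
y <- c("apple 12", "banana", "12345!")
caps <- strcapture("^[a-z]+ ?([0-9]*)$", y,
                   proto = data.frame(num = character(),
                                      stringsAsFactors = FALSE))
```

Here ext distinguishes a non-match ("") from a missing input (NA), while caps$num distinguishes a failed match (NA) from an empty capture (""); which convention is preferable is exactly the design question raised in the thread.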
Cyclic Group Z_1
2019-Sep-02 18:38 UTC
[Rd] Feature request: non-dropping regmatches/strextract
I think that's a good reason for not including this in regmatches; you're right, its name is somewhat suggestive of yielding matches. Also, that sounds like a great design for strcapture with an atomic prototype. Best, CG