Cyclic Group Z_1
2019-Aug-29 21:19 UTC
[Rd] Feature request: non-dropping regmatches/strextract
Thank you, I am aware that there are packages that can accomplish this. I mentioned stringr::str_extract as a function that does not drop empty matches. I think that the behavior of regmatches(..., regexpr(...)) in base R should permit an option to prevent dropping of empty matches, both for the sake of consistency with the rest of the language (missing data does not yield a dropped index in other sorts of R functions, and an empty match conceptually corresponds to missing data) and for ease of use in data.frames. The behavior of regmatches(..., gregexpr(...)) is not objectionable to me, as lists do not drop indices when they contain character(0) vectors. Alternatively, perhaps this should be reflected in the (currently non-exported) strextract. Best, CG
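The difference described above can be seen in a minimal base R sketch. The input vector and the pattern are made up for illustration and do not come from the original message.

```r
# Hypothetical input: the second element contains no digits, so it has no match.
x <- c("a1", "b", "c22")

# regexpr() reports -1 for the non-matching element...
m <- regexpr("[0-9]+", x)

# ...and regmatches() drops that element, so the result is shorter than x.
single <- regmatches(x, m)
length(single)   # shorter than length(x)

# With gregexpr() the result is a list with one entry per input element;
# the non-matching element is kept as character(0) rather than dropped.
multi <- regmatches(x, gregexpr("[0-9]+", x))
length(multi)    # same as length(x)
```

Because `single` no longer lines up with `x`, it cannot be assigned directly as a new data.frame column, which is the use case the request is concerned with.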
Michael Lawrence
2019-Aug-29 21:29 UTC
[Rd] Feature request: non-dropping regmatches/strextract
I'd be happy to entertain patches or at least more specific suggestions to improve strextract() and strcapture(). I hadn't exported strextract(), because I wasn't quite sure how it should behave. This feedback should be helpful. Thanks, Michael
--
Michael Lawrence
Scientist, Bioinformatics and Computational Biology
Genentech, A Member of the Roche Group
Cyclic Group Z_1
2019-Aug-29 22:26 UTC
[Rd] Feature request: non-dropping regmatches/strextract
Thank you! I greatly appreciate your consideration, though of course it is up to you. I think many people switch to stringr/stringi simply because functions in those packages follow some consistent design choices; for example, they do not drop empty/missing matches, which facilitates array-based programming. This matters when one needs to make a new column of regex extractions in a data.frame (data.table, tibble, etc.), or in any other case where there needs to be an element-wise correspondence between input and output. I think insertion of NA_character_ to prevent dropping indices seems like the natural choice for an array language (which, I think, motivated the creation of stringr/stringi). While those are great packages and this behavior can be easily replicated with simple wrappers, string operations are normally easy to accomplish in base languages, so this seems like something that would be appropriate to have in base. For example, MATLAB and Pandas regex both allow non-dropping empty matches (though of course I acknowledge Pandas is not a base language). Best, CG
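One of the "simple wrappers" mentioned above might look like the following sketch. The name extract_first and the example data are hypothetical; this is not the proposed strextract() and not an existing base R function, only an illustration of inserting NA_character_ where there is no match.

```r
# Hypothetical helper (not part of base R): return the first match for each
# element of x, with NA_character_ where the pattern does not match.
extract_first <- function(pattern, x, ...) {
  m <- regexpr(pattern, x, ...)
  out <- rep(NA_character_, length(x))
  # regmatches() drops exactly the elements where m is NA or -1,
  # so the remaining matches line up with the positions flagged here.
  hit <- !is.na(m) & m > -1L
  out[hit] <- regmatches(x, m)
  out
}

# Usage: the result has one entry per row, so it can be assigned as a column.
df <- data.frame(id = c("a1", "b", "c22"), stringsAsFactors = FALSE)
df$num <- extract_first("[0-9]+", df$id)
```

Keeping one output element per input element is the design choice being argued for: the no-match row stays in place and is marked as missing instead of shifting later rows up.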