Displaying 18 results from an estimated 18 matches for "strcapture".
2016 Oct 04
2
error handling in strcapture
I noticed a problem in the strcapture from R-devel (2016-09-27 r71386),
when the text contains a missing value and perl=TRUE.
{
# NA in text input should map to row of NA's in output, without warning
r9p <- strcapture(perl = TRUE, "(.).* ([[:digit:]]+)", c("One 1", NA,
"Fifty 50"), data...
2016 Oct 04
1
error handling in strcapture
It is also not catching the cases where the number of capture expressions
does not match the number of entries in proto. I think all of the
following should give an error about the mismatch.
> strcapture("(.)(.)", c("ab", "cde", "fgh", "ij", "lm"),
proto=list(A="",B="",C=""))
   A  B  C
1  a  b cd
2  d fg  f
3 ij  i  j
4  l  m ab
Warning message:
In matrix(as.character(unlist(str)), ncol = ntokens, byrow = TRU...
2016 Sep 21
2
error handling in strcapture
If there are any matches then strcapture can see if the pattern has the
same number of capture expressions as the prototype has columns and give an
error if not. That seems appropriate.
If there are no matches, then there is no easy way to see if the prototype
is compatible with the pattern, so should strcapture just assume the best
and...
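The check described above (compare the capture count with the prototype's columns whenever at least one element matches, and assume the best when nothing matches) can be sketched in base R as follows. `check_capture_count()` is a hypothetical helper, not part of utils. It assumes that `regexec()` reports `-1` for a non-matching element and that a successful match returns one position for the whole match followed by one per capture group.

```r
# Hypothetical helper, not part of utils: compare the number of capture
# groups seen in a successful match with the number of columns in 'proto'.
check_capture_count <- function(pattern, x, proto, perl = FALSE) {
  m <- regexec(pattern, x, perl = perl)
  # A non-match is reported as -1 (an NA input as NA); keep real matches only
  hit <- vapply(m, function(mi) !is.na(mi[1L]) && mi[1L] != -1L, logical(1))
  if (!any(hit)) {
    # No matches: nothing to compare against, so assume the best
    return(invisible(NA))
  }
  # First position is the whole match; the rest are the capture groups
  ncap <- length(m[[which(hit)[1L]]]) - 1L
  if (ncap != length(proto)) {
    stop(sprintf("pattern has %d capture group(s) but 'proto' has %d column(s)",
                 ncap, length(proto)))
  }
  invisible(TRUE)
}

# The mismatch example from the thread: two groups, three prototype columns
res <- tryCatch(
  check_capture_count("(.)(.)", c("ab", "cde", "fgh", "ij", "lm"),
                      proto = list(A = "", B = "", C = "")),
  error = conditionMessage)
print(res)
```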
2024 Jan 29
1
strcapture performance when perl = TRUE
I wanted to raise the possibility of improving strcapture performance in
cases where perl = TRUE. I believe we can do this in a non-breaking way
by calling regexpr instead of regexec (conditionally when perl = TRUE).
To illustrate this I've put together a 'proof of concept' function called
strcapture2 that utilises output from regexpr directly...
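A rough sketch of the idea: with `perl = TRUE`, `regexpr()` attaches `capture.start` and `capture.length` matrices (one row per input, one column per group), so every capture can be pulled out with vectorised `substring()` calls instead of walking the list that `regexec()` returns. `strcapture_perl()` is hypothetical and is not the `strcapture2` from the post, whose code is elided here. It only handles prototype columns whose class has an `as.<class>()` coercer.

```r
# Hypothetical sketch (not the strcapture2 from the post): build the result
# from the capture.start / capture.length attributes of regexpr(perl = TRUE).
strcapture_perl <- function(pattern, x, proto) {
  m <- regexpr(pattern, x, perl = TRUE)
  starts <- attr(m, "capture.start")    # matrix: rows = elements of x
  lens   <- attr(m, "capture.length")   # same shape as 'starts'
  ncap <- if (is.null(starts)) 0L else ncol(starts)
  if (ncap != length(proto))
    stop("number of capture groups in 'pattern' != length(proto)")
  nomatch <- is.na(m) | m == -1L        # NA input or no match -> NA row
  cols <- lapply(seq_len(ncap), function(j) {
    s <- substring(x, starts[, j], starts[, j] + lens[, j] - 1L)
    s[nomatch] <- NA_character_
    # Coerce with as.<class>(), e.g. as.numeric for a numeric prototype
    match.fun(paste0("as.", class(proto[[j]])))(s)
  })
  names(cols) <- names(proto)
  as.data.frame(cols, stringsAsFactors = FALSE)
}

r <- strcapture_perl("(.).* ([[:digit:]]+)", c("One 1", NA, "Fifty 50"),
                     proto = list(Initial = "", Number = 0))
print(r)
```

Whether this is actually faster than the `regexec()` route would need benchmarking; the sketch only shows the data flow.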
2016 Sep 21
2
error handling in strcapture
Michael, thanks for looking at my first issue with utils::strcapture.
Another issue is how it deals with lines that don't match the pattern.
Currently it gives an error
> strcapture("(.+) (.+)", c("One 1", "noSpaceInLine", "Three 3"),
proto=list(Name="", Number=0))
Error in strcapture("(.+) (.+)",...
2016 Sep 21
2
strcapture enhancement
The new strcapture function in R-devel is handy, capturing
the matches to the parenthesized subpatterns in a regular
expression in the columns of a data.frame, whose column
names and classes are given by the 'proto' argument. E.g.,
> p1 <- data.frame(Name="", Number=0)
> str(strcapture("...
2016 Oct 04
0
error handling in strcapture
Hi Bill,
This is a bug in regexec() and I will commit a fix.
Thanks for the report,
Michael
On Tue, Oct 4, 2016 at 1:40 PM, William Dunlap <wdunlap at tibco.com> wrote:
> I noticed a problem in the strcapture from R-devel (2016-09-27 r71386), when
> the text contains a missing value and perl=TRUE.
>
> {
> # NA in text input should map to row of NA's in output, without warning
> r9p <- strcapture(perl = TRUE, "(.).* ([[:digit:]]+)", c("One 1", NA...
2016 Sep 21
0
error handling in strcapture
...yields NAs when the pattern does not match
(like strptime) and for empty captures in a matching pattern it yields
the empty string, which is consistent with regmatches().
Michael
On Wed, Sep 21, 2016 at 2:21 PM, William Dunlap <wdunlap at tibco.com> wrote:
> If there are any matches then strcapture can see if the pattern has the same
> number of capture expressions as the prototype has columns and give an
> error if not. That seems appropriate.
>
> If there are no matches, then there is no easy way to see if the prototype
> is compatible with the pattern, so should strcapture...
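The convention described here rests on what `regexec()` plus `regmatches()` return: a capture group that matches the empty string yields `""`, while an element that does not match at all yields `character(0)`, which strcapture then turns into a row of NAs. A small base R illustration (the pattern and inputs are made up for the example):

```r
x <- c("ab-", "zzz")                  # "ab-" matches with an empty 2nd group
m <- regexec("([a-z]+)-([0-9]*)", x)
caps <- regmatches(x, m)
# caps[[1]] is c("ab-", "ab", ""): whole match, then each capture group
# caps[[2]] is character(0): "zzz" has no "-", so there is no match at all
str(caps)
```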
2016 Sep 21
0
error handling in strcapture
Hi Bill,
Thanks, another good suggestion. strcapture() now returns NAs for
non-matches. It's nice to have someone kicking the tires on that
function.
Michael
On Wed, Sep 21, 2016 at 12:11 PM, William Dunlap via R-devel
<r-devel at r-project.org> wrote:
> Michael, thanks for looking at my first issue with utils::strcapture.
>
> An...
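With that change, the non-matching line in Bill's earlier example should come back as a row of NAs rather than stopping with an error. This assumes a recent R release in which `strcapture()` is exported from utils; the printed layout may differ slightly.

```r
r <- utils::strcapture("(.+) (.+)", c("One 1", "noSpaceInLine", "Three 3"),
                       proto = data.frame(Name = "", Number = 0,
                                          stringsAsFactors = FALSE))
print(r)   # row 2 (the non-matching line) is NA in both columns
```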
2019 Aug 15
2
Feature request: non-dropping regmatches/strextract
I do think keeping the default behavior is desirable for backwards compatibility; my suggestion is not to change default behavior but to add an optional argument that allows a different behavior. Although this can be implemented in a user-defined function, retaining empty matches facilitates programmatic use, and seems to be something that should be available in base R. It is available, for...
2019 Aug 15
1
Feature request: non-dropping regmatches/strextract
Using a non-capturing group, "(?:...)" instead of "(...)", simplifies my
example a bit
> x <- c("Groucho <groucho@marx.com>", "<chico@marx.com>", "Harpo")
> strcapture("([[:alpha:]]+)?(?: *<([[:alpha:]. ]+@[[:alpha:]. ]+)>)?", x,
proto=data.frame(Name=character(), Address=character(),
stringsAsFactors=FALSE))
     Name          Address
1 Groucho groucho@marx.com
2           chico@marx.com
3   Harpo
Bill Dunlap
TIBCO Software
wdunlap tibco.c...
2019 Aug 15
0
Feature request: non-dropping regmatches/strextract
I don't care much for regmatches and haven't tried strextract, but I think
replacing the character(0) by NA_character_ is almost always inappropriate
if the match information comes from gregexpr.
I think strcapture() does a pretty good job of what I think you are trying
to do. Perhaps adding an argument to map no match to NA instead of ""
would give you just what you wanted.
> x <- c("Groucho <groucho@marx.com>", "<chico@marx.com>", "Harpo")
>...
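Until such an argument exists, the mapping can be done after the fact. `empty_to_na()` is a hypothetical helper, not an existing utils function. Note that it cannot tell a group that did not take part in the match apart from one that matched the empty string, since in the Groucho/Chico/Harpo output above both show up as empty strings.

```r
# Hypothetical helper: replace "" with NA in every character column
empty_to_na <- function(df) {
  for (j in seq_along(df)) {
    if (is.character(df[[j]])) {
      df[[j]][which(df[[j]] == "")] <- NA_character_
    }
  }
  df
}

r <- utils::strcapture("([a-z]*)-([0-9]*)", c("ab-1", "-2", "cd-"),
                       proto = data.frame(A = "", B = "",
                                          stringsAsFactors = FALSE))
out <- empty_to_na(r)
print(out)
```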
2019 Sep 02
2
Feature request: non-dropping regmatches/strextract
I think that's a good reason for not including this in regmatches; you're right, its name is somewhat suggestive of yielding matches. Also, that sounds like a great design for strcapture with an atomic prototype.
Best,
CG
2019 Aug 29
2
Feature request: non-dropping regmatches/strextract
Thank you! I greatly appreciate your consideration, though of course it is up to you. I think many people switch to stringr/stringi simply because functions in those packages have some consistent design choices, for example, they do not drop empty/missing matches, which facilitates array-based programming. For example, in the cases where one needs to make a new column in a data.frame (data.table,...
2019 Aug 29
0
Feature request: non-dropping regmatches/strextract
I'd be happy to entertain patches or at least more specific
suggestions to improve strextract() and strcapture(). I hadn't
exported strextract(), because I wasn't quite sure how it should
behave. This feedback should be helpful.
Thanks,
Michael
On Thu, Aug 29, 2019 at 2:20 PM Cyclic Group Z_1 via R-devel
<r-devel at r-project.org> wrote:
>
> Thank you, I am aware that there are package...
2019 Aug 30
0
Feature request: non-dropping regmatches/strextract
...at returns a value for
non-matches. Perhaps the value should be the empty string for
non-matches and NA for matches to NA. The rationale is that we
delegate to regexpr() (at least conceptually), and it returns a
"matching region" which would be empty when there is no match. We
could allow strcapture() to accept an atomic vector as a prototype,
which would do what you want for regexec() (NA on no match, empty
string on empty capture). Then we could call the regexpr()-based
function strextract().
What do you think?
Michael
On Thu, Aug 29, 2019 at 3:27 PM Cyclic Group Z_1
<cyclicgroup-z1 at...
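A minimal sketch of the regexpr()-based behaviour proposed here: one result per input, giving the matched region, `""` when nothing matches, and `NA` for an `NA` input. The name `strextract_sketch()` is hypothetical; it is not the unexported `strextract()` mentioned in the thread.

```r
# Hypothetical sketch: regexpr()-based extraction that keeps one result per
# input: the matched text, "" for no match, NA for NA input.
strextract_sketch <- function(pattern, x, perl = FALSE) {
  m <- regexpr(pattern, x, perl = perl)
  out <- rep("", length(x))                 # default: empty matching region
  hit <- !is.na(m) & m != -1L
  out[hit] <- regmatches(x, m)              # regmatches() returns matches only
  out[is.na(x)] <- NA_character_
  out
}

res <- strextract_sketch("[0-9]+", c("a12", "b", NA))
print(res)
```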
2019 Sep 02
0
Feature request: non-dropping regmatches/strextract
...those cases.
On Mon, Sep 2, 2019 at 11:38 AM Cyclic Group Z_1
<cyclicgroup-z1 at yahoo.com> wrote:
>
> I think that's a good reason for not including this in regmatches; you're right, its name is somewhat suggestive of yielding matches. Also, that sounds like a great design for strcapture with an atomic prototype.
>
> Best,
> CG
--
Michael Lawrence
Scientist, Bioinformatics and Computational Biology
Genentech, A Member of the Roche Group
Office +1 (650) 225-7760
michafla at gene.com
2019 Aug 29
2
Feature request: non-dropping regmatches/strextract
Thank you, I am aware that there are packages that can accomplish this. I mentioned stringr::str_extract as a function that does not drop empty matches. I think that the behavior of regmatches(..., regexpr(...)) in base R should permit an option to prevent dropping of empty matches both for sake of consistency with the rest of the language (missing data does not yield a dropped index in other...