thr3ads.net - similar to: "strcapture performance when perl = TRUE"

Displaying 20 results from an estimated 100 matches similar to: "strcapture performance when perl = TRUE"

2016 Oct 04

error handling in strcapture

It is also not catching the cases where the number of capture expressions does not match the number of entries in proto. I think all of the following should give an error about the mismatch.
> strcapture("(.)(.)", c("ab", "cde", "fgh", "ij", "lm"), proto=list(A="",B="",C=""))
A B C
1 a b cd
2 d
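A minimal sketch of the kind of check being asked for, assuming the number of capture groups can be read off the first matching element with regexec() and regmatches(). The helper name count_captures is hypothetical and is not part of utils.

```r
pattern <- "(.)(.)"
x <- c("ab", "cde")
proto <- list(A = "", B = "", C = "")

# Hypothetical helper: number of capture groups, taken from the first
# element of x that matches; NA if nothing matches.
count_captures <- function(pattern, x, perl = FALSE) {
  m <- regmatches(x, regexec(pattern, x, perl = perl))
  hits <- m[lengths(m) > 0L]
  if (length(hits) == 0L) return(NA_integer_)
  length(hits[[1L]]) - 1L   # drop the full-match entry
}

n <- count_captures(pattern, x)
mismatch <- !is.na(n) && n != length(proto)
if (mismatch) {
  message("pattern has ", n, " captures but proto has ", length(proto), " entries")
}
```

When nothing matches, the helper returns NA and no mismatch is reported, which mirrors the open question in the thread about what to do in that case.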

2011 Feb 25

Named capture in regexp

Dear R core developers, One feature from Python that I have been wanting in R is the ability to capture groups in regular expressions using names. Consider the following example in R.
> notables <- c(" Ben Franklin and Jefferson Davis","\tMillard Fillmore")
> name.rex <- "(?<first>[A-Z][a-z]+) (?<last>[A-Z][a-z]+)"
> (parsed <-
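For reference, a sketch of how named groups can be read in R releases where regexpr(perl = TRUE) attaches the "capture.start", "capture.length" and "capture.names" attributes to its result. The inputs are the ones from the message; the names m, st, len, nms and parsed.names are illustrative only.

```r
notables <- c(" Ben Franklin and Jefferson Davis", "\tMillard Fillmore")
name.rex <- "(?<first>[A-Z][a-z]+) (?<last>[A-Z][a-z]+)"

m   <- regexpr(name.rex, notables, perl = TRUE)
st  <- attr(m, "capture.start")    # one column per capture group
len <- attr(m, "capture.length")
nms <- attr(m, "capture.names")    # "first", "last"

# Cut each group out of the input strings, one column per group.
parsed.names <- vapply(seq_along(nms), function(j) {
  substring(notables, st[, j], st[, j] + len[, j] - 1L)
}, character(length(notables)))
colnames(parsed.names) <- nms
```

Only the first match in each string is used here, so "Jefferson Davis" in the first input is not extracted.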

2016 Oct 04

error handling in strcapture

I noticed a problem in the strcapture from R-devel (2016-09-27 r71386), when the text contains a missing value and perl=TRUE.
{
# NA in text input should map to row of NA's in output, without warning
r9p <- strcapture(perl = TRUE, "(.).* ([[:digit:]]+)", c("One 1", NA, "Fifty 50"), data.frame(Initial=factor(), Number=numeric()))
e9p <-
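A short sketch of the expectation described in this message, assuming an R release that includes the regexec() fix mentioned in the reply. The names res and na_row_ok are illustrative.

```r
res <- strcapture("(.).* ([[:digit:]]+)",
                  c("One 1", NA, "Fifty 50"),
                  data.frame(Initial = factor(), Number = numeric()),
                  perl = TRUE)

# The NA input is expected to map to a row of NAs.
na_row_ok <- all(is.na(res[2, ]))
```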

2016 Oct 04

error handling in strcapture

Hi Bill, This is a bug in regexec() and I will commit a fix. Thanks for the report, Michael On Tue, Oct 4, 2016 at 1:40 PM, William Dunlap <wdunlap at tibco.com> wrote: > I noticed a problem in the strcapture from R-devel (2016-09-27 r71386), when > the text contains a missing value and perl=TRUE. > > { > # NA in text input should map to row of NA's in output,

2016 Sep 21

error handling in strcapture

The new behavior is that it yields NAs when the pattern does not match (like strptime) and for empty captures in a matching pattern it yields the empty string, which is consistent with regmatches(). Michael On Wed, Sep 21, 2016 at 2:21 PM, William Dunlap <wdunlap at tibco.com> wrote: > If there are any matches then strcapture can see if the pattern has the same > number of capture
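The distinction described here can be sketched as follows, assuming a released R version that has this behavior. A character prototype is used so that the empty capture stays visible as "" and is not converted; the inputs are made up for illustration.

```r
proto <- data.frame(Name = character(), Number = character(),
                    stringsAsFactors = FALSE)

res <- strcapture("([[:alpha:]]*) +([[:digit:]]*)",
                  c("Three 3", "Twenty ", "nospace"),
                  proto)

# "Twenty " matches with an empty second capture: Number is "".
# "nospace" does not match at all: both columns are NA.
```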

2016 Sep 21

error handling in strcapture

Hi Bill, Thanks, another good suggestion. strcapture() now returns NAs for non-matches. It's nice to have someone kicking the tires on that function. Michael On Wed, Sep 21, 2016 at 12:11 PM, William Dunlap via R-devel <r-devel at r-project.org> wrote: > Michael, thanks for looking at my first issue with utils::strcapture. > > Another issue is how it deals with lines that

2016 Sep 21

error handling in strcapture

If there are any matches then strcapture can see if the pattern has the same number of capture expressions as the prototype has columns and give an error if not. That seems appropriate. If there are no matches, then there is no easy way to see if the prototype is compatible with the pattern, so should strcapture just assume the best and fill in the prototype with NA's? Should there be

2016 Sep 21

strcapture enhancement

The new strcapture function in R-devel is handy, capturing the matches to the parenthesized subpatterns in a regular expression in the columns of a data.frame, whose column names and classes are given by the 'proto' argument. E.g.,
> p1 <- data.frame(Name="", Number=0)
> str(strcapture("([[:alpha:]]*) +([[:digit:]]*)", c("Three 3", "Twenty
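A small sketch of how the prototype drives the column classes. The input strings are made up, and stringsAsFactors = FALSE is set explicitly so that the result does not depend on the R version's default.

```r
p1 <- data.frame(Name = "", Number = 0, stringsAsFactors = FALSE)

res <- strcapture("([[:alpha:]]*) +([[:digit:]]*)",
                  c("Three 3", "Forty 40"),
                  p1)

# Each captured column is converted to the class of the matching proto column.
classes <- vapply(res, class, character(1))
```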

2016 Sep 21

error handling in strcapture

Michael, thanks for looking at my first issue with utils::strcapture. Another issue is how it deals with lines that don't match the pattern. Currently it gives an error
> strcapture("(.+) (.+)", c("One 1", "noSpaceInLine", "Three 3"), proto=list(Name="", Number=0))
Error in strcapture("(.+) (.+)", c("One 1",

2004 Dec 01

split() and paste() a vector to get a multi line string

How can I get a multi line string from a vector of string tokens in an easy manner (e.g. for the use as xlab of a plot)?
I have e.g.:
> tokens <- letters[1:5]
[1] "a" "b" "c" "d" "e"
I search:
[1] "a, b, c\nd, e"
I tried:
> nlines <- 2
> ntokens.line <- ceiling(length(tokens) / nlines)
> token.list <-
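One possible way to get the requested string, sketched with split() and paste() as in the subject line. The grouping vector grp and the name result are illustrative; this is not necessarily the answer given in the thread.

```r
tokens <- letters[1:5]
nlines <- 2
ntokens.line <- ceiling(length(tokens) / nlines)

# Assign each token to a line: 1 1 1 2 2
grp <- ceiling(seq_along(tokens) / ntokens.line)
token.list <- split(tokens, grp)

# Join tokens within a line with ", ", then join lines with "\n".
result <- paste(vapply(token.list, paste, character(1), collapse = ", "),
                collapse = "\n")
```

The value of result can be passed directly as xlab, since plot labels honor embedded newlines.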

2019 Aug 15

Feature request: non-dropping regmatches/strextract

Using a non-capturing group, "(?:...)" instead of "(...)", simplifies my example a bit
> x <- c("Groucho <groucho at marx.com>", "<chico at marx.com>", "Harpo")
> strcapture("([[:alpha:]]+)?(?: *<([[:alpha:]. ]+@[[:alpha:]. ]+)>)?", x, proto=data.frame(Name=character(), Address=character(),
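A sketch of the non-capturing-group idea with made-up inputs. The example.org addresses are not from the thread, and perl = TRUE is an assumption made here; the truncated call above does not show which regex engine was used.

```r
# Made-up inputs for illustration only.
x <- c("Groucho <groucho@example.org>", "<chico@example.org>", "Harpo")

res <- strcapture("([[:alpha:]]+)?(?: *<([[:alpha:]. ]+@[[:alpha:]. ]+)>)?",
                  x,
                  proto = data.frame(Name = character(), Address = character(),
                                     stringsAsFactors = FALSE),
                  perl = TRUE)

# Only the two "(...)" groups produce columns; the "(?:...)" wrapper around
# the address part does not add a third one.
```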

2019 Aug 15

Feature request: non-dropping regmatches/strextract

I don't care much for regmatches and haven't tried strextract, but I think replacing the character(0) by NA_character_ is almost always inappropriate if the match information comes from gregexpr. I think strcapture() does a pretty good job of what I think you are trying to do. Perhaps adding an argument to map no match to NA instead of "" would give you just what you wanted.

2015 Mar 15

[LLVMdev] FreeBSD's 11.0-CURRENT contrib/llvm/include/llvm/ADT/IntrusiveRefCntPtr.h's IntrusiveRefCntPtr and its use violates C++ privacy rules

When trying to build the 11.0-CURRENT clang 3.5 on powerpc64 I ran into a violation of C++ accessibility rules (for private) that stopped the compile. So not the usual defect category. (This was a bootstrapping procedure as powerpc/powerpc64 FreeBSD world’s clang has an odd status and getting from 3.4 under 10.1-STABLE to 3.5 on 11.0-CURRENT is not automatic.) Given the language rules and

2019 Sep 02

Feature request: non-dropping regmatches/strextract

I think that's a good reason for not including this in regmatches; you're right, its name is somewhat suggestive of yielding matches. Also, that sounds like a great design for strcapture with an atomic prototype. Best, CG

2005 Feb 10

[ANNOUNCE] New stable release of Samba Console (1.1.23)

Hi, I'm officially announcing the new 1.1.23 stable release of Samba Console, along with a stable IMC release too (1.2.24). This code is just going in production at a new customer site this week. From the project web site: Samba Console <http://imc.sourceforge.net/samba/index.html> is the first console developed for IMC. It offers a simple and ergonomic interface for managing a

2019 Aug 29

Feature request: non-dropping regmatches/strextract

I'd be happy to entertain patches or at least more specific suggestions to improve strextract() and strcapture(). I hadn't exported strextract(), because I wasn't quite sure how it should behave. This feedback should be helpful. Thanks, Michael On Thu, Aug 29, 2019 at 2:20 PM Cyclic Group Z_1 via R-devel <r-devel at r-project.org> wrote: > > Thank you, I am aware that

2019 Aug 30

Feature request: non-dropping regmatches/strextract

Just started thinking about this. The name of regmatches() suggests that it will only extract the matches but not return anything for the non-matches. We might need another function that returns a value for non-matches. Perhaps the value should be the empty string for non-matches and NA for matches to NA. The rationale is that we delegate to regexpr() (at least conceptually), and it returns a

2019 Sep 02

Feature request: non-dropping regmatches/strextract

After some discussion within R core, we decided that a "nomatch" argument on regmatches() may be a good initial step. We might add a new function later that combines the regexpr() and regmatches() steps. The gregexpr() and regexec() inputs are both lists so it's not clear whether a "nomatch" value would be relevant (the elements are empty) in those cases. On Mon, Sep 2,
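Until such an argument exists, the regexpr() plus regmatches() combination can be wrapped by hand. The sketch below is a user-level helper, not the R core design: the function name extract_or_nomatch and its nomatch parameter are hypothetical.

```r
# Hypothetical helper: one result per input element, with non-matches
# (and NA inputs) filled by 'nomatch' and not dropped.
extract_or_nomatch <- function(pattern, x, nomatch = NA_character_, ...) {
  m <- regexpr(pattern, x, ...)
  out <- rep(nomatch, length(x))
  hit <- !is.na(m) & m > 0L
  out[hit] <- regmatches(x, m)   # regmatches() returns only the matched elements
  out
}

digits <- extract_or_nomatch("[0-9]+", c("a1", "b", "c22"))
```

Passing nomatch = "" gives the empty-string convention discussed elsewhere in this thread.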

2005 May 16

spamassassin

pulling my hair out... If I rm -fr /root/.spamassassin/* then sa-learn a bunch of messages, it works...
]# sa-learn --dump magic
0.000 0 3 0 non-token data: bayes db version
0.000 0 3997 0 non-token data: nspam
0.000 0 0 0 non-token data: nham
0.000 0 202022 0 non-token data: ntokens
0.000

2019 Aug 15

Feature request: non-dropping regmatches/strextract

I do think keeping the default behavior is desirable for backwards compatibility; my suggestion is not to change default behavior but to add an optional argument that allows a different behavior. Although this can be implemented in a user-defined function, retaining empty matches facilitates programmatic use, and seems to be something that should be available in base R. It is available, for
