thr3ads.net - similar to: "regexec() bug in R 3.4.0"

Displaying 20 results from an estimated 5000 matches similar to: "regexec() bug in R 3.4.0"

Feature request: non-dropping regmatches/strextract

2019 Aug 15

A very common use case for regmatches is to extract regex matches into a new column in a data.frame (or data.table, etc.) or otherwise use the extracted strings alongside the input. However, the default behavior is to drop empty matches, which results in mismatches in column length if reassignment is done without subsetting. For consistency with other R functions and compatibility with this use
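A minimal sketch of the behavior this request describes, using an illustrative vector and pattern that are not taken from the thread: regmatches() on a regexpr() result returns only the matched elements, so the result can be shorter than the input. One common workaround is to pre-fill a vector with NA and assign the matches by position.

```r
x <- c("John Doe", "e e cummings", "Juan de la Madrid")  # illustrative input
m <- regexpr("[A-Z][a-z]+", x)

# Default behaviour: the non-matching element is dropped,
# so the result is shorter than x.
dropped <- regmatches(x, m)

# Workaround: start from NA and fill in only the matching positions.
hit <- as.integer(m) != -1L
kept <- rep(NA_character_, length(x))
kept[hit] <- regmatches(x, m)
```

Here `kept` has the same length as `x`, so it can be assigned as a data.frame column without subsetting.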

Feature request: non-dropping regmatches/strextract

2019 Aug 15

Changing the default behavior of regmatches would break its use with gregexpr, where the number of matches per input element varies, so a zero-length character vector makes more sense than NA_character_. > x <- c("John Doe", "e e cummings", "Juan de la Madrid") > m <- gregexpr("[A-Z]", x) > regmatches(x,m) [[1]] [1] "J"

error handling in strcapture

2016 Sep 21

Michael, thanks for looking at my first issue with utils::strcapture. Another issue is how it deals with lines that don't match the pattern. Currently it gives an error > strcapture("(.+) (.+)", c("One 1", "noSpaceInLine", "Three 3"), proto=list(Name="", Number=0)) Error in strcapture("(.+) (.+)", c("One 1",
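One way to get an NA row for a non-matching line, sketched here with base regexec() and regmatches() rather than strcapture itself. This is an illustration of the desired result, not how strcapture is implemented; the names `groups`, `rows`, `mat`, and `out` are hypothetical.

```r
x <- c("One 1", "noSpaceInLine", "Three 3")
groups <- regmatches(x, regexec("(.+) (.+)", x))

# regexec() yields character(0) for a non-matching line; map it to NAs.
# A match has 3 elements: the full match plus the two captures.
rows <- lapply(groups, function(g) {
  if (length(g) == 3L) g[-1L] else rep(NA_character_, 2L)
})
mat <- do.call(rbind, rows)

out <- data.frame(Name = mat[, 1], Number = as.numeric(mat[, 2]),
                  stringsAsFactors = FALSE)
```

The second row of `out` is all NA, and the row count matches the length of the input.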

invert argument in grep

2006 Nov 09

Hello, What about an `invert` argument in grep, to return elements that are *not* matching a regular expression : R> grep("pink", colors(), invert = TRUE, value = TRUE) would essentially return the same as : R> colors() [ - grep("pink", colors()) ] I'm attaching the files that I modified (against today's tarball) for that purpose. Cheers, Romain --
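A short sketch comparing the two forms on a small illustrative vector (not from the post). Current versions of R do provide `invert` in grep(). The negative-index form agrees with it only when at least one element matches; with no matches, `-integer(0)` selects nothing.

```r
x <- c("pink", "hotpink", "blue", "green")  # illustrative input

# Elements that do NOT match the pattern.
via_invert <- grep("pink", x, invert = TRUE, value = TRUE)
via_index  <- x[-grep("pink", x)]

# Edge case: the pattern matches nothing.
none_invert <- grep("purple", x, invert = TRUE, value = TRUE)  # all of x
none_index  <- x[-grep("purple", x)]                           # empty
```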

strcapture performance when perl = TRUE

2024 Jan 29

I wanted to raise the possibility of improving strcapture performance in cases where perl = TRUE. I believe we can do this in a non-breaking way by calling regexpr instead of regexec (conditionally when perl = TRUE). To illustrate this I've put together a 'proof of concept' function called strcapture2 that utilises output from regexpr directly (following a very nice substring approach

error handling in strcapture

2016 Oct 04

It is also not catching the cases where the number of capture expressions does not match the number of entries in proto. I think all of the following should give an error about the mismatch. > strcapture("(.)(.)", c("ab", "cde", "fgh", "ij", "lm"), proto=list(A="",B="",C="")) A B C 1 a b cd 2 d

error handling in strcapture

2016 Oct 04

I noticed a problem in the strcapture from R-devel (2016-09-27 r71386), when the text contains a missing value and perl=TRUE. { # NA in text input should map to row of NA's in output, without warning r9p <- strcapture(perl = TRUE, "(.).* ([[:digit:]]+)", c("One 1", NA, "Fifty 50"), data.frame(Initial=factor(), Number=numeric())) e9p <-

error handling in strcapture

2016 Sep 21

If there are any matches then strcapture can see if the pattern has the same number of capture expressions as the prototype has columns and give an error if not. That seems appropriate. If there are no matches, then there is no easy way to see if the prototype is compatible with the pattern, so should strcapture just assume the best and fill in the prototype with NA's? Should there be

regexp bug in very recent r-devel

2007 May 22

completion is semi-broken in today's r-devel, and the reason seems to be some regular expression changes: > sessionInfo() R version 2.6.0 Under development (unstable) (2007-05-22 r41673) i686-pc-linux-gnu locale: [...] attached base packages: [1] "stats" "graphics" "grDevices" "utils" "datasets" "methods" [7]

pattern matching

2013 Jan 07

Hi, I have a simple question. Suppose I have a string "x$Expensive". I want to find the position of the $ in this string; i.e., I want a function that returns 2. I tried grep, regexpr, etc. with no luck, unless I'm just using them incorrectly. Any suggestions? Thanks, Walt
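A likely cause is that `$` is a regular-expression metacharacter (end of string), so it has to be matched literally. A minimal sketch of two ways to do that with regexpr(); the variable names are illustrative.

```r
s <- "x$Expensive"

# Treat the pattern as a literal string rather than a regex.
pos_fixed <- as.integer(regexpr("$", s, fixed = TRUE))

# Or escape the metacharacter inside a regular expression.
pos_escaped <- as.integer(regexpr("\\$", s))
```

Both give the position of the first `$`; regexpr() returns -1 when there is no match.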

Word boundaries and gregexpr in R 2.2.1

2006 Feb 01

Hi I have a question concerning how to match word boundaries which I bet has a very simple answer, but I haven't found it with trial and error nor by searching the help archives for the terms in the subject line. The problem is this: I have a vector of two character strings. text<-c("This is a first example sentence.", "And this is a second example sentence.") If I

Word boundaries and gregexpr in R 2.2.1 (PR#8547)

2006 Feb 01

Full_Name: Stefan Th. Gries Version: 2.2.1 OS: Windows XP (Home and Professional) Submission from: (NULL) (68.6.34.104) The problem is this: I have a vector of two character strings. > text<-c("This is a first example sentence.", "And this is a second example sentence.") If I now look for word boundaries with regexpr, this is what I get: >

How to use access results of gregexpr in data frames

2012 Mar 30

Hello, I'm trying to figure out how to find the index of the second occurrence of "/" in a string (which happens to represent a date) within a data frame column. I've used the following code successfully to find the first instance of "/". dframe <- data.frame(date=c("5/14/2011", "4/7/2011")) dframe$x1 <- regexpr("/", dframe[, 1])
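One possible approach, sketched under the assumption that every position is wanted as a plain integer column: gregexpr() returns a list with all match positions per element, and the n-th position can be picked out with vapply(). The column name `x2` and the helper variables are hypothetical.

```r
dframe <- data.frame(date = c("5/14/2011", "4/7/2011"))
dates <- as.character(dframe$date)  # guard against factor columns

# One integer vector of match positions per input string.
all_pos <- gregexpr("/", dates, fixed = TRUE)

dframe$x1 <- vapply(all_pos, function(p) p[1L], integer(1))  # first "/"
dframe$x2 <- vapply(all_pos, function(p) p[2L], integer(1))  # second "/"
```

If a string contains fewer than two slashes, `p[2L]` is NA, so `x2` stays aligned with the rows of the data frame.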

Pattern match

2013 Mar 20

Hello again, in the help page of grep() function, it is written that pattern: character string containing a regular expression (or character string for fixed = TRUE) to be matched in the given character vector. Coerced by as.character to a character string if possible. If a character vector of length 2 or more is supplied, the first element is used with a warning. Missing values are allowed

error handling in strcapture

2016 Oct 04

Hi Bill, This is a bug in regexec() and I will commit a fix. Thanks for the report, Michael On Tue, Oct 4, 2016 at 1:40 PM, William Dunlap <wdunlap at tibco.com> wrote: > I noticed a problem in the strcapture from R-devel (2016-09-27 r71386), when > the text contains a missing value and perl=TRUE. > > { > # NA in text input should map to row of NA's in output,

strsplit("dia ma", "\\b") splits characterwise

2010 Jul 08

\b is word boundary. But, unexpectedly, strsplit("dia ma", "\\b") splits character by character. > strsplit("dia ma", "\\b") [[1]] [1] "d" "i" "a" " " "m" "a" > strsplit("dia ma", "\\b", perl=TRUE) [[1]] [1] "d" "i" "a" " "

gregexpr - match overlap mishandled (PR#13391)

2008 Dec 12

Full_Name: Reid Thompson Version: 2.8.0 RC (2008-10-12 r46696) OS: darwin9.5.0 Submission from: (NULL) (129.98.107.177) the gregexpr() function does NOT return a complete list of global matches as it should. this occurs when a pattern matches two overlapping portions of a string, only the first match is returned. the following function call demonstrates this error (although this is not how I

Search within a file

2005 Nov 03

Hi, I am looking for a way to search a file for position of some expression, from within R. My current code: sha1Pos = gregexpr("<sha1>", readChar(filename, file.info(filename)$size))[[1]] Works fine for small files, but text files I will be working with might get up to Gb range, so I was trying to accomplish the same without loading the whole file into R. I realize this is

Feature request: non-dropping regmatches/strextract

2019 Sep 02

After some discussion within R core, we decided that a "nomatch" argument on regmatches() may be a good initial step. We might add a new function later that combines the regexpr() and regmatches() steps. The gregexpr() and regexec() inputs are both lists so it's not clear whether a "nomatch" value would be relevant (the elements are empty) in those cases. On Mon, Sep 2,
