similar to: error handling in strcapture

Displaying 20 results from an estimated 1000 matches similar to: "error handling in strcapture"

2016 Sep 21
2
error handling in strcapture
If there are any matches then strcapture can see if the pattern has the same number of capture expressions as the prototype has columns and give an error if not. That seems appropriate. If there are no matches, then there is no easy way to see if the prototype is compatible with the pattern, so should strcapture just assume the best and fill in the prototype with NA's? Should there be
2016 Oct 04
2
error handling in strcapture
I noticed a problem in the strcapture from R-devel (2016-09-27 r71386), when the text contains a missing value and perl=TRUE. { # NA in text input should map to row of NA's in output, without warning r9p <- strcapture(perl = TRUE, "(.).* ([[:digit:]]+)", c("One 1", NA, "Fifty 50"), data.frame(Initial=factor(), Number=numeric())) e9p <-
2016 Oct 04
1
error handling in strcapture
It is also not catching the cases where the number of capture expressions does not match the number of entries in proto. I think all of the following should give an error about the mismatch. > strcapture("(.)(.)", c("ab", "cde", "fgh", "ij", "lm"), proto=list(A="",B="",C="")) A B C 1 a b cd 2 d
2016 Sep 21
0
error handling in strcapture
The new behavior is that it yields NAs when the pattern does not match (like strptime) and for empty captures in a matching pattern it yields the empty string, which is consistent with regmatches(). Michael On Wed, Sep 21, 2016 at 2:21 PM, William Dunlap <wdunlap at tibco.com> wrote: > If there are any matches then strcapture can see if the pattern has the same > number of capture
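As a rough illustration of the behavior described in this reply (NA for lines that do not match, an empty string for an empty capture in a matching line), here is a minimal sketch. The input vector and the prototype are made up for the example, and it assumes a version of R whose utils::strcapture already includes that change.

```r
# Hypothetical input: an ordinary line, a line the pattern cannot match,
# and a line whose first capture group is empty.
x <- c("Three 3", "no digits here", " 7")

# Prototype giving the column names and classes of the result.
proto <- data.frame(Name = character(), Number = numeric())

res <- strcapture("([[:alpha:]]*) +([[:digit:]]+)", x, proto)

# Row 2 should be all NA (no match); row 3 should have Name == "".
print(res)
```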
2016 Sep 21
0
error handling in strcapture
Hi Bill, Thanks, another good suggestion. strcapture() now returns NAs for non-matches. It's nice to have someone kicking the tires on that function. Michael On Wed, Sep 21, 2016 at 12:11 PM, William Dunlap via R-devel <r-devel at r-project.org> wrote: > Michael, thanks for looking at my first issue with utils::strcapture. > > Another issue is how it deals with lines that
2016 Oct 04
0
error handling in strcapture
Hi Bill, This is a bug in regexec() and I will commit a fix. Thanks for the report, Michael On Tue, Oct 4, 2016 at 1:40 PM, William Dunlap <wdunlap at tibco.com> wrote: > I noticed a problem in the strcapture from R-devel (2016-09-27 r71386), when > the text contains a missing value and perl=TRUE. > > { > # NA in text input should map to row of NA's in output,
2016 Sep 21
2
strcapture enhancement
The new strcapture function in R-devel is handy, capturing the matches to the parenthesized subpatterns in a regular expression in the columns of a data.frame, whose column names and classes are given by the 'proto' argument. E.g., > p1 <- data.frame(Name="", Number=0) > str(strcapture("([[:alpha:]]*) +([[:digit:]]*)", c("Three 3", "Twenty
2024 Jan 29
1
strcapture performance when perl = TRUE
I wanted to raise the possibility of improving strcapture performance in cases where perl = TRUE. I believe we can do this in a non-breaking way by calling regexpr instead of regexec (conditionally when perl = TRUE). To illustrate this I've put together a 'proof of concept' function called strcapture2 that utilises output from regexpr directly (following a very nice substring approach
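The general idea of reading capture groups straight from regexpr output can be sketched as follows. This is not the poster's strcapture2, only an illustration that relies on the "capture.start" and "capture.length" attributes that regexpr attaches when perl = TRUE; the input vector is invented, and non-matching elements are simply set to NA.

```r
# Hypothetical input; the middle element does not match the pattern.
x <- c("One 1", "Zero", "Fifty 50")

m <- regexpr("(.).* ([[:digit:]]+)", x, perl = TRUE)

# With perl = TRUE, regexpr reports one column per capture group.
starts <- attr(m, "capture.start")
lens   <- attr(m, "capture.length")
matched <- as.vector(m) != -1L

# Cut each capture out of the matching strings with substring();
# rows for non-matching strings stay NA.
caps <- vapply(seq_len(ncol(starts)), function(j) {
  out <- rep(NA_character_, length(x))
  out[matched] <- substring(x[matched],
                            starts[matched, j],
                            starts[matched, j] + lens[matched, j] - 1L)
  out
}, character(length(x)))

print(caps)
```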
2017 Jun 28
1
regexec() bug in R 3.4.0
Hi, In R 3.4.0, the "Pattern Matching and Replacement" documentation that describes regexec(), gregexpr(), etc. states that the "text" argument to regexec is a character vector, "or an object which can be coerced by as.character to a character vector": regexec(pattern, text, ignore.case = FALSE, perl = FALSE, fixed = FALSE, useBytes = FALSE)
2019 Aug 15
4
Feature request: non-dropping regmatches/strextract
A very common use case for regmatches is to extract regex matches into a new column in a data.frame (or data.table, etc.) or otherwise use the extracted strings alongside the input. However, the default behavior is to drop empty matches, which results in mismatches in column length if reassignment is done without subsetting. For consistency with other R functions and compatibility with this use
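The length mismatch mentioned in this request, and one common base-R workaround, can be shown with a small sketch. The data are invented and the workaround is only one possible approach, not the interface proposed in the thread.

```r
# Hypothetical input: the second element contains no digits.
x <- c("apple 12", "banana", "cherry 7")

m <- regexpr("[[:digit:]]+", x)

# regmatches() drops the non-matching element, so the result is shorter
# than the input and cannot be assigned directly as a new column.
dropped <- regmatches(x, m)

# Workaround: pre-fill with NA and assign only where a match was found.
kept <- rep(NA_character_, length(x))
kept[m != -1L] <- regmatches(x, m)

print(dropped)
print(kept)
```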
2019 Sep 02
2
Feature request: non-dropping regmatches/strextract
I think that's a good reason for not including this in regmatches; you're right, its name is somewhat suggestive of yielding matches. Also, that sounds like a great design for strcapture with an atomic prototype. Best, CG
2009 Apr 13
1
should sub(perl=TRUE) also handle \E in replacement, to complement \U and \L?
Currently sub(perl=TRUE) allows you to specify \U and \L in the replacement argument so that the rest of the subpatterns in the line (the \\<digit> things) will be converted to upper or lower case, respectively. Perl also has a \E operator to end these case conversions for the rest of the subpatterns (so they retain whatever case they had in the original text). For symmetry's sake
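For readers unfamiliar with the case-conversion escapes this post refers to, a small sketch of \U and \L in a sub() replacement follows. The input string is made up, and only \U and \L are used here, since those are the two escapes the post says were already supported.

```r
# Hypothetical input string with two words.
txt <- "hello WORLD"

# \\U upper-cases what follows in the replacement; \\L switches to lower case.
res <- sub("(\\w+) (\\w+)", "\\U\\1 \\L\\2", txt, perl = TRUE)

print(res)
```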
2019 Aug 29
2
Feature request: non-dropping regmatches/strextract
Thank you! I greatly appreciate your consideration, though of course it is up to you. I think many people switch to stringr/stringi simply because functions in those packages have some consistent design choices, for example, they do not drop empty/missing matches, which facilitates array-based programming. For example, in the cases where one needs to make a new column in a data.frame (data.table,
2009 Mar 10
1
suggestion/request: install.packages and unnecessary file modifications
Dear R-devel, When 'install.packages' runs, it updates all html files in all packages. Mostly, there seems to be no actual change to the html file contents, but the date/time does change. This has been causing me a bit of trouble, because I keep synchronized versions of R on several different machines, and whenever I install a package, many MB of file transfers are required; my slow upload
2006 Jan 27
4
regular expressions, sub
Hi, I am trying to use sub, regexpr on expressions like log(D) ~ log(N)+I(log(N)^2)+log(t) being a model specification. The aim is to produce: "ln D ~ ln N + ln^2 N + ln t" The variable names N, t may change, the number of terms too. I succeeded only partially; the help on regular expressions is hard for me to understand, and examples for my case are rare. The help page on R-help
2014 Oct 19
1
Writing UTF8 on Windows
Recent functionality in jsonlite allows for streaming json to a user supplied connection object, such as a file, pipe or socket. RFC7159 prescribes json must be encoded as unicode; ISO-8859 (including latin1) is invalid. Hence I would like R to write strings as utf8, irrespective of the type of connection, platform or locale. Implementing this turns out to be unsurprisingly difficult on windows.
2017 Jun 08
0
regular expression help
Quoting Ashim Kapoor <ashimkapoor at gmail.com>: > Dear All, > > My query is: > > Do we always need to use perl = TRUE option when doing ignore.case=TRUE? > > A small example : > > my_text = > "RECOVERY OFFICER-II\nDEBTS RECOVERY TRIBUNAL-III\n RC No. 162/2015\nSBI > VS RAMESH GUPTA.\n Dated: 01.03.2016 Item no.01\n > Present:
2019 Aug 15
1
Feature request: non-dropping regmatches/strextract
Using a non-capturing group, "(?:...)" instead of "(...)", simplifies my example a bit > x <- c("Groucho <groucho at marx.com>", "<chico at marx.com>", "Harpo") > strcapture("([[:alpha:]]+)?(?: *<([[:alpha:]. ]+@[[:alpha:]. ]+)>)?", x, proto=data.frame(Name=character(), Address=character(),
2017 Jun 08
2
regular expression help
Dear All, My query is: Do we always need to use perl = TRUE option when doing ignore.case=TRUE? A small example : my_text = "RECOVERY OFFICER-II\nDEBTS RECOVERY TRIBUNAL-III\n RC No. 162/2015\nSBI VS RAMESH GUPTA.\n Dated: 01.03.2016 Item no.01\n Present: Ms. Sonakshi, the proxy counsel for Ms. Usha Singh, the counsel for ARCIL.\n None for the CDs.\n
2018 Feb 15
2
writeLines argument useBytes = TRUE still making conversions
I think this behavior is inconsistent with the documentation: tmp <- 'é' tmp <- iconv(tmp, to = 'UTF-8') print(Encoding(tmp)) print(charToRaw(tmp)) tmpfilepath <- tempfile() writeLines(tmp, con = file(tmpfilepath, encoding = 'UTF-8'), useBytes = TRUE) [1] "UTF-8" [1] c3 a9 Raw text as hex: c3 83 c2 a9 If I switch to useBytes = FALSE, then