2017 Jan 06
strsplit(perl=TRUE), gregexpr(perl=TRUE) very slow for long strings
...0.21 38.58
elapsed 1048576 0.30 0.08 0.52 155.50 0.40 155.43
I have not looked at R's code, but it is possible that the problem is
caused by PCRE repeatedly scanning (once per match) the entire input
string to make sure it is valid UTF-8. If so, adding
PCRE_NO_UTF8_CHECK to the flags given to pcre_exec would solve the
problem. Perhaps R is already doing that in gsub(perl=TRUE).
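To make the hypothesis above concrete, here is a rough cost model in plain C. It does not call PCRE at all: check_subject is a hypothetical stand-in for a validity scan over the whole subject, and bytes_validated and make_test_string are names invented for this sketch. It only counts how many bytes would be examined if the check ran on every match call, compared with running it once (which is the effect PCRE_NO_UTF8_CHECK is meant to have on later calls).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a UTF-8 validity check: walks the subject
   and returns the number of bytes it examined. This is NOT PCRE code;
   it only accepts ASCII, which is enough for the "abab..." string. */
static size_t check_subject(const char *s, size_t len) {
    size_t i;
    for (i = 0; i < len; i++) {
        if ((unsigned char)s[i] >= 0x80) break;  /* non-ASCII: stop early */
    }
    return i;
}

/* Finds every 'b' in s, one match per iteration as a regex matching
   loop would, and returns the total bytes examined by the check.
   check_every_call = 1 models validating on every match call;
   check_every_call = 0 models validating once, then skipping. */
static size_t bytes_validated(const char *s, size_t len, int check_every_call) {
    size_t total = 0, start = 0;
    int first = 1;
    while (start < len) {
        if (first || check_every_call) total += check_subject(s, len);
        first = 0;
        const char *hit = memchr(s + start, 'b', len - start);
        if (hit == NULL) break;
        start = (size_t)(hit - s) + 1;  /* resume after this match */
    }
    return total;
}

/* Builds "ab" repeated n times, like makeTestString in the R test. */
static char *make_test_string(size_t n) {
    char *s = malloc(2 * n + 1);
    assert(s != NULL);
    for (size_t i = 0; i < n; i++) {
        s[2 * i] = 'a';
        s[2 * i + 1] = 'b';
    }
    s[2 * n] = '\0';
    return s;
}
```

For a string of n "ab" pairs, this model examines 2n^2 bytes when the check runs on every call and 2n bytes when it runs once, so doubling n would quadruple the work in the first case. Whether R's actual code path behaves this way is exactly the open question in the paragraph above.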
Here is the test function:
regex.perf.test <- function(N=c(1e4, 2e4, 4e4, 8e4)) {
makeTestString <- function(n) paste(collapse="", rep("ab", n))
s <- lap...