2017 Jan 06
strsplit(perl=TRUE), gregexpr(perl=TRUE) very slow for long strings
...0.21 38.58 elapsed 1048576 0.30 0.08 0.52 155.50 0.40 155.43

I have not looked at R's code, but it is possible that the problem is caused by PCRE repeatedly scanning (once per match) the entire input string to make sure it is valid UTF-8. If so, adding PCRE_NO_UTF8_CHECK to the flags given to pcre_exec would solve the problem. Perhaps R is already doing that in gsub(perl=TRUE).

Here is the test function:

    regex.perf.test <- function(N=c(1e4, 2e4, 4e4, 8e4)) {
        makeTestString <- function(n) paste(collapse="", rep("ab", n))
        s <- lap...