similar to: patch for gregexpr(perl=TRUE)

Displaying 20 results from an estimated 600 matches similar to: "patch for gregexpr(perl=TRUE)"

2019 Feb 20
2
Bug: time complexity of substring is quadratic as string size and number of substrings increases
Hi all, (and especially hi to Tomas Kalibera who accepted my patch sent yesterday) I believe that I have found another bug, this time in the substring function. The use case that I am concerned with is when there is a single (character scalar) text/subject, and many substrings to extract. For example substring("AAAA", 1:4, 1:4) or more generally, N=1000
2019 Feb 22
1
Bug: time complexity of substring is quadratic as string size and number of substrings increases
On 2/20/19 7:55 PM, Toby Hocking wrote: > Update: I have observed that stringi::stri_sub is linear time complexity, > and it computes the same thing as base::substring. figure > https://github.com/tdhock/namedCapture-article/blob/master/figure-substring-bug.png > source: > https://github.com/tdhock/namedCapture-article/blob/master/figure-substring-bug.R > > To me this is a
2019 Aug 15
4
Feature request: non-dropping regmatches/strextract
A very common use case for regmatches is to extract regex matches into a new column in a data.frame (or data.table, etc.) or otherwise use the extracted strings alongside the input. However, the default behavior is to drop empty matches, which results in mismatches in column length if reassignment is done without subsetting. For consistency with other R functions and compatibility with this use
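A minimal sketch of the length mismatch described above and one base-R workaround. The input vector `x` and the pattern are made up for illustration; they do not come from the original post.

```r
# Hypothetical input: the second element contains no digit.
x <- c("a1", "b", "c3")
m <- regexpr("[0-9]", x)

# regmatches() returns only the matched pieces, so it is shorter than x.
dropped <- regmatches(x, m)

# Workaround: pre-fill with NA and assign matches only where one was found.
out <- rep(NA_character_, length(x))
out[m != -1] <- regmatches(x, m)
print(out)
```

Here `out` has the same length as `x`, so it can be assigned as a new column without subsetting.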
2019 Feb 20
0
Bug: time complexity of substring is quadratic as string size and number of substrings increases
Update: I have observed that stringi::stri_sub is linear time complexity, and it computes the same thing as base::substring. figure https://github.com/tdhock/namedCapture-article/blob/master/figure-substring-bug.png source: https://github.com/tdhock/namedCapture-article/blob/master/figure-substring-bug.R To me this is a clear indication of a bug in substring, but again it would be nice to have
2007 Oct 10
4
gregexpr (PR#9965)
Full_Name: Peter Dolan Version: 2.5.1 OS: Windows Submission from: (NULL) (128.193.227.43) gregexpr does not find all matching substrings if the substrings overlap: > gregexpr("abab","ababab") [[1]] [1] 1 attr(,"match.length") [1] 4 It does work correctly in Version 2.3.1 under linux.
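For reference, gregexpr reports non-overlapping matches, which is why only position 1 appears in the output above. A commonly used way to get overlapping start positions, sketched here as an illustration and not taken from the thread, is a zero-width lookahead with perl = TRUE:

```r
# Plain search: after matching "abab" at position 1 the scan resumes
# after the match, so the overlapping occurrence at position 3 is skipped.
plain <- gregexpr("abab", "ababab")[[1]]

# Zero-width lookahead (PCRE): each match consumes no characters,
# so every position where "abab" begins is reported.
overlap <- gregexpr("(?=abab)", "ababab", perl = TRUE)[[1]]
starts <- as.integer(overlap)  # drop the match.length attribute
print(starts)
```

The lookahead matches have match.length 0, so the matched text has to be recovered separately, for example with substring() and the known pattern width.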
2011 Aug 17
2
question regarding gregexpr and read.table
Hi, I have a silly question regarding the usage of two commands: read.table and gregexpr: For read.table, if I read a matrix and set header = T, I found that all the dash ("-") becomes dots (".") A = read.table("Matrix.txt", sep = "\t", header = F) A[1,1] # "A-B-C-D". A = read.table("Matrix.txt", sep = "\t", header = T)
2009 Feb 25
1
Using gregexpr with multiple search elements
Dear list, I am trying to use gregexpr to see if entries in a dataframe have either of two possible values for a string. here's an example text<-c("fat", "rat", "cat", "dog", "log", "fish") If I just wanted to find if any one of the elements in text match the pattern "at" I would do gregexpr("\\at", text)
2006 Nov 07
1
Gregexpr - extract results with lapply
Hello, I need to extract sequences of three upper case letters in a string. In other words, in this string: str <-c("ABC", "this WOUld be gOOD") The result I'm looking for is ABC WOU OOD. With gregexpr, I can get the position and length of the sequences gregexpr('[A-Z]{3}',str,perl=TRUE) [[1]] [1] 1
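One possible way to turn the positions and lengths into the matched strings, sketched with the input from the question. The lapply variant is an assumption about what the poster was after; regmatches() is the shorter route in current versions of R.

```r
str <- c("ABC", "this WOUld be gOOD")
m <- gregexpr("[A-Z]{3}", str, perl = TRUE)

# Variant 1: lapply over the inputs, cutting each match out with substring().
# (Assumes every element has at least one match; a non-match is reported as -1.)
by_hand <- lapply(seq_along(str), function(i) {
  start <- m[[i]]
  len <- attr(m[[i]], "match.length")
  substring(str[i], start, start + len - 1)
})

# Variant 2: let regmatches() do the same extraction.
hits <- regmatches(str, m)

flat <- unlist(hits)
print(flat)
```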
2006 Oct 07
2
gregexpr in R 2.3.0 != gregexpr in R 2.4.0
Hi all I have a question regarding differences in the way gregexpr works in R 2.3.0 and R 2.4.0. In R 2.3.0, this is what happens: > gregexpr(" [a-z] [a-z] ", " a b c d e f ", perl=T) [[1]] [1] 1 3 5 7 9 attr(,"match.length") [1] 5 5 5 5 5 ... while in R 2.4.0, this is what happens: > gregexpr(" [a-z] [a-z] ", " a b c d e f ", perl=T)
2008 Dec 12
4
gregexpr - match overlap mishandled (PR#13391)
Full_Name: Reid Thompson Version: 2.8.0 RC (2008-10-12 r46696) OS: darwin9.5.0 Submission from: (NULL) (129.98.107.177) the gregexpr() function does NOT return a complete list of global matches as it should. this occurs when a pattern matches two overlapping portions of a string, only the first match is returned. the following function call demonstrates this error (although this is not how I
2012 Nov 02
2
backreferences in gregexpr
Hi Folks, I'm trying to extract just the backreferences from a regex. > temp = "abcd1234abcd1234" > regmatches(temp, gregexpr("(?:abcd)(1234)", temp)) [[1]] [1] "abcd1234" "abcd1234" What I would like is: [1] "1234" "1234" Note: I know I can just match 1234 here, but the actual example is complicated enough that I have to
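A sketch of one way to pull out only the capture group, using the example from the question. It relies on the "capture.start" and "capture.length" attributes that gregexpr attaches when perl = TRUE; whether this scales to the poster's real, more complicated pattern is not known.

```r
temp <- "abcd1234abcd1234"
m <- gregexpr("(?:abcd)(1234)", temp, perl = TRUE)[[1]]

# With perl = TRUE each capture group gets one column in these matrices;
# the pattern has a single capturing group, so column 1 is "(1234)".
st <- attr(m, "capture.start")[, 1]
len <- attr(m, "capture.length")[, 1]

groups <- substring(temp, st, st + len - 1)
print(groups)
```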
2012 Mar 30
1
How to use access results of gregexpr in data frames
Hello, I'm trying to figure out how to find the index of the second occurrence of "/" in a string (which happens to represent a date) within a data frame column. I've used the following code successfully to find the first instance of "/". dframe <- data.frame(date=c("5/14/2011", "4/7/2011")) dframe$x1 <- regexpr("/", dframe[, 1])
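A possible approach for the second occurrence, sketched with the data frame from the question. The column name `x2` and the NA fallback for strings with fewer than two slashes are choices made here for illustration.

```r
dframe <- data.frame(date = c("5/14/2011", "4/7/2011"),
                     stringsAsFactors = FALSE)

# gregexpr() returns one integer vector of match positions per string.
pos <- gregexpr("/", dframe$date, fixed = TRUE)

# Take the second position from each vector, or NA if there is none.
dframe$x2 <- sapply(pos, function(p) if (length(p) >= 2) p[2] else NA_integer_)
print(dframe)
```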
2008 Oct 31
1
gregexpr slow and increases exponentially with string length --> how to speed it up?
Dear All, I have a long string and need to search for regular expressions in there. However it becomes horribly slow as the string length increases. Below is an example: when "i" increases by 5, the time spent increases by more! (my string is 11,000,000 letters long!) I also noticed that - the search time increases dramatically with the number of matches found. - the perl=T option
2008 Jan 31
1
segfault in gregexpr()
Hi, Tried with R 2.6 and R 2.7: > gregexpr("", "abc", fixed=TRUE) *** caught segfault *** address 0x1c09000, cause 'memory not mapped' Traceback: 1: gregexpr("", "abc", fixed = TRUE) Possible actions: 1: abort (with core dump, if enabled) 2: normal R exit 3: exit R without saving workspace 4: exit R saving workspace
2006 Feb 01
1
Word boundaries and gregexpr in R 2.2.1
Hi I have a question concerning how to match word boundaries which I bet has a very simple answer, but I haven't found it with trial and error nor by searching the help archives for the terms in the subject line. The problem is this: I have a vector of two character strings. text<-c("This is a first example sentence.", "And this is a second example sentence.") If I
2006 Feb 01
1
Word boundaries and gregexpr in R 2.2.1 (PR#8547)
Full_Name: Stefan Th. Gries Version: 2.2.1 OS: Windows XP (Home and Professional) Submission from: (NULL) (68.6.34.104) The problem is this: I have a vector of two character strings. > text<-c("This is a first example sentence.", "And this is a second example sentence.") If I now look for word boundaries with regexpr, this is what I get: >
2019 Aug 29
0
Feature request: non-dropping regmatches/strextract
if you want "to extract regex matches into a new column in a data.frame" then there are some package functions which do exactly that. three examples are namedCapture::df_match_variable, rematch2::bind_re_match, and tidyr::extract. For a more detailed discussion see my R journal submission (under review) about regular expression packages,
2020 Jun 09
2
valgrind false positive on R startup?
Hi all, I'm on Ubuntu 18.04, running R-4.0.0 which I compiled from source, and using valgrind I am always seeing the following message. Does anybody else see that? Is that a known false positive? Any ideas how to fix/suppress? Seems related to TRE, do I need to upgrade that? (base) tdhock at maude-MacBookPro:~/R/binsegRcpp$ R --vanilla -d valgrind -e 'extSoftVersion()' ==9565==
2017 Jan 06
0
strsplit(perl=TRUE), gregexpr(perl=TRUE) very slow for long strings
While doing some speed testing I noticed that in R-3.2.3 the perl=TRUE variants of strsplit() and gregexpr() took time proportional to the square of the number of pattern matches in their input strings. E.g., the attached test function times gsub, strsplit, and gregexpr, with perl TRUE (PCRE) and FALSE (TRE), when the input string contains 'n' matches to the given pattern. Notice the