thr3ads.net - similar to: "patch for gregexpr(perl=TRUE)"

Displaying 20 results from an estimated 600 matches similar to: "patch for gregexpr(perl=TRUE)"

Bug: time complexity of substring is quadratic as string size and number of substrings increases

2019 Feb 20

Hi all, (and especially hi to Tomas Kalibera who accepted my patch sent yesterday) I believe that I have found another bug, this time in the substring function. The use case that I am concerned with is when there is a single (character scalar) text/subject, and many substrings to extract. For example substring("AAAA", 1:4, 1:4) or more generally, N=1000
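
A minimal sketch of the kind of timing experiment the report describes, using only base R. The helper name time_substring and the grid of sizes are illustrative choices, not taken from the original post. Each call extracts all N one-character substrings of a single N-character string with one vectorized substring() call. If the cost is quadratic, the elapsed time should roughly quadruple each time N doubles.

```r
# Hypothetical helper: time one vectorized substring() call that extracts
# all N one-character substrings of a single N-character subject string.
time_substring <- function(N) {
  subject <- paste(rep("A", N), collapse = "")
  idx <- seq_len(N)
  elapsed <- system.time(out <- substring(subject, idx, idx))[["elapsed"]]
  # Sanity check: every extracted substring is the single letter "A".
  stopifnot(length(out) == N, all(out == "A"))
  elapsed
}

# Illustrative sizes; doubling N makes quadratic growth easy to spot.
sizes <- c(1000, 2000, 4000, 8000)
timings <- vapply(sizes, time_substring, numeric(1))
print(data.frame(N = sizes, seconds = timings))

# The small case from the report: four one-letter substrings.
substring("AAAA", 1:4, 1:4)
```

Absolute timings depend on the machine and R version. Only the growth rate across sizes is informative.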

Bug: time complexity of substring is quadratic as string size and number of substrings increases

2019 Feb 22

On 2/20/19 7:55 PM, Toby Hocking wrote:
> Update: I have observed that stringi::stri_sub is linear time complexity,
> and it computes the same thing as base::substring. figure
> https://github.com/tdhock/namedCapture-article/blob/master/figure-substring-bug.png
> source:
> https://github.com/tdhock/namedCapture-article/blob/master/figure-substring-bug.R
>
> To me this is a

Feature request: non-dropping regmatches/strextract

2019 Aug 15

A very common use case for regmatches is to extract regex matches into a new column in a data.frame (or data.table, etc.) or otherwise use the extracted strings alongside the input. However, the default behavior is to drop empty matches, which results in mismatches in column length if reassignment is done without subsetting. For consistency with other R functions and compatibility with this use
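
A minimal sketch of non-dropping extraction in base R. regexpr() marks non-matching elements with -1, so the matches that regmatches() returns can be written back into an NA-filled vector with the same length as the input. The helper name extract_or_na and the example data are hypothetical.

```r
# Hypothetical helper: like regmatches(x, regexpr(pattern, x)), but returns
# one element per input, with NA where the pattern did not match.
extract_or_na <- function(x, pattern, ...) {
  m <- regexpr(pattern, x, ...)
  out <- rep(NA_character_, length(x))
  # which() skips both non-matches (-1) and NA inputs, the same elements
  # that regmatches() leaves out, so the two vectors line up.
  out[which(m != -1L)] <- regmatches(x, m)
  out
}

df <- data.frame(id = c("a1", "b", "c22"), stringsAsFactors = FALSE)
df$num <- extract_or_na(df$id, "[0-9]+")
df$num  # "1", NA, "22": same length as df$id, so column assignment is safe
```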

Bug: time complexity of substring is quadratic as string size and number of substrings increases

2019 Feb 20

Update: I have observed that stringi::stri_sub is linear time complexity, and it computes the same thing as base::substring.

Figure: https://github.com/tdhock/namedCapture-article/blob/master/figure-substring-bug.png
Source: https://github.com/tdhock/namedCapture-article/blob/master/figure-substring-bug.R

To me this is a clear indication of a bug in substring, but again it would be nice to have

gregexpr (PR#9965)

2007 Oct 10

Full_Name: Peter Dolan
Version: 2.5.1
OS: Windows
Submission from: (NULL) (128.193.227.43)

gregexpr does not find all matching substrings if the substrings overlap:

> gregexpr("abab","ababab")
[[1]]
[1] 1
attr(,"match.length")
[1] 4

It does work correctly in Version 2.3.1 under Linux.
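
One common workaround, which is not taken from the bug report itself, is to wrap the pattern in a zero-width lookahead and set perl = TRUE. The lookahead consumes no characters, so gregexpr() reports every starting position, including overlapping ones. The reported match.length is then 0, so in this sketch the pattern length (4) is supplied by hand to recover the text.

```r
subject <- "ababab"

# Ordinary search: matches do not overlap, so the occurrence starting at 3
# (which overlaps the first match) is not reported.
gregexpr("abab", subject)[[1]]

# Lookahead: matches the empty string wherever "abab" follows, so each
# overlapping occurrence contributes its own start position.
starts <- as.vector(gregexpr("(?=abab)", subject, perl = TRUE)[[1]])
starts
# [1] 1 3

# Recover the matched text from the known pattern length.
substring(subject, starts, starts + 3)
# [1] "abab" "abab"
```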

question regarding gregexpr and read.table

2011 Aug 17

Hi, I have a silly question regarding the usage of two commands: read.table and gregexpr. For read.table, if I read a matrix and set header = T, I find that all the dashes ("-") become dots (".").

A = read.table("Matrix.txt", sep = "\t", header = F)
A[1,1] # "A-B-C-D"
A = read.table("Matrix.txt", sep = "\t", header = T)
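
For the read.table part of the question: with header = TRUE, read.table() passes the column names through make.names() unless check.names = FALSE, and that is why "-" turns into ".". A minimal sketch that uses inline text= data instead of the poster's Matrix.txt. The column names here are made up.

```r
# Tab-separated data with dashes in the header (hypothetical example).
txt <- "A-B\tC-D\n1\t2"

# Default: names are made syntactically valid, so "-" becomes ".".
a <- read.table(text = txt, sep = "\t", header = TRUE)
names(a)
# [1] "A.B" "C.D"

# check.names = FALSE keeps the header exactly as written.
b <- read.table(text = txt, sep = "\t", header = TRUE, check.names = FALSE)
names(b)
# [1] "A-B" "C-D"
```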

Using gregexpr with multiple search elements

2009 Feb 25

Dear list, I am trying to use gregexpr to see if entries in a data frame have either of two possible values for a string. Here's an example:

text<-c("fat", "rat", "cat", "dog", "log", "fish")

If I just wanted to find if any one of the elements in text matches the pattern "at" I would do

gregexpr("\\at", text)
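
To test for either of two substrings, a single pattern with alternation (|) is usually enough, and grepl() returns a logical vector that lines up with the input. This is a minimal sketch. The original post is cut off before it names the second value, so "og" is an assumed stand-in.

```r
text <- c("fat", "rat", "cat", "dog", "log", "fish")

# TRUE where an element contains "at" or (assumed second value) "og".
has_either <- grepl("at|og", text)
has_either
# [1]  TRUE  TRUE  TRUE  TRUE  TRUE FALSE

# The matching elements themselves.
text[has_either]
# [1] "fat" "rat" "cat" "dog" "log"
```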

Gregexpr - extract results with lapply

2006 Nov 07

Hello, I need to extract sequences of three upper case letters in a string. In other words, in this string:

str <-c("ABC", "this WOUld be gOOD")

The result I'm looking for is ABC WOU OOD. With gregexpr, I can get the position and length of the sequences:

gregexpr('[A-Z]{3}',str,perl=TRUE)
[[1]]
[1] 1
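
In current R, regmatches() turns the positions and lengths from gregexpr() directly into the matched substrings, so no lapply step is needed. A minimal sketch. The input vector is renamed x here so it does not mask utils::str.

```r
x <- c("ABC", "this WOUld be gOOD")

# Start positions and lengths of every run of three upper-case letters.
m <- gregexpr("[A-Z]{3}", x, perl = TRUE)

# The matched substrings, one character vector per input element.
hits <- regmatches(x, m)

# Flattened into a single character vector.
unlist(hits)
# [1] "ABC" "WOU" "OOD"
```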

gregexpr in R 2.3.0 != gregexpr in R 2.4.0

2006 Oct 07

Hi all, I have a question regarding differences in the way gregexpr works in R 2.3.0 and R 2.4.0. In R 2.3.0, this is what happens:

> gregexpr(" [a-z] [a-z] ", " a b c d e f ", perl=T)
[[1]]
[1] 1 3 5 7 9
attr(,"match.length")
[1] 5 5 5 5 5

... while in R 2.4.0, this is what happens:

> gregexpr(" [a-z] [a-z] ", " a b c d e f ", perl=T)

gregexpr - match overlap mishandled (PR#13391)

2008 Dec 12

Full_Name: Reid Thompson
Version: 2.8.0 RC (2008-10-12 r46696)
OS: darwin9.5.0
Submission from: (NULL) (129.98.107.177)

The gregexpr() function does NOT return a complete list of global matches as it should. This occurs when a pattern matches two overlapping portions of a string: only the first match is returned. The following function call demonstrates this error (although this is not how I

backreferences in gregexpr

2012 Nov 02

Hi Folks, I'm trying to extract just the backreferences from a regex.

> temp = "abcd1234abcd1234"
> regmatches(temp, gregexpr("(?:abcd)(1234)", temp))
[[1]]
[1] "abcd1234" "abcd1234"

What I would like is:

[1] "1234" "1234"

Note: I know I can just match 1234 here, but the actual example is complicated enough that I have to
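
With perl = TRUE, gregexpr() also attaches capture.start and capture.length attributes that describe each parenthesised group, and substring() can use them to pull out only the captured text. A minimal sketch on the poster's example:

```r
temp <- "abcd1234abcd1234"

# perl = TRUE makes gregexpr() record where each capture group matched.
m <- gregexpr("(?:abcd)(1234)", temp, perl = TRUE)[[1]]

# capture.start / capture.length are matrices:
# one row per match, one column per capture group.
starts <- attr(m, "capture.start")[, 1]
lens   <- attr(m, "capture.length")[, 1]

# Extract only the first capture group from each match.
groups <- substring(temp, starts, starts + lens - 1)
groups
# [1] "1234" "1234"
```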

How to use access results of gregexpr in data frames

2012 Mar 30

Hello, I'm trying to figure out how to find the index of the second occurrence of "/" in a string (which happens to represent a date) within a data frame column. I've used the following code successfully to find the first instance of "/".

dframe <- data.frame(date=c("5/14/2011", "4/7/2011"))
dframe$x1 <- regexpr("/", dframe[, 1])
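
gregexpr() returns every match position per element as a list, so the second "/" is the second entry of each list element. A minimal sketch. stringsAsFactors = FALSE is added so the column stays character on R versions before 4.0.

```r
dframe <- data.frame(date = c("5/14/2011", "4/7/2011"),
                     stringsAsFactors = FALSE)

# First "/" per row, as in the question.
dframe$x1 <- regexpr("/", dframe$date)

# All "/" positions per row; keep the second (NA if a row has fewer than two).
slash_pos <- gregexpr("/", dframe$date)
dframe$x2 <- vapply(slash_pos, function(p) p[2], integer(1))
dframe$x2
# [1] 5 4
```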

gregexpr slow and increases exponentially with string length --> how to speed it up?

2008 Oct 31

Dear All, I have a long string and need to search for regular expressions in there. However it becomes horribly slow as the string length increases. Below is an example: when "i" increases by 5, the time spent increases by more! (my string is 11,000,000 letters long!) I also noticed that - the search time increases dramatically with the number of matches found. - the perl=T option

segfault in gregexpr()

2008 Jan 31

Hi,

Tried with R 2.6 and R 2.7:

> gregexpr("", "abc", fixed=TRUE)

 *** caught segfault ***
address 0x1c09000, cause 'memory not mapped'

Traceback:
 1: gregexpr("", "abc", fixed = TRUE)

Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace

Word boundaries and gregexpr in R 2.2.1

2006 Feb 01

Hi I have a question concerning how to match word boundaries which I bet has a very simple answer, but I haven't found it with trial and error nor by searching the help archives for the terms in the subject line. The problem is this: I have a vector of two character strings. text<-c("This is a first example sentence.", "And this is a second example sentence.") If I

Word boundaries and gregexpr in R 2.2.1 (PR#8547)

2006 Feb 01

Full_Name: Stefan Th. Gries
Version: 2.2.1
OS: Windows XP (Home and Professional)
Submission from: (NULL) (68.6.34.104)

The problem is this: I have a vector of two character strings.

> text<-c("This is a first example sentence.", "And this is a second example sentence.")

If I now look for word boundaries with regexpr, this is what I get:

>

Feature request: non-dropping regmatches/strextract

2019 Aug 29

If you want "to extract regex matches into a new column in a data.frame", there are some package functions that do exactly that. Three examples are namedCapture::df_match_variable, rematch2::bind_re_match, and tidyr::extract. For a more detailed discussion, see my R Journal submission (under review) about regular expression packages,

valgrind false positive on R startup?

2020 Jun 09

Hi all, I'm on Ubuntu 18.04, running R-4.0.0 which I compiled from source, and using valgrind I am always seeing the following message. Does anybody else see that? Is that a known false positive? Any ideas how to fix/suppress? It seems related to TRE; do I need to upgrade that?

(base) tdhock at maude-MacBookPro:~/R/binsegRcpp$ R --vanilla -d valgrind -e 'extSoftVersion()'
==9565==

strsplit(perl=TRUE), gregexpr(perl=TRUE) very slow for long strings

2017 Jan 06

While doing some speed testing I noticed that in R-3.2.3 the perl=TRUE variants of strsplit() and gregexpr() took time proportional to the square of the number of pattern matches in their input strings. E.g., the attached test function times gsub, strsplit, and gregexpr, with perl TRUE (PCRE) and FALSE (TRE), when the input string contains 'n' matches to the given pattern. Notice the
