Displaying 20 results from an estimated 600 matches similar to: "patch for gregexpr(perl=TRUE)"
2019 Feb 20
2
Bug: time complexity of substring is quadratic as string size and number of substrings increases
Hi all, (and especially hi to Tomas Kalibera who accepted my patch sent
yesterday)
I believe that I have found another bug, this time in the substring
function. The use case that I am concerned with is when there is a single
(character scalar) text/subject, and many substrings to extract. For example
substring("AAAA", 1:4, 1:4)
or more generally,
N=1000
2019 Feb 22
1
Bug: time complexity of substring is quadratic as string size and number of substrings increases
On 2/20/19 7:55 PM, Toby Hocking wrote:
> Update: I have observed that stringi::stri_sub has linear time complexity,
> and it computes the same thing as base::substring. Figure:
> https://github.com/tdhock/namedCapture-article/blob/master/figure-substring-bug.png
> Source:
> https://github.com/tdhock/namedCapture-article/blob/master/figure-substring-bug.R
>
> To me this is a
2019 Aug 15
4
Feature request: non-dropping regmatches/strextract
A very common use case for regmatches is to extract regex matches into a new column in a data.frame (or data.table, etc.) or otherwise use the extracted strings alongside the input. However, the default behavior is to drop empty matches, which results in mismatches in column length if reassignment is done without subsetting.
For consistency with other R functions and compatibility with this use
2019 Feb 20
0
Bug: time complexity of substring is quadratic as string size and number of substrings increases
Update: I have observed that stringi::stri_sub has linear time complexity,
and it computes the same thing as base::substring. Figure:
https://github.com/tdhock/namedCapture-article/blob/master/figure-substring-bug.png
Source:
https://github.com/tdhock/namedCapture-article/blob/master/figure-substring-bug.R
To me this is a clear indication of a bug in substring, but again it would
be nice to have
2007 Oct 10
4
gregexpr (PR#9965)
Full_Name: Peter Dolan
Version: 2.5.1
OS: Windows
Submission from: (NULL) (128.193.227.43)
gregexpr does not find all matching substrings if the substrings overlap:
> gregexpr("abab","ababab")
[[1]]
[1] 1
attr(,"match.length")
[1] 4
It does work correctly in Version 2.3.1 under linux.
2011 Aug 17
2
question regarding gregexpr and read.table
Hi,
I have a silly question regarding the usage of two commands: read.table and
gregexpr:
For read.table, if I read a matrix and set header = T, I find that all the
dashes ("-") become dots (".")
A = read.table("Matrix.txt", sep = "\t", header = F)
A[1,1]
# "A-B-C-D".
A = read.table("Matrix.txt", sep = "\t", header = T)
2009 Feb 25
1
Using gregexpr with multiple search elements
Dear list,
I am trying to use gregexpr to see if entries in a dataframe have
either of two possible values for a string.
here's an example
text<-c("fat", "rat", "cat", "dog", "log", "fish")
If I just wanted to find if any one of the elements in text match the
pattern "at" I would do
gregexpr("\\at", text)
2006 Nov 07
1
Gregexpr - extract results with lapply
Hello,
I need to extract sequences of three upper case letters in a string. In
other words, in this string:
str <-c("ABC", "this WOUld be gOOD")
The result I'm looking for is ABC WOU OOD.
With gregexpr, I can get the position and length of the sequences
gregexpr('[A-Z]{3}',str,perl=TRUE)
[[1]]
[1] 1
2006 Oct 07
2
gregexpr in R 2.3.0 != gregexpr in R 2.4.0
Hi all
I have a question regarding differences in the way gregexpr works in R 2.3.0 and R 2.4.0.
In R 2.3.0, this is what happens:
> gregexpr(" [a-z] [a-z] ", " a b c d e f ", perl=T)
[[1]]
[1] 1 3 5 7 9
attr(,"match.length")
[1] 5 5 5 5 5
... while in R 2.4.0, this is what happens:
> gregexpr(" [a-z] [a-z] ", " a b c d e f ", perl=T)
2008 Dec 12
4
gregexpr - match overlap mishandled (PR#13391)
Full_Name: Reid Thompson
Version: 2.8.0 RC (2008-10-12 r46696)
OS: darwin9.5.0
Submission from: (NULL) (129.98.107.177)
The gregexpr() function does NOT return a complete list of global matches as it
should. When a pattern matches two overlapping portions of a
string, only the first match is returned.
The following function call demonstrates this error (although this is not how I
2012 Nov 02
2
backreferences in gregexpr
Hi Folks,
I'm trying to extract just the backreferences from a regex.
> temp = "abcd1234abcd1234"
> regmatches(temp, gregexpr("(?:abcd)(1234)", temp))
[[1]]
[1] "abcd1234" "abcd1234"
What I would like is:
[1] "1234" "1234"
Note: I know I can just match 1234 here, but the actual example is
complicated enough that I have to
2012 Mar 30
1
How to access results of gregexpr in data frames
Hello,
I'm trying to figure out how to find the index of the second occurrence of "/" in a string (which happens to represent a date) within a data frame column.
I've used the following code successfully to find the first instance of "/".
dframe <- data.frame(date=c("5/14/2011", "4/7/2011"))
dframe$x1 <- regexpr("/", dframe[, 1])
2008 Oct 31
1
gregexpr slow and increases exponentially with string length --> how to speed it up?
Dear All,
I have a long string and need to search for regular expressions in
there. However it becomes horribly slow as the string length
increases.
Below is an example: when "i" increases by 5, the time spent increases
by more! (my string is 11,000,000 letters long!)
I also noticed that
- the search time increases dramatically with the number of matches found.
- the perl=T option
2008 Jan 31
1
segfault in gregexpr()
Hi,
Tried with R 2.6 and R 2.7:
> gregexpr("", "abc", fixed=TRUE)
*** caught segfault ***
address 0x1c09000, cause 'memory not mapped'
Traceback:
1: gregexpr("", "abc", fixed = TRUE)
Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace
2006 Feb 01
1
Word boundaries and gregexpr in R 2.2.1
Hi
I have a question concerning how to match word boundaries which I bet has a very simple answer, but I haven't found it with trial and error nor by searching the help archives for the terms in the subject line. The problem is this: I have a vector of two character strings.
text<-c("This is a first example sentence.", "And this is a second example sentence.")
If I
2006 Feb 01
1
Word boundaries and gregexpr in R 2.2.1 (PR#8547)
Full_Name: Stefan Th. Gries
Version: 2.2.1
OS: Windows XP (Home and Professional)
Submission from: (NULL) (68.6.34.104)
The problem is this: I have a vector of two character strings.
> text<-c("This is a first example sentence.", "And this is a second example sentence.")
If I now look for word boundaries with regexpr, this is what I get:
>
2019 Aug 29
0
Feature request: non-dropping regmatches/strextract
If you want "to extract regex matches into a new column in a data.frame"
then there are some package functions which do exactly that. Three examples
are namedCapture::df_match_variable, rematch2::bind_re_match, and
tidyr::extract. For a more detailed discussion see my R Journal submission
(under review) about regular expression packages,
2020 Jun 09
2
valgrind false positive on R startup?
Hi all,
I'm on Ubuntu 18.04, running R-4.0.0 which I compiled from source, and
using valgrind I am always seeing the following message. Does anybody
else see that? Is that a known false positive? Any ideas how to
fix/suppress? Seems related to TRE, do I need to upgrade that?
(base) tdhock@maude-MacBookPro:~/R/binsegRcpp$ R --vanilla -d valgrind -e 'extSoftVersion()'
==9565==
2017 Jan 06
0
strsplit(perl=TRUE), gregexpr(perl=TRUE) very slow for long strings
While doing some speed testing I noticed that in R-3.2.3 the perl=TRUE
variants of strsplit() and gregexpr() took time proportional to the
square of the number of pattern matches in their input strings. E.g.,
the attached test function times gsub, strsplit, and gregexpr, with
perl TRUE (PCRE) and FALSE (TRE), when the input string contains 'n'
matches to the given pattern. Notice the