Displaying 20 results from an estimated 600 matches similar to: "regexp bug in very recent r-devel"
2011 Feb 25
0
Named capture in regexp
Dear R core developers,
One feature from Python that I have been wanting in R is the ability
to capture groups in regular expressions using names. Consider the
following example in R.
> notables <- c(" Ben Franklin and Jefferson Davis","\tMillard Fillmore")
> name.rex <- "(?<first>[A-Z][a-z]+) (?<last>[A-Z][a-z]+)"
> (parsed <-
2006 May 06
2
regular expression change in R version 2.3.0?
The interpretation of regular expressions with repetition
quantifiers in the 'gregexpr' function seems to have changed
between R Version 2.2.0 and 2.3.0. The 'gsub' function, however,
gives the same results in R Versions 2.2.0 and 2.3.0. Below is
an example that demonstrates the version differences of the
'gregexpr' function. I am not sure whether this new behavior
is
2010 Jul 08
2
strsplit("dia ma", "\\b") splits characterwise
\b is word boundary.
But, unexpectedly, strsplit("dia ma", "\\b") splits character by character.
> strsplit("dia ma", "\\b")
[[1]]
[1] "d" "i" "a" " " "m" "a"
> strsplit("dia ma", "\\b", perl=TRUE)
[[1]]
[1] "d" "i" "a" " "
2006 Nov 07
1
Gregexpr - extract results with lapply
Hello,
I need to extract sequences of three upper case letters in a string. In
other words, in this string:
str <-c("ABC", "this WOUld be gOOD")
The result I'm looking for is ABC WOU OOD.
With gregexpr, I can get the position and length of the sequences
gregexpr('[A-Z]{3}',str,perl=TRUE)
[[1]]
[1] 1
2009 Feb 25
1
Using gregexpr with multiple search elements
Dear list,
I am trying to use gregexpr to see if entries in a dataframe have
either of two possible values for a string.
here's an example
text<-c("fat", "rat", "cat", "dog", "log", "fish")
If I just wanted to find if any one of the elements in text match the
pattern "at" I would do
gregexpr("\\at", text)
2011 Aug 17
2
question regarding gregexpr and read.table
Hi,
I have a silly question regarding the usage of two commands: read.table and
gregexpr:
For read.table, if I read a matrix and set header = T, I found that all the
dashes ("-") become dots (".")
A = read.table("Matrix.txt", sep = "\t", header = F)
A[1,1]
# "A-B-C-D".
A = read.table("Matrix.txt", sep = "\t", header = T)
2019 Feb 19
1
patch for gregexpr(perl=TRUE)
Hi all,
Several people have noticed that gregexpr is very slow for large subject
strings when perl=TRUE is specified.
- https://stackoverflow.com/questions/31216299/r-faster-gregexpr-for-very-large-strings
- http://r.789695.n4.nabble.com/strsplit-perl-TRUE-gregexpr-perl-TRUE-very-slow-for-long-strings-td4727902.html
- https://stat.ethz.ch/pipermail/r-help/2008-October/178451.html
I figured out
2017 Jan 06
0
strsplit(perl=TRUE), gregexpr(perl=TRUE) very slow for long strings
While doing some speed testing I noticed that in R-3.2.3 the perl=TRUE
variants of strsplit() and gregexpr() took time proportional to the
square of the number of pattern matches in their input strings. E.g.,
the attached test function times gsub, strsplit, and gregexpr, with
perl TRUE (PCRE) and FALSE (TRE), when the input string contains 'n'
matches to the given pattern. Notice the
2008 Jan 31
1
segfault in gregexpr()
Hi,
Tried with R 2.6 and R 2.7:
> gregexpr("", "abc", fixed=TRUE)
*** caught segfault ***
address 0x1c09000, cause 'memory not mapped'
Traceback:
1: gregexpr("", "abc", fixed = TRUE)
Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace
2017 Jun 28
1
regexec() bug in R 3.4.0
Hi,
In R 3.4.0, the "Pattern Matching and Replacement" documentation that describes regexec(), gregexpr(), etc. states that the "text" argument to regexec is a character vector, "or an object which can be coerced by as.character to a character vector":
regexec(pattern, text, ignore.case = FALSE, perl = FALSE,
fixed = FALSE, useBytes = FALSE)
2012 Mar 30
1
How to use access results of gregexpr in data frames
Hello,
I'm trying to figure out how to find the index of the second occurrence of "/" in a string (which happens to represent a date) within a data frame column.
I've used the following code successfully to find the first instance of "/".
dframe <- data.frame(date=c("5/14/2011", "4/7/2011"))
dframe$x1 <- regexpr("/", dframe[, 1])
2006 Feb 01
1
Word boundaries and gregexpr in R 2.2.1 (PR#8547)
Full_Name: Stefan Th. Gries
Version: 2.2.1
OS: Windows XP (Home and Professional)
Submission from: (NULL) (68.6.34.104)
The problem is this: I have a vector of two character strings.
> text<-c("This is a first example sentence.", "And this is a second example sentence.")
If I now look for word boundaries with regexpr, this is what I get:
>
2006 Feb 01
1
Word boundaries and gregexpr in R 2.2.1
Hi
I have a question concerning how to match word boundaries which I bet has a very simple answer, but I haven't found it with trial and error nor by searching the help archives for the terms in the subject line. The problem is this: I have a vector of two character strings.
text<-c("This is a first example sentence.", "And this is a second example sentence.")
If I
2009 Aug 04
4
regex question
Hi,
I am getting stuck over an apparently simple problem in the use of regular expressions:
To collect together the first letters of the words from the Perl motto, "There is more than one way to do it", in the following form: TIMTOWTDI.
I tried the following code:
##### A regex problem with the Perl motto
astr<-"There is more than one way to do it"
b1<-grep("\\<",
2007 Oct 10
4
gregexpr (PR#9965)
Full_Name: Peter Dolan
Version: 2.5.1
OS: Windows
Submission from: (NULL) (128.193.227.43)
gregexpr does not find all matching substrings if the substrings overlap:
> gregexpr("abab","ababab")
[[1]]
[1] 1
attr(,"match.length")
[1] 4
It does work correctly in Version 2.3.1 under linux.
2008 May 05
2
Finding non disjoint regular expressions
Hello,
Is there any way I can use the gregexpr functions (or a different function)
in a manner that will also return overlapping (i.e. non disjoint) regular
expressions?
For instance, when running gregexpr("AAA","AAAAAA"), I get two matches, one
at position 1 and one at position 4. I'd like to receive 4 matches at
positions 1, 2, 3 and 4.
Thanks,
Schraga
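One way to get overlapping matches, sketched here as an illustration rather than as the reply given in the thread, is to wrap the pattern in a zero-width lookahead with perl = TRUE. The match itself then consumes no characters, so the search resumes one character later. The variable names below are hypothetical.

```r
# Sketch only: count overlapping occurrences with a PCRE lookahead.
# "(?=AAA)" matches the empty string at every position followed by "AAA".
subject <- "AAAAAA"
m <- gregexpr("(?=AAA)", subject, perl = TRUE)[[1]]

# Drop the match.length attribute and keep the start positions.
starts <- as.integer(m)

# gregexpr() reports -1 when there is no match, so guard the count.
n_matches <- if (starts[1] == -1L) 0L else length(starts)

print(starts)     # 1 2 3 4
print(n_matches)  # 4
```

The reported match lengths are all 0 with this approach, so the length of the real pattern has to be supplied separately if the matched substrings are needed, for example with substring(subject, starts, starts + 2).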
2009 Mar 22
0
gsub('(.).(.)(.)', '\\3\\2\\1', 'gsub')
there seems to be something wrong with r's regexing. consider the
following example:
gregexpr('a*|b', 'ab')
# positions: 1 2
# lengths: 1 1
gsub('a*|b', '.', 'ab')
# ..
where the pattern matches any number of 'a's or one b, and replaces the
match with a dot, globally. the answer is correct (assuming a dfa
engine).
2009 Mar 22
0
gsub('(.).(.)(.)', '\\3\\2\\1', 'gsub') (PR#13617)
Full_Name: Wacek Kusnierczyk
Version: 2.10.0 r48181
OS: Ubuntu 8.04 Linux 32bit
Submission from: (NULL) (129.241.199.135)
there seems to be something wrong with r's regexing. consider the following
example:
gregexpr('a*|b', 'ab')
# positions: 1 2
# lengths: 1 1
gsub('a*|b', '.', 'ab')
# ..
where the pattern matches any number of
2009 Dec 20
1
how to count the total number of (INCLUDING overlapping) occurrences of a substring within a string?
Last one for you guys:
The command:
length(gregexpr('cus','hocus pocus')[[1]])
[1] 2
returns the number of times the substring 'cus' appears in 'hocus pocus'
(which is two)
It's returning the number of **disjoint** matches. So:
length(gregexpr('aa','aaa')[[1]])
[1] 1
returns 1.
**What I want to do:**
I'm looking for a way to count
2024 Apr 10
1
Exceptional slowness with read.csv
That's basically what I did
1. Get text lines using readLines
2. use tryCatch to parse each line using read.csv(text=...)
3. in the catch, use gregexpr to find any quotes not adjacent to a comma
(gregexpr("[^,]\"[^,]",...)
4. escape any quotes found by adding a second quote (using str_sub from
stringr)
6. parse the patched text using read.csv(text=...)
7. write out the parsed