thr3ads.net - similar to: "regexp bug in very recent r-devel"

Named capture in regexp

2011 Feb 25

Dear R core developers, One feature from Python that I have been wanting in R is the ability to capture groups in regular expressions using names. Consider the following example in R. > notables <- c(" Ben Franklin and Jefferson Davis","\tMillard Fillmore") > name.rex <- "(?<first>[A-Z][a-z]+) (?<last>[A-Z][a-z]+)" > (parsed <-
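Recent versions of R expose named groups as attributes on the result of `regexpr(..., perl = TRUE)`. The following is a minimal sketch under that assumption, reusing the two definitions quoted above; the variables `m`, `starts`, `lens`, `first`, and `last` are illustrative and do not come from the original post.

```r
notables <- c(" Ben Franklin and Jefferson Davis", "\tMillard Fillmore")
name.rex <- "(?<first>[A-Z][a-z]+) (?<last>[A-Z][a-z]+)"

# perl = TRUE is required: named groups are PCRE syntax
m <- regexpr(name.rex, notables, perl = TRUE)

# one row per input string, one column per named group
starts <- attr(m, "capture.start")
lens   <- attr(m, "capture.length")

# cut each named group out of the original strings
first <- substring(notables, starts[, "first"],
                   starts[, "first"] + lens[, "first"] - 1)
last  <- substring(notables, starts[, "last"],
                   starts[, "last"] + lens[, "last"] - 1)
```

Only the first match in each string is reported because `regexpr` stops at the first match; `gregexpr` would be needed to also pick up "Jefferson Davis".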

regular expression change in R version 2.3.0?

2006 May 06

The interpretation of regular expressions with repetition quantifiers in the 'gregexpr' function seems to have changed between R Version 2.2.0 and 2.3.0. The 'gsub' function, however, gives the same results in R Versions 2.2.0 and 2.3.0. Below is an example that demonstrates the version differences of the 'gregexpr' function. I am not sure whether this new behavior is

strsplit("dia ma", "\\b") splits characterwise

2010 Jul 08

\b is a word boundary. But, unexpectedly, strsplit("dia ma", "\\b") splits character by character.
> strsplit("dia ma", "\\b")
[[1]]
[1] "d" "i" "a" " " "m" "a"
> strsplit("dia ma", "\\b", perl=TRUE)
[[1]]
[1] "d" "i" "a" " "
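One way to sidestep zero-length split patterns is to match the tokens instead of splitting on the boundaries between them. This is a sketch of that alternative, not the resolution given in the thread; the pattern `\\w+|\\W+` and the variable names are assumptions made for illustration.

```r
x <- "dia ma"

# match runs of word characters or runs of non-word characters,
# then extract the matched substrings
tokens <- regmatches(x, gregexpr("\\w+|\\W+", x, perl = TRUE))[[1]]
tokens
```

This yields the pieces "dia", " ", and "ma", which is what a split at word boundaries would be expected to produce.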

Gregexpr - extract results with lapply

2006 Nov 07

Hello, I need to extract sequences of three upper case letters in a string. In other words, in this string: str <-c("ABC", "this WOUld be gOOD") the result I'm looking for is ABC WOU OOD. With gregexpr, I can get the position and length of the sequences: gregexpr('[A-Z]{3}',str,perl=TRUE) [[1]] [1] 1
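A possible way to turn those positions and lengths into the matched text is `regmatches`, which takes the original strings together with the `gregexpr` result. This is a sketch, not the answer from the thread; the input is renamed to `x` here to avoid masking the base function `str`.

```r
x <- c("ABC", "this WOUld be gOOD")

# positions and lengths of every run of three upper-case letters
m <- gregexpr("[A-Z]{3}", x, perl = TRUE)

# regmatches returns one character vector of matches per input string
hits <- regmatches(x, m)
unlist(hits)
```

Flattening with `unlist` gives "ABC", "WOU", "OOD"; keeping `hits` as a list preserves which input string each match came from.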

Using gregexpr with multiple search elements

2009 Feb 25

Dear list, I am trying to use gregexpr to see if entries in a dataframe have either of two possible values for a string. here's an example text<-c("fat", "rat", "cat", "dog", "log", "fish") If I just wanted to find if any one of the elements in text match the pattern "at" I would do gregexpr("\\at", text)
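For a yes/no test against either of two strings, regular-expression alternation with `grepl` is one option. The excerpt does not show the second search string, so "og" below is a hypothetical choice made purely for illustration.

```r
text <- c("fat", "rat", "cat", "dog", "log", "fish")

# "at|og" matches an element containing either "at" or "og";
# "og" is an assumed second pattern, not taken from the post
has_either <- grepl("at|og", text)
text[has_either]
```

`grepl` returns one logical value per element, which is usually easier to use for filtering a data frame than the list returned by `gregexpr`.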

question regarding gregexpr and read.table

2011 Aug 17

Hi, I have a silly question regarding the usage of two commands: read.table and gregexpr. For read.table, if I read a matrix and set header = T, I found that all the dashes ("-") become dots ("."): A = read.table("Matrix.txt", sep = "\t", header = F) A[1,1] # "A-B-C-D". A = read.table("Matrix.txt", sep = "\t", header = T)
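The dash-to-dot change comes from `read.table` turning header fields into syntactically valid column names. A small sketch of switching that off with `check.names = FALSE` follows; the inline two-line table stands in for "Matrix.txt", whose actual contents are not shown in the excerpt.

```r
# stand-in for Matrix.txt: a tab-separated header row and one data row
raw <- "A-B-C-D\tE-F\n1\t2"

# check.names = FALSE keeps the header fields exactly as written
A <- read.table(text = raw, sep = "\t", header = TRUE, check.names = FALSE)
names(A)
```

Columns with such names then have to be accessed with backticks or character indexing, for example `A[["A-B-C-D"]]`.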

patch for gregexpr(perl=TRUE)

2019 Feb 19

Hi all, Several people have noticed that gregexpr is very slow for large subject strings when perl=TRUE is specified.
- https://stackoverflow.com/questions/31216299/r-faster-gregexpr-for-very-large-strings
- http://r.789695.n4.nabble.com/strsplit-perl-TRUE-gregexpr-perl-TRUE-very-slow-for-long-strings-td4727902.html
- https://stat.ethz.ch/pipermail/r-help/2008-October/178451.html
I figured out

strsplit(perl=TRUE), gregexpr(perl=TRUE) very slow for long strings

2017 Jan 06

While doing some speed testing I noticed that in R-3.2.3 the perl=TRUE variants of strsplit() and gregexpr() took time proportional to the square of the number of pattern matches in their input strings. E.g., the attached test function times gsub, strsplit, and gregexpr, with perl TRUE (PCRE) and FALSE (TRE), when the input string contains 'n' matches to the given pattern. Notice the

segfault in gregexpr()

2008 Jan 31

Hi, Tried with R 2.6 and R 2.7:
> gregexpr("", "abc", fixed=TRUE)
 *** caught segfault ***
address 0x1c09000, cause 'memory not mapped'
Traceback:
 1: gregexpr("", "abc", fixed = TRUE)
Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace

regexec() bug in R 3.4.0

2017 Jun 28

Hi, In R 3.4.0, the "Pattern Matching and Replacement" documentation that describes regexec(), gregexpr(), etc. states that the "text" argument to regexec is a character vector, "or an object which can be coerced by as.character to a character vector": regexec(pattern, text, ignore.case = FALSE, perl = FALSE, fixed = FALSE, useBytes = FALSE)

How to use access results of gregexpr in data frames

2012 Mar 30

Hello, I'm trying to figure out how to find the index of the second occurrence of "/" in a string (which happens to represent a date) within a data frame column. I've used the following code successfully to find the first instance of "/". dframe <- data.frame(date=c("5/14/2011", "4/7/2011")) dframe$x1 <- regexpr("/", dframe[, 1])
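Since `regexpr` only reports the first match, one way to reach the second "/" is to collect all positions with `gregexpr` and pick the second element per row. This is a sketch rather than the answer from the thread; the column name `x2` and the helper variable `all_pos` are illustrative.

```r
dframe <- data.frame(date = c("5/14/2011", "4/7/2011"))

# all positions of "/" in each date; fixed = TRUE treats "/" literally
all_pos <- gregexpr("/", as.character(dframe$date), fixed = TRUE)

# second position per row; NA if a string has fewer than two slashes
dframe$x2 <- vapply(all_pos, function(m) m[2], integer(1))
dframe$x2
```

For the two dates above the second slash is at positions 5 and 4 respectively.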

Word boundaries and gregexpr in R 2.2.1 (PR#8547)

2006 Feb 01

Full_Name: Stefan Th. Gries
Version: 2.2.1
OS: Windows XP (Home and Professional)
Submission from: (NULL) (68.6.34.104)

The problem is this: I have a vector of two character strings.
> text<-c("This is a first example sentence.", "And this is a second example sentence.")
If I now look for word boundaries with regexpr, this is what I get: >

Word boundaries and gregexpr in R 2.2.1

2006 Feb 01

Hi I have a question concerning how to match word boundaries which I bet has a very simple answer, but I haven't found it with trial and error nor by searching the help archives for the terms in the subject line. The problem is this: I have a vector of two character strings. text<-c("This is a first example sentence.", "And this is a second example sentence.") If I

regex question

2009 Aug 04

Hi, I am getting stuck over an apparently simple problem in the use of regular expressions: to collect together the first letters of the words from the Perl motto, "There is more than one way to do it", in the following form: TIMTOWTDI. I tried the following code: ##### A regex problem with the Perl motto astr<-"There is more than one way to do it" b1<-grep("\\<",
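One approach that avoids regular expressions altogether is to split on spaces and keep the first character of each word. This is a sketch of an alternative, not a completion of the poster's truncated code; `words` and `acronym` are illustrative names.

```r
astr <- "There is more than one way to do it"

# split into words on literal spaces
words <- strsplit(astr, " ", fixed = TRUE)[[1]]

# first letter of each word, joined and upper-cased
acronym <- toupper(paste(substr(words, 1, 1), collapse = ""))
acronym
```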

gregexpr (PR#9965)

2007 Oct 10

Full_Name: Peter Dolan
Version: 2.5.1
OS: Windows
Submission from: (NULL) (128.193.227.43)

gregexpr does not find all matching substrings if the substrings overlap:
> gregexpr("abab","ababab")
[[1]]
[1] 1
attr(,"match.length")
[1] 4
It does work correctly in Version 2.3.1 under linux.

Finding non disjoint regular expressions

2008 May 05

Hello, Is there any way I can use the gregexpr functions (or a different function) in a manner that will also return overlapping (i.e. non disjoint) regular expressions? For instance, when running gregexpr("AAA","AAAAAA"), I get two matches, one at position 1 and one at position 4. I'd like to receive 4 matches at positions 1, 2, 3 and 4. Thanks, Schraga
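A common idiom for overlapping matches is to wrap the pattern in a PCRE lookahead, so each match has zero length and the search resumes one character later. The sketch below assumes `perl = TRUE` is acceptable; it is not necessarily the answer given in the thread.

```r
# (?=AAA) succeeds wherever "AAA" starts, without consuming any characters
m <- gregexpr("(?=AAA)", "AAAAAA", perl = TRUE)[[1]]

# drop the match.length and other attributes, keeping the start positions
starts <- as.integer(m)
starts
```

The reported match lengths are all 0, so the matched text has to be recovered from the start positions and the known pattern length.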

gsub('(.).(.)(.)', '\\3\\2\\1', 'gsub')

2009 Mar 22

there seems to be something wrong with r's regexing. consider the following example: gregexpr('a*|b', 'ab') # positions: 1 2 # lengths: 1 1 gsub('a*|b', '.', 'ab') # .. where the pattern matches any number of 'a's or one b, and replaces the match with a dot, globally. the answer is correct (assuming a dfa engine).

gsub('(.).(.)(.)', '\\3\\2\\1', 'gsub') (PR#13617)

2009 Mar 22

Full_Name: Wacek Kusnierczyk
Version: 2.10.0 r48181
OS: Ubuntu 8.04 Linux 32bit
Submission from: (NULL) (129.241.199.135)

there seems to be something wrong with r's regexing. consider the following example: gregexpr('a*|b', 'ab') # positions: 1 2 # lengths: 1 1 gsub('a*|b', '.', 'ab') # .. where the pattern matches any number of

how to count the total number of (INCLUDING overlapping) occurrences of a substring within a string?

2009 Dec 20

Last one for you guys: The command: length(gregexpr('cus','hocus pocus')[[1]]) [1] 2 returns the number of times the substring 'cus' appears in 'hocus pocus' (which is two) It's returning the number of **disjoint** matches. So: length(gregexpr('aa','aaa')[[1]]) [1] 1 returns 1. **What I want to do:** I'm looking for a way to count
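A possible counting helper, sketched with a lookahead so that overlapping occurrences are included. The function `count_overlapping` is hypothetical (it is not part of base R), and it treats `pattern` as a regular expression, so literal strings with special characters would need escaping.

```r
# hypothetical helper: count occurrences of `pattern` in a single string `x`,
# including overlapping ones
count_overlapping <- function(pattern, x) {
  m <- gregexpr(paste0("(?=", pattern, ")"), x, perl = TRUE)[[1]]
  # gregexpr signals "no match" with a single -1, which must not be counted
  if (m[1] == -1L) 0L else length(m)
}

count_overlapping("aa", "aaa")
count_overlapping("cus", "hocus pocus")
count_overlapping("zz", "aaa")
```

The explicit check for -1 matters because `length()` of a no-match result is 1, not 0.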

Exceptional slowness with read.csv

2024 Apr 10

That's basically what I did:
1. Get text lines using readLines
2. use tryCatch to parse each line using read.csv(text=...)
3. in the catch, use gregexpr to find any quotes not adjacent to a comma (gregexpr("[^,]\"[^,]",...)
4. escape any quotes found by adding a second quote (using str_sub from stringr)
6. parse the patched text using read.csv(text=...)
7. write out the parsed
