thr3ads.net - similar to: "Word boundaries and gregexpr in R 2.2.1 (PR#8547)"

2006 Feb 01

Word boundaries and gregexpr in R 2.2.1

Hi, I have a question concerning how to match word boundaries which I bet has a very simple answer, but I haven't found it by trial and error or by searching the help archives for the terms in the subject line. The problem is this: I have a vector of two character strings.
text<-c("This is a first example sentence.", "And this is a second example sentence.")
If I

2012 Mar 30

How to use access results of gregexpr in data frames

Hello, I'm trying to figure out how to find the index of the second occurrence of "/" in a string (which happens to represent a date) within a data frame column. I've used the following code successfully to find the first instance of "/".
dframe <- data.frame(date=c("5/14/2011", "4/7/2011"))
dframe$x1 <- regexpr("/", dframe[, 1])
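One way the question above could be approached, sketched with base R only: gregexpr() returns every match position for each string, so the second element of each result is the position of the second "/". The column name x2 and the use of stringsAsFactors = FALSE are assumptions made for illustration and do not come from the original post.

```r
# Data from the post; stringsAsFactors = FALSE keeps the dates as character
dframe <- data.frame(date = c("5/14/2011", "4/7/2011"), stringsAsFactors = FALSE)

# All positions of "/" in each string (fixed = TRUE: literal match, no regex)
hits <- gregexpr("/", dframe$date, fixed = TRUE)

# Take the second position per string, or NA when there are fewer than two
dframe$x2 <- vapply(
  hits,
  function(pos) if (length(pos) >= 2) pos[2] else NA_integer_,
  integer(1)
)
print(dframe)
```

Using vapply() rather than sapply() keeps the result an integer vector even when some strings contain fewer than two slashes.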

2006 Oct 07

gregexpr in R 2.3.0 != gregexpr in R 2.4.0

Hi all, I have a question regarding differences in the way gregexpr works in R 2.3.0 and R 2.4.0. In R 2.3.0, this is what happens:
> gregexpr(" [a-z] [a-z] ", " a b c d e f ", perl=T)
[[1]]
[1] 1 3 5 7 9
attr(,"match.length")
[1] 5 5 5 5 5
...
while in R 2.4.0, this is what happens:
> gregexpr(" [a-z] [a-z] ", " a b c d e f ", perl=T)

2008 Dec 12

gregexpr - match overlap mishandled (PR#13391)

Full_Name: Reid Thompson
Version: 2.8.0 RC (2008-10-12 r46696)
OS: darwin9.5.0
Submission from: (NULL) (129.98.107.177)
The gregexpr() function does NOT return a complete list of global matches as it should. This occurs when a pattern matches two overlapping portions of a string: only the first match is returned. The following function call demonstrates this error (although this is not how I

2007 May 22

regexp bug in very recent r-devel

Completion is semi-broken in today's r-devel, and the reason seems to be some regular expression changes:
> sessionInfo()
R version 2.6.0 Under development (unstable) (2007-05-22 r41673)
i686-pc-linux-gnu
locale: [...]
attached base packages:
[1] "stats" "graphics" "grDevices" "utils" "datasets" "methods"
[7]

2006 Jul 23

RfW 2.3.1: regular expressions to detect pairs of identical word-final character sequences

Dear all, I use R for Windows 2.3.1 on a fully updated Windows XP Home SP2 machine and I have two related regular expression problems.
platform i386-pc-mingw32
arch i386
os mingw32
system i386, mingw32
status
major 2
minor

2006 Nov 07

Gregexpr - extract results with lapply

Hello, I need to extract sequences of three upper case letters in a string. In other words, in this string:
str <-c("ABC", "this WOUld be gOOD")
The result I'm looking for is ABC WOU OOD. With gregexpr, I can get the position and length of the sequences
gregexpr('[A-Z]{3}',str,perl=TRUE)
[[1]]
[1] 1
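A minimal sketch of how the matched text itself (rather than positions and lengths) might be pulled out in current versions of R. It uses regmatches(), which is a different technique from the lapply() approach named in the thread title and was not available when the question was posted; the variable name found is illustrative.

```r
str <- c("ABC", "this WOUld be gOOD")

# Positions and lengths of every run of exactly three upper case letters
m <- gregexpr("[A-Z]{3}", str, perl = TRUE)

# regmatches() turns the match data into the matched substrings (one list
# element per input string); unlist() flattens them into a single vector
found <- unlist(regmatches(str, m), use.names = FALSE)
print(found)  # "ABC" "WOU" "OOD"
```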

2011 Aug 17

question regarding gregexpr and read.table

Hi, I have a silly question regarding the usage of two commands: read.table and gregexpr. For read.table, if I read a matrix and set header = T, I found that all the dashes ("-") become dots (".")
A = read.table("Matrix.txt", sep = "\t", header = F)
A[1,1] # "A-B-C-D".
A = read.table("Matrix.txt", sep = "\t", header = T)
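The dash-to-dot behaviour described above comes from read.table() passing header names through make.names() when check.names = TRUE, which is the default. The sketch below reproduces it with inline text instead of the poster's Matrix.txt; the sample contents and the names raw, checked and unchecked are assumptions made for illustration.

```r
# Hypothetical stand-in for Matrix.txt: a tab-separated header plus one row
raw <- "A-B-C-D\tE-F\n1\t2\n"

# Default: header names are made syntactically valid, so "-" becomes "."
checked <- read.table(text = raw, sep = "\t", header = TRUE)

# check.names = FALSE keeps the header exactly as written
unchecked <- read.table(text = raw, sep = "\t", header = TRUE, check.names = FALSE)

print(names(checked))    # "A.B.C.D" "E.F"
print(names(unchecked))  # "A-B-C-D" "E-F"
```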

2009 Feb 25

Using gregexpr with multiple search elements

Dear list, I am trying to use gregexpr to see if entries in a dataframe have either of two possible values for a string. Here's an example:
text<-c("fat", "rat", "cat", "dog", "log", "fish")
If I just wanted to find if any one of the elements in text match the pattern "at" I would do
gregexpr("\\at", text)
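A small sketch of one way to test for either of two patterns, using regex alternation. The snippet is cut off before the second pattern is named, so "og" is an assumption chosen to fit the example vector; has_either and first_pos are illustrative names.

```r
text <- c("fat", "rat", "cat", "dog", "log", "fish")

# "at|og" matches either alternative; grepl() gives one TRUE/FALSE per element
has_either <- grepl("at|og", text)

# gregexpr() accepts the same pattern when the match positions are needed;
# an element with no match is reported as -1
hits <- gregexpr("at|og", text)
first_pos <- vapply(hits, function(pos) pos[1], integer(1))

print(data.frame(text, has_either, first_pos))
```

When only a yes/no answer per element is needed, grepl() is usually the simpler choice.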

2019 Feb 19

patch for gregexpr(perl=TRUE)

Hi all, Several people have noticed that gregexpr is very slow for large subject strings when perl=TRUE is specified.
- https://stackoverflow.com/questions/31216299/r-faster-gregexpr-for-very-large-strings
- http://r.789695.n4.nabble.com/strsplit-perl-TRUE-gregexpr-perl-TRUE-very-slow-for-long-strings-td4727902.html
- https://stat.ethz.ch/pipermail/r-help/2008-October/178451.html
I figured out

2007 Oct 10

gregexpr (PR#9965)

Full_Name: Peter Dolan
Version: 2.5.1
OS: Windows
Submission from: (NULL) (128.193.227.43)
gregexpr does not find all matching substrings if the substrings overlap:
> gregexpr("abab","ababab")
[[1]]
[1] 1
attr(,"match.length")
[1] 4
It does work correctly in Version 2.3.1 under linux.
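Regular expression engines generally report non-overlapping matches: once "abab" has been consumed at position 1, the search resumes after it. A common workaround, sketched here and not taken from the report, is to wrap the pattern in a zero-width lookahead with perl = TRUE, so that nothing is consumed and every start position is tested. The names plain and overlapping are illustrative.

```r
# Ordinary search: only the non-overlapping match at position 1 is found
plain <- gregexpr("abab", "ababab")[[1]]

# Lookahead search: each position where "abab" begins is reported,
# with match.length 0 because the lookahead itself consumes no characters
overlapping <- gregexpr("(?=abab)", "ababab", perl = TRUE)[[1]]

print(as.integer(plain))
print(as.integer(overlapping))
```

Because the lookahead matches have zero length, the matched text has to be recovered separately, for example with substring() and the known pattern length.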

2017 Jun 28

regexec() bug in R 3.4.0

Hi, In R 3.4.0, the "Pattern Matching and Replacement" documentation that describes regexec(), gregexpr(), etc. states that the "text" argument to regexec is a character vector, "or an object which can be coerced by as.character to a character vector":
regexec(pattern, text, ignore.case = FALSE, perl = FALSE, fixed = FALSE, useBytes = FALSE)

2008 Jan 31

segfault in gregexpr()

Hi, Tried with R 2.6 and R 2.7:
> gregexpr("", "abc", fixed=TRUE)
*** caught segfault ***
address 0x1c09000, cause 'memory not mapped'
Traceback:
1: gregexpr("", "abc", fixed = TRUE)
Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace

2012 Nov 02

backreferences in gregexpr

Hi Folks, I'm trying to extract just the backreferences from a regex.
> temp = "abcd1234abcd1234"
> regmatches(temp, gregexpr("(?:abcd)(1234)", temp))
[[1]]
[1] "abcd1234" "abcd1234"
What I would like is:
[1] "1234" "1234"
Note: I know I can just match 1234 here, but the actual example is complicated enough that I have to
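One possible way to get only the captured group, sketched under the assumption of a reasonably recent R: with perl = TRUE, gregexpr() attaches "capture.start" and "capture.length" attributes (one row per match, one column per capture group), and substring() can cut the groups out. The names m, starts, lens and groups are illustrative and not from the post.

```r
temp <- "abcd1234abcd1234"

# perl = TRUE makes gregexpr() record where each capture group matched
m <- gregexpr("(?:abcd)(1234)", temp, perl = TRUE)[[1]]

starts <- attr(m, "capture.start")   # matrix: rows = matches, cols = groups
lens   <- attr(m, "capture.length")

# Column 1 is the only capturing group, since (?:abcd) is non-capturing
groups <- substring(temp, starts[, 1], starts[, 1] + lens[, 1] - 1)
print(groups)  # "1234" "1234"
```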

2011 Feb 25

Named capture in regexp

Dear R core developers, One feature from Python that I have been wanting in R is the ability to capture groups in regular expressions using names. Consider the following example in R.
> notables <- c(" Ben Franklin and Jefferson Davis","\tMillard Fillmore")
> name.rex <- "(?<first>[A-Z][a-z]+) (?<last>[A-Z][a-z]+)"
> (parsed <-

2008 Oct 31

gregexpr slow and increases exponentially with string length --> how to speed it up?

Dear All, I have a long string and need to search for regular expressions in there. However it becomes horribly slow as the string length increases. Below is an example: when "i" increases by 5, the time spent increases by more! (my string is 11,000,000 letters long!) I also noticed that
- the search time increases dramatically with the number of matches found.
- the perl=T option

2011 Mar 05

pvclust crashing R on Ubuntu 10.10

Hi all, I am writing to you with a question regarding the pvclust package. And yes, before the usual people produce their usual contact-the-package-maintainers line, yes, I tried that but the emails one can find on the web either bounce or are not responded to. Also, yes, this error has already been reported as a bug but been shot down as not reproducible

2012 Sep 20

(no subject)

From my book on corpus linguistics with R:
# (10) Imagine you have two vectors a and b such that
a<-c("d", "d", "j", "f", "e", "g", "f", "f", "i", "g")
b<-c("a", "g", "d", "f", "g", "a", "f", "a",

2013 Mar 20

Pattern match

Hello again, in the help page of grep() function, it is written that pattern: character string containing a regular expression (or character string for fixed = TRUE) to be matched in the given character vector. Coerced by as.character to a character string if possible. If a character vector of length 2 or more is supplied, the first element is used with a warning. Missing values are allowed
