thr3ads.net - similar to: "how to capture matching words in a string ?"

Displaying 20 results from an estimated 1000 matches similar to: "how to capture matching words in a string ?"

regular expression change in R version 2.3.0?

2006 May 06

The interpretation of regular expressions with repetition quantifiers in the 'gregexpr' function seems to have changed between R Version 2.2.0 and 2.3.0. The 'gsub' function, however, gives the same results in R Versions 2.2.0 and 2.3.0. Below is an example that demonstrates the version differences of the 'gregexpr' function. I am not sure whether this new behavior is

strsplit("dia ma", "\\b") splits characterwise

2010 Jul 08

\b is word boundary. But, unexpectedly, strsplit("dia ma", "\\b") splits character by character. > strsplit("dia ma", "\\b") [[1]] [1] "d" "i" "a" " " "m" "a" > strsplit("dia ma", "\\b", perl=TRUE) [[1]] [1] "d" "i" "a" " "
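One way to sidestep splitting on the zero-length `\b` boundary is to match the pieces directly instead of splitting between them. The following is a minimal sketch, not the thread's own answer; the variable names `x` and `pieces` are illustrative.

```r
x <- "dia ma"

# Match runs of word characters or runs of non-word characters,
# rather than splitting on the zero-length \b boundary.
pieces <- regmatches(x, gregexpr("\\w+|\\W+", x, perl = TRUE))[[1]]
print(pieces)
```

Each element of `pieces` is either a whole word or the separator between two words, so pasting them back together reproduces the input.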

question regarding gregexpr and read.table

2011 Aug 17

Hi, I have a silly question regarding the usage of two commands: read.table and gregexpr: For read.table, if I read a matrix and set header = T, I found that all the dashes ("-") become dots ("."). A = read.table("Matrix.txt", sep = "\t", header = F) A[1,1] # "A-B-C-D". A = read.table("Matrix.txt", sep = "\t", header = T)
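The dash-to-dot conversion comes from `read.table` passing header names through `make.names` when `check.names = TRUE` (the default). A small sketch follows, assuming tab-separated input; the inline text in `raw` is a hypothetical stand-in for the poster's `Matrix.txt`, whose contents are not shown.

```r
# Hypothetical two-column input standing in for Matrix.txt
raw <- "A-B-C-D\tE-F\n1\t2\n"

# Default: header names are made syntactically valid, so "-" becomes "."
default_names <- names(read.table(text = raw, sep = "\t", header = TRUE))

# check.names = FALSE keeps the header names exactly as written
kept_names <- names(read.table(text = raw, sep = "\t", header = TRUE,
                               check.names = FALSE))

print(default_names)
print(kept_names)
```

Columns with non-syntactic names then need backticks or `[[ ]]` for access, for example `df[["A-B-C-D"]]`.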

Gregexpr - extract results with lapply

2006 Nov 07

Hello, I need to extract sequences of three upper case letters in a string. In other words, in this string: str <-c("ABC", "this WOUld be gOOD") The result I'm looking for is ABC WOU OOD. With gregexpr, I can get the position and length of the sequences gregexpr('[A-Z]{3}',str,perl=TRUE) [[1]] [1] 1
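In current versions of R, the positions returned by `gregexpr` can be turned into the matched substrings with `regmatches`, which avoids a manual `lapply` over start positions and lengths. This is a sketch of that approach, not necessarily what the thread settled on; the input vector is named `x` here instead of `str`.

```r
x <- c("ABC", "this WOUld be gOOD")

# Locate every run of exactly three upper-case letters
m <- gregexpr("[A-Z]{3}", x, perl = TRUE)

# regmatches() returns one character vector of matches per input string
hits <- regmatches(x, m)
print(unlist(hits))
```

`hits` stays a list with one element per input string, so the link between each match and its source string is preserved until `unlist` is called.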

Using gregexpr with multiple search elements

2009 Feb 25

Dear list, I am trying to use gregexpr to see if entries in a dataframe have either of two possible values for a string. here's an example text<-c("fat", "rat", "cat", "dog", "log", "fish") If I just wanted to find if any one of the elements in text match the pattern "at" I would do gregexpr("\\at", text)

regexp bug in very recent r-devel

2007 May 22

completion is semi-broken in today's r-devel, and the reason seems to be some regular expression changes: > sessionInfo() R version 2.6.0 Under development (unstable) (2007-05-22 r41673) i686-pc-linux-gnu locale: [...] attached base packages: [1] "stats" "graphics" "grDevices" "utils" "datasets" "methods" [7]

patch for gregexpr(perl=TRUE)

2019 Feb 19

Hi all, Several people have noticed that gregexpr is very slow for large subject strings when perl=TRUE is specified. - https://stackoverflow.com/questions/31216299/r-faster-gregexpr-for-very-large-strings - http://r.789695.n4.nabble.com/strsplit-perl-TRUE-gregexpr-perl-TRUE-very-slow-for-long-strings-td4727902.html - https://stat.ethz.ch/pipermail/r-help/2008-October/178451.html I figured out

gregexpr (PR#9965)

2007 Oct 10

Full_Name: Peter Dolan Version: 2.5.1 OS: Windows Submission from: (NULL) (128.193.227.43) gregexpr does not find all matching substrings if the substrings overlap: > gregexpr("abab","ababab") [[1]] [1] 1 attr(,"match.length") [1] 4 It does work correctly in Version 2.3.1 under linux.

regex question

2009 Aug 04

Hi, I am getting stuck over an apparently simple problem in the use of regular expressions: To collect together the first letters of the words from the Perl motto, "There is more than one way to do it" in the following form: TIMTOWTDI. I tried the following code: ##### A regex problem with the Perl motto astr<-"There is more than one way to do it" b1<-grep("\\<",
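For this particular motto, the acronym can also be built without a word-boundary regex by splitting on spaces and taking the first character of each word. A minimal sketch, assuming words are separated by single spaces; `words` and `acronym` are illustrative names.

```r
astr <- "There is more than one way to do it"

# Split into words on literal spaces, then keep each word's first letter
words <- strsplit(astr, " ", fixed = TRUE)[[1]]
acronym <- toupper(paste(substr(words, 1, 1), collapse = ""))
print(acronym)
```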

gregexpr - match overlap mishandled (PR#13391)

2008 Dec 12

Full_Name: Reid Thompson Version: 2.8.0 RC (2008-10-12 r46696) OS: darwin9.5.0 Submission from: (NULL) (129.98.107.177) the gregexpr() function does NOT return a complete list of global matches as it should. this occurs when a pattern matches two overlapping portions of a string, only the first match is returned. the following function call demonstrates this error (although this is not how I

How to use access results of gregexpr in data frames

2012 Mar 30

Hello, I'm trying to figure out how to find the index of the second occurrence of "/" in a string (which happens to represent a date) within a data frame column. I've used the following code successfully to find the first instance of "/". dframe <- data.frame(date=c("5/14/2011", "4/7/2011")) dframe$x1 <- regexpr("/", dframe[, 1])
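Since `gregexpr` returns one vector of match positions per string, the second occurrence can be picked out element-wise. The sketch below assumes the column is stored as character; the column name `x2` is a hypothetical choice mirroring the poster's `x1`.

```r
dframe <- data.frame(date = c("5/14/2011", "4/7/2011"),
                     stringsAsFactors = FALSE)

# One integer vector of "/" positions per date string
slashes <- gregexpr("/", dframe$date, fixed = TRUE)

# Take the second position, or NA when a string has fewer than two slashes
dframe$x2 <- vapply(
  slashes,
  function(pos) if (length(pos) >= 2) pos[2] else NA_integer_,
  integer(1)
)
print(dframe)
```

The length check matters because `gregexpr` reports a single `-1` when there is no match at all, so indexing `pos[2]` blindly would not distinguish "no second slash" from a real position.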

Finding non disjoint regular expressions

2008 May 05

Hello, Is there any way I can use the gregexpr functions (or a different function) in a manner that will also return overlapping (i.e. non disjoint) regular expressions? For instance, when running gregexpr("AAA","AAAAAA"), I get two matches, one at position 1 and one at position 4. I'd like to receive 4 matches at positions 1, 2, 3 and 4. Thanks, Schraga
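A common idiom for overlapping matches is to wrap the pattern in a zero-width lookahead and use `perl = TRUE`, so that each match consumes no characters and the search resumes one position later. This is a sketch of that idiom, not necessarily the answer given in the thread.

```r
# The lookahead (?=AAA) matches at every position where "AAA" starts,
# without consuming any characters, so overlapping starts are all found.
m <- gregexpr("(?=AAA)", "AAAAAA", perl = TRUE)[[1]]
starts <- as.integer(m)
print(starts)
```

Because the matches are zero-length, the `match.length` attribute is 0 for every hit; the length of the actual match has to be recovered separately, for example from a capture group inside the lookahead.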

segfault in gregexpr()

2008 Jan 31

Hi, Tried with R 2.6 and R 2.7: > gregexpr("", "abc", fixed=TRUE) *** caught segfault *** address 0x1c09000, cause 'memory not mapped' Traceback: 1: gregexpr("", "abc", fixed = TRUE) Possible actions: 1: abort (with core dump, if enabled) 2: normal R exit 3: exit R without saving workspace 4: exit R saving workspace

Word boundaries and gregexpr in R 2.2.1

2006 Feb 01

Hi I have a question concerning how to match word boundaries which I bet has a very simple answer, but I haven't found it with trial and error nor by searching the help archives for the terms in the subject line. The problem is this: I have a vector of two character strings. text<-c("This is a first example sentence.", "And this is a second example sentence.") If I

gregexpr in R 2.3.0 != gregexpr in R 2.4.0

2006 Oct 07

Hi all I have a question regarding differences in the way gregexpr works in R 2.3.0 and R 2.4.0. In R 2.3.0, this is what happens: > gregexpr(" [a-z] [a-z] ", " a b c d e f ", perl=T) [[1]] [1] 1 3 5 7 9 attr(,"match.length") [1] 5 5 5 5 5 ... while in R 2.4.0, this is what happens: > gregexpr(" [a-z] [a-z] ", " a b c d e f ", perl=T)

how to count the total number of (INCLUDING overlapping) occurrences of a substring within a string?

2009 Dec 20

Last one for you guys: The command: length(gregexpr('cus','hocus pocus')[[1]]) [1] 2 returns the number of times the substring 'cus' appears in 'hocus pocus' (which is two) It's returning the number of **disjoint** matches. So: length(gregexpr('aa','aaa')[[1]]) [1] 1 returns 1. **What I want to do:** I'm looking for a way to count
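For a fixed (non-regex) substring, overlapping occurrences can be counted by comparing every window of the right width against the substring. The helper `count_overlapping` below is a hypothetical name introduced for illustration; it is a sketch for a single input string, not a vectorised solution.

```r
# Count occurrences of a literal substring, including overlapping ones,
# by checking every window of width nchar(needle) in the haystack.
count_overlapping <- function(needle, haystack) {
  n <- nchar(haystack)
  k <- nchar(needle)
  if (k == 0 || k > n) return(0L)
  starts <- seq_len(n - k + 1)
  sum(substring(haystack, starts, starts + k - 1) == needle)
}

print(count_overlapping("cus", "hocus pocus"))
print(count_overlapping("aa", "aaa"))
```

For 'aa' in 'aaa' this counts the windows starting at positions 1 and 2, where the disjoint count from `gregexpr` reports only one.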

Word boundaries and gregexpr in R 2.2.1 (PR#8547)

2006 Feb 01

Full_Name: Stefan Th. Gries Version: 2.2.1 OS: Windows XP (Home and Professional) Submission from: (NULL) (68.6.34.104) The problem is this: I have a vector of two character strings. > text<-c("This is a first example sentence.", "And this is a second example sentence.") If I now look for word boundaries with regexpr, this is what I get: >

regexec() bug in R 3.4.0

2017 Jun 28

Hi, In R 3.4.0, the "Pattern Matching and Replacement" documentation that describes regexec(), gregexpr(), etc. states that the "text" argument to regexec is a character vector, "or an object which can be coerced by as.character to a character vector": regexec(pattern, text, ignore.case = FALSE, perl = FALSE, fixed = FALSE, useBytes = FALSE)

parsing - input buffer overflow

2008 Jun 13

Hi, I am trying to parse a large amount of text using gregexpr(). Unfortunately, I get an "input buffer overflow" message when I attempt that with too large an amount of text. The error messages occurs before the parsing. The problem is that I cannot assign the text to a variable (an object) if the text is too large. This problem has been mentioned before, which I found using the
