thr3ads.net - similar to: "Word boundaries and gregexpr in R 2.2.1"

Displaying 20 results from an estimated 2000 matches similar to: "Word boundaries and gregexpr in R 2.2.1"

Word boundaries and gregexpr in R 2.2.1 (PR#8547)

2006 Feb 01

Full_Name: Stefan Th. Gries Version: 2.2.1 OS: Windows XP (Home and Professional) Submission from: (NULL) (68.6.34.104) The problem is this: I have a vector of two character strings. > text<-c("This is a first example sentence.", "And this is a second example sentence.") If I now look for word boundaries with regexpr, this is what I get: >

gregexpr in R 2.3.0 != gregexpr in R 2.4.0

2006 Oct 07

Hi all, I have a question regarding differences in the way gregexpr works in R 2.3.0 and R 2.4.0. In R 2.3.0, this is what happens: > gregexpr(" [a-z] [a-z] ", " a b c d e f ", perl=T) [[1]] [1] 1 3 5 7 9 attr(,"match.length") [1] 5 5 5 5 5 ... while in R 2.4.0, this is what happens: > gregexpr(" [a-z] [a-z] ", " a b c d e f ", perl=T)

parts of data frames: subset vs. [-c()]

2005 Aug 26

Dear all, I have a problem with splitting up a data frame called ReVerb: > str(ReVerb) `data.frame': 92713 obs. of 16 variables: $ CHILD : Factor w/ 7 levels "ABE","ADA","EVE",..: 1 1 1 1 1 1 1 1 1 1 ... $ AGE : Factor w/ 484 levels "1;06.00","1;06.16",..: 43 43 43 99 99 99 99 99 99 99 ... $ AGE_Q : num 2.0 2.0 2.0 2.4 2.4

How to use access results of gregexpr in data frames

2012 Mar 30

Hello, I'm trying to figure out how to find the index of the second occurrence of "/" in a string (which happens to represent a date) within a data frame column. I've used the following code successfully to find the first instance of "/". dframe <- data.frame(date=c("5/14/2011", "4/7/2011")) dframe$x1 <- regexpr("/", dframe[, 1])
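A minimal sketch of one way to get the position of the second "/", assuming the same two dates as in the post. The column name x2 is made up for illustration, and stringsAsFactors = FALSE is added so the column is plain character data. gregexpr returns one vector of match positions per string, so the second element of each vector is the second occurrence:

```r
dframe <- data.frame(date = c("5/14/2011", "4/7/2011"),
                     stringsAsFactors = FALSE)

# One integer vector of match positions per date string
slashes <- gregexpr("/", dframe$date, fixed = TRUE)

# Second position, or NA when a string has fewer than two matches
# (x2 is a hypothetical column name, not from the original post)
dframe$x2 <- vapply(
  slashes,
  function(m) if (length(m) >= 2) m[2] else NA_integer_,
  integer(1)
)
dframe
```

The length check matters because gregexpr reports a single -1 when nothing matches, so indexing m[2] blindly would not signal the missing case clearly.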

RfW 2.3.1: regular expressions to detect pairs of identical word-final character sequences

2006 Jul 23

Dear all I use R for Windows 2.3.1 on a fully updated Windows XP Home SP2 machine and I have two related regular expression problems. platform i386-pc-mingw32 arch i386 os mingw32 system i386, mingw32 status major 2 minor

gregexpr - match overlap mishandled (PR#13391)

2008 Dec 12

Full_Name: Reid Thompson Version: 2.8.0 RC (2008-10-12 r46696) OS: darwin9.5.0 Submission from: (NULL) (129.98.107.177) The gregexpr() function does NOT return a complete list of global matches as it should. This occurs when a pattern matches two overlapping portions of a string: only the first match is returned. The following function call demonstrates this error (although this is not how I

Gregexpr - extract results with lapply

2006 Nov 07

Hello, I need to extract sequences of three upper case letters in a string. In other words, in this string: str <-c("ABC", "this WOUld be gOOD") The result I'm looking for is ABC WOU OOD. With gregexpr, I can get the position and length of the sequences gregexpr('[A-Z]{3}',str,perl=TRUE) [[1]] [1] 1
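A short sketch of how the matched text itself could be pulled out, assuming a version of R that provides regmatches() (it is not in the R releases from 2006). The variable is renamed to txt here only to avoid masking the base function str():

```r
txt <- c("ABC", "this WOUld be gOOD")

# Positions and lengths of every run of three upper-case ASCII letters
m <- gregexpr("[A-Z]{3}", txt, perl = TRUE)

# regmatches() turns those positions back into the matched substrings,
# returning one character vector per input string
hits <- regmatches(txt, m)

# Flatten the list into a single character vector
unlist(hits)
```

Without regmatches(), the same result can be built by passing each start position and its match.length attribute to substring().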

404 HTTP not found

2006 Sep 18

Hi I wrote a script which retrieves links from websites and loads them with scan: ... website<-tolower(scan(current.pages[i], what="character", sep="\n", quiet=TRUE)) ... However occasionally, the script finds broken links, such as <http://www.google.com/test>. when the script tries to access such websites, the repeat loop breaks and I get the error message Error

question regarding gregexpr and read.table

2011 Aug 17

Hi, I have a silly question regarding the usage of two commands: read.table and gregexpr: For read.table, if I read a matrix and set header = T, I found that all the dashes ("-") become dots (".") A = read.table("Matrix.txt", sep = "\t", header = F) A[1,1] # "A-B-C-D". A = read.table("Matrix.txt", sep = "\t", header = T)
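The dash-to-dot change is consistent with read.table's check.names argument, which by default rewrites column headers into syntactically valid R names. A small sketch, using inline text instead of the poster's Matrix.txt (the second header "E-F" and the data row are invented for illustration):

```r
# Hypothetical two-column input standing in for Matrix.txt
raw <- "A-B-C-D\tE-F\n1\t2"

# Default behaviour: headers are passed through make.names()
checked <- read.table(text = raw, sep = "\t", header = TRUE)
names(checked)

# check.names = FALSE keeps the headers exactly as written in the file
unchecked <- read.table(text = raw, sep = "\t", header = TRUE,
                        check.names = FALSE)
names(unchecked)
```

Columns with non-syntactic names then have to be accessed with backticks or with [["A-B-C-D"]].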

Using gregexpr with multiple search elements

2009 Feb 25

Dear list, I am trying to use gregexpr to see if entries in a dataframe have either of two possible values for a string. here's an example text<-c("fat", "rat", "cat", "dog", "log", "fish") If I just wanted to find if any one of the elements in text match the pattern "at" I would do gregexpr("\\at", text)
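The post is cut off before it names the second pattern, so the following is only a sketch that assumes the two values are "at" and "og". Regular-expression alternation with | lets a single call test for either one, and grepl() gives a plain TRUE/FALSE per element when positions are not needed:

```r
text <- c("fat", "rat", "cat", "dog", "log", "fish")

# "at|og" matches any element containing either substring
# ("og" is an assumed second pattern, not taken from the post)
found <- grepl("at|og", text)
found

# gregexpr() accepts the same pattern when match positions are wanted
positions <- gregexpr("at|og", text)
```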

patch for gregexpr(perl=TRUE)

2019 Feb 19

Hi all, Several people have noticed that gregexpr is very slow for large subject strings when perl=TRUE is specified. - https://stackoverflow.com/questions/31216299/r-faster-gregexpr-for-very-large-strings - http://r.789695.n4.nabble.com/strsplit-perl-TRUE-gregexpr-perl-TRUE-very-slow-for-long-strings-td4727902.html - https://stat.ethz.ch/pipermail/r-help/2008-October/178451.html I figured out

gregexpr (PR#9965)

2007 Oct 10

Full_Name: Peter Dolan Version: 2.5.1 OS: Windows Submission from: (NULL) (128.193.227.43) gregexpr does not find all matching substrings if the substrings overlap: > gregexpr("abab","ababab") [[1]] [1] 1 attr(,"match.length") [1] 4 It does work correctly in Version 2.3.1 under linux.
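Global matching in gregexpr reports disjoint matches, so after "abab" is found at position 1 the search resumes after that match. When overlapping occurrences are wanted, one common workaround is a zero-width lookahead with perl = TRUE. This is a sketch of that idiom, not something taken from the report:

```r
# The lookahead (?=abab) matches the empty string at every position
# where "abab" begins, so no characters are consumed and overlapping
# occurrences are all visited.
m <- gregexpr("(?=abab)", "ababab", perl = TRUE)[[1]]

# Drop the attributes to keep just the start positions
starts <- as.integer(m)
starts
```

The reported match.length values are 0 because the lookahead consumes nothing; the length of the pattern text has to be supplied separately if substrings are extracted.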

regexp bug in very recent r-devel

2007 May 22

completion is semi-broken in today's r-devel, and the reason seems to be some regular expression changes: > sessionInfo() R version 2.6.0 Under development (unstable) (2007-05-22 r41673) i686-pc-linux-gnu locale: [...] attached base packages: [1] "stats" "graphics" "grDevices" "utils" "datasets" "methods" [7]

segfault in gregexpr()

2008 Jan 31

Hi, Tried with R 2.6 and R 2.7: > gregexpr("", "abc", fixed=TRUE) *** caught segfault *** address 0x1c09000, cause 'memory not mapped' Traceback: 1: gregexpr("", "abc", fixed = TRUE) Possible actions: 1: abort (with core dump, if enabled) 2: normal R exit 3: exit R without saving workspace 4: exit R saving workspace

backreferences in gregexpr

2012 Nov 02

Hi Folks, I'm trying to extract just the backreferences from a regex. > temp = "abcd1234abcd1234" > regmatches(temp, gregexpr("(?:abcd)(1234)", temp)) [[1]] [1] "abcd1234" "abcd1234" What I would like is: [1] "1234" "1234" Note: I know I can just match 1234 here, but the actual example is complicated enough that I have to
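One simple way to keep only the captured group, sketched with the toy pattern from the post (the poster's real pattern is not shown): first collect the whole matches, then apply the same pattern to each match with sub() and replace it by the first group.

```r
temp <- "abcd1234abcd1234"
pattern <- "(?:abcd)(1234)"

# Step 1: every whole match of the pattern
whole <- regmatches(temp, gregexpr(pattern, temp, perl = TRUE))[[1]]

# Step 2: reduce each whole match to the text of capture group 1
groups <- sub(pattern, "\\1", whole, perl = TRUE)
groups
```

With perl = TRUE, gregexpr also attaches capture.start and capture.length attributes, which can be used with substring() to avoid the second pass.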

gregexpr slow and increases exponentially with string length --> how to speed it up?

2008 Oct 31

Dear All, I have a long string and need to search for regular expressions in there. However it becomes horribly slow as the string length increases. Below is an example: when "i" increases by 5, the time spent increases by more! (my string is 11,000,000 letters long!) I also noticed that - the search time increases dramatically with the number of matches found. - the perl=T option

Named backreferences in replacement patterns

2007 Mar 08

Hi I have a problem with substitutions involving named backreferences. I have a vector American.dates: > American.dates [1] "5/15/1976" "2.15.1970" "1.9.2006" which I want to change into British.dates: > British.dates [1] "15/5/1976" "15/2/1970" "9/1/2006" I know I can do it like this:
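The poster's own code is cut off here. As an illustration only, the conversion can be written with numbered backreferences, assuming every date has the form month, separator, day, separator, year with "/" or "." as the separator:

```r
American.dates <- c("5/15/1976", "2.15.1970", "1.9.2006")

# Group 1 = month, group 2 = day, group 3 = year;
# the replacement swaps the first two and normalises the separator to "/"
British.dates <- sub("^(\\d+)[/.](\\d+)[/.](\\d+)$", "\\2/\\1/\\3",
                     American.dates, perl = TRUE)
British.dates
```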

regexec() bug in R 3.4.0

2017 Jun 28

Hi, In R 3.4.0, the "Pattern Matching and Replacement" documentation that describes regexec(), gregexpr(), etc. states that the "text" argument to regexec is a character vector, "or an object which can be coerced by as.character to a character vector": regexec(pattern, text, ignore.case = FALSE, perl = FALSE, fixed = FALSE, useBytes = FALSE)

Pattern match

2013 Mar 20

Hello again, in the help page of grep() function, it is written that pattern: character string containing a regular expression (or character string for fixed = TRUE) to be matched in the given character vector. Coerced by as.character to a character string if possible. If a character vector of length 2 or more is supplied, the first element is used with a warning. Missing values are allowed
