Colleagues,

I am using R (2.9.2, all platforms) to search for a complicated text string using regular expressions. I would appreciate any help you can provide.

The string consists of the following elements:
    SOMEWORDWITHNOSPACES
    any number of spaces and/or tabs
    (
    any number of spaces and/or tabs
    integer
    any number of spaces and/or tabs
    )

Examples include:
    WORD ( 123 )
    WORD(1 )
    WORD\t ( 21\t)
    WORD \t ( 1 \t )
etc.

I don't need to substitute anything, only to identify if such a string exists. Any help with regular expressions would be appreciated.

Thanks.

Dennis

Dennis Fisher MD
P < (The "P Less Than" Company)
Phone: 1-866-PLessThan (1-866-753-7784)
Fax: 1-866-PLessThan (1-866-753-7784)
www.PLessThan.com
Hello,

The function you are looking for is grepl. Something like this perhaps:

    > words <- c("WORD ( 123 )","WORD(1)", "WORD\t ( 21\t) ", "WORD\t ( 21\t) " )
    > grepl( "[[:space:]]*[(][[:space:]]*[0-9]+[[:space:]]*[)]", words )
    [1] TRUE TRUE TRUE TRUE

    [[:space:]]* : any number of spaces or tabs (including 0 times)
    [(]          : a (
    [0-9]+       : any number of digits, but at least one
    [)]          : a )

Romain

On 11/13/2009 03:12 PM, Dennis Fisher wrote:
> I am using R (2.9.2, all platforms) to search for a complicated text
> string using regular expressions.
> [...]

-- 
Romain Francois
Professional R Enthusiast
+33(0) 6 28 91 30 30
http://romainfrancois.blog.free.fr
|- http://tr.im/EAD5 : LondonR slides
|- http://tr.im/BcPw : celebrating R commit #50000
`- http://tr.im/ztCu : RGG #158:161: examples of package IDPmisc
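The pattern above tests only for the parenthesised integer, so a string with no word in front of it would also pass. A minimal sketch of one way to require the word as well, assuming the word is made of alphanumeric characters (the `[[:alnum:]]+` prefix and the test strings are illustrative assumptions, not part of the reply above):

```r
# Hypothetical test strings; the last two should NOT match.
words <- c("WORD ( 123 )", "WORD(1)", "WORD\t ( 21\t)", "( 5 )", "WORD(x)")

# Assumption: the leading word is one or more alphanumeric characters.
pattern <- "[[:alnum:]]+[[:space:]]*[(][[:space:]]*[0-9]+[[:space:]]*[)]"

# grepl() returns one TRUE/FALSE per element of 'words'.
has_match <- grepl(pattern, words)
print(has_match)
# [1]  TRUE  TRUE  TRUE FALSE FALSE
```

Writing `[(]` and `[)]` as bracket expressions avoids the double backslash that `\\(` and `\\)` need inside an R string; both spellings match a literal parenthesis.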
One of these should be a start. If there can be no extra text at the beginning or end, start with "^" and end with "$".

    > x <- c("WORD ( 123 )", "WORD(1 )", "WORD\t ( 21\t)", "WORD \t ( 1 \t )", "decoy((2))", "more words in front(2)")
    > grep("[[:alpha:]]+[ \t]*\\([ \t]*[0-9]+[ \t]*\\)", x)
    [1] 1 2 3 4 6
    > grep("^[[:alpha:]]+[ \t]*\\([ \t]*[0-9]+[ \t]*\\)", x)
    [1] 1 2 3 4

-- Tony Plate
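The effect of the two anchors can be seen side by side. This is only a sketch with made-up strings; the variable names are hypothetical, and `paste(..., sep = "")` is used to build the patterns so that it also runs on older versions of R:

```r
# Hypothetical inputs: exact match, extra text in front, extra text behind.
x <- c("WORD(1)", "front WORD(1)", "WORD(1) trailing")

# Core pattern: letters, optional spaces/tabs, "(", integer, ")".
core <- "[[:alpha:]]+[ \t]*\\([ \t]*[0-9]+[ \t]*\\)"

unanchored <- grep(core, x)                            # match anywhere
start_only <- grep(paste("^", core, sep = ""), x)      # must start the string
whole      <- grep(paste("^", core, "$", sep = ""), x) # must be the whole string

print(unanchored)  # [1] 1 2 3
print(start_only)  # [1] 1 3
print(whole)       # [1] 1
```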
Try this:

    > x <- c('WORD(12 )', 'WORD[123)', 'WORD ( 123 )', "WORD(xx)", "WORD(1)")
    > grep("[[:alnum:]]+[[:space:]]*\\([[:space:]]*[[:digit:]]+[[:space:]]*\\)", x)
    [1] 1 3 5

-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
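Which character class stands in for SOMEWORDWITHNOSPACES decides what counts as a word. The sketch below compares three candidates on made-up inputs; the helper `make_pattern` and the third class, `[^[:space:]()]+` (anything that is not whitespace or a parenthesis), are assumptions for illustration rather than anything proposed in the thread:

```r
# Hypothetical words: letters only, letters+digit, underscore, hyphen.
x <- c("WORD(1)", "WORD2(1)", "my_word(1)", "my-word(1)")

# Hypothetical helper: anchor the pattern and plug in a class for the word.
make_pattern <- function(word) {
  paste("^", word, "[[:space:]]*\\([[:space:]]*[[:digit:]]+[[:space:]]*\\)$",
        sep = "")
}

alpha_hits    <- grepl(make_pattern("[[:alpha:]]+"), x)
alnum_hits    <- grepl(make_pattern("[[:alnum:]]+"), x)
nonspace_hits <- grepl(make_pattern("[^[:space:]()]+"), x)

print(alpha_hits)     # [1]  TRUE FALSE FALSE FALSE
print(alnum_hits)     # [1]  TRUE  TRUE FALSE FALSE
print(nonspace_hits)  # [1]  TRUE  TRUE  TRUE  TRUE
```

The right choice depends on what the real data look like, so it may be worth checking a sample of actual words before settling on one class.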
How about this:

    > Lines <- c("WORD ( 123 )","WORD(1)", "WORD\t ( 21\t) ", "WORD\t ( 21\t) " )
    > Lines
    [1] "WORD ( 123 )" "WORD(1)" "WORD\t ( 21\t) "
    [4] "WORD\t ( 21\t) "
    > grep("^[A-Za-z]+.*\\(.*[0-9]+.*\\)", Lines)
    [1] 1 2 3 4

You should test it on some real data to see if it works or needs to be tweaked further.

    ^[A-Za-z]+  finds one or more letters at the beginning of the line
    .*          finds zero or more characters of any kind (not only spaces and tabs) after the word
    \\(         finds an open paren
    .*          finds zero or more characters of any kind after the open paren
    [0-9]+      finds one or more digits
    .*          finds zero or more characters of any kind after the digits
    \\)         finds the close paren

HTH,

Marc Schwartz
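Because `.*` accepts any characters, the pattern above also matches strings that have other text between the word, the parentheses and the digits. A small sketch of the difference, using made-up inputs and a stricter variant in which each `.*` is replaced by `[ \t]*` (the names `loose` and `strict` are hypothetical):

```r
# Hypothetical inputs: only the first follows the layout in the question.
Lines <- c("WORD ( 123 )", "WORD(abc 1 def)", "WORD and more (see note 2)")

loose  <- "^[A-Za-z]+.*\\(.*[0-9]+.*\\)"               # any characters allowed
strict <- "^[A-Za-z]+[ \t]*\\([ \t]*[0-9]+[ \t]*\\)"   # only spaces and tabs

loose_hits  <- grepl(loose, Lines)
strict_hits <- grepl(strict, Lines)

print(loose_hits)   # [1] TRUE TRUE TRUE
print(strict_hits)  # [1]  TRUE FALSE FALSE
```

Whether the looser or the stricter form is wanted depends on how much variation the real data contain.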
\w+ will match one or more word characters, \s* will match zero or more whitespace characters, and [0-9]+ will match one or more digits, so if the described text must be the complete string then:

    grepl("^\\w+\\s*\\(\\s*[0-9]+\\s*\\)$", x)

or, if it's OK for other text to appear before and after as long as the indicated text is among it, remove the ^ and $.

The above gives a logical vector as a result; if we use grep rather than grepl we get a vector of indexes instead.
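To make the difference between the two return types concrete, here is a short sketch on made-up strings. It assumes the integer is matched with `[0-9]+` and that the pattern must cover the whole string; `any()` is added as one possible way to answer the original yes/no question of whether such a string exists at all:

```r
# Hypothetical inputs: two valid strings, two that should be rejected.
x <- c("WORD ( 123 )", "WORD(xx)", "no parens here", "WORD\t(7)")

# Assumption: anchored pattern with a digits-only integer.
pattern <- "^\\w+\\s*\\(\\s*[0-9]+\\s*\\)$"

hits    <- grepl(pattern, x)  # logical vector, one value per element
idx     <- grep(pattern, x)   # integer indexes of the matching elements
any_hit <- any(hits)          # single TRUE/FALSE: does any element match?

print(hits)     # [1]  TRUE FALSE FALSE  TRUE
print(idx)      # [1] 1 4
print(any_hit)  # [1] TRUE
```

`which(hits)` gives the same indexes as `grep()`, so either function can be converted to the other form when needed.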