thr3ads.net - R help - [R] Help with complicated regular expression [Nov 2009]


Dennis Fisher

2009-Nov-13 14:12 UTC

[R] Help with complicated regular expression

Colleagues,

I am using R (2.9.2, all platforms) to search for a complicated text  
string using regular expressions.  I would appreciate any help you can  
provide.
The string consists of the following elements:
	SOMEWORDWITHNOSPACES
	any number of spaces and/or tabs
	(
	any number of spaces and/or tabs
	integer
	any number of spaces and/or tabs
	)

Examples include:
	WORD (  123    )
	WORD(1 )
	WORD\t ( 21\t)
	WORD \t ( 1 \t   )
etc.

I don't need to substitute anything, only to identify if such a string  
exists.
Any help with regular expressions would be appreciated.
Thanks.

Dennis

		

Dennis Fisher MD
P < (The "P Less Than" Company)
Phone: 1-866-PLessThan (1-866-753-7784)
Fax: 1-866-PLessThan (1-866-753-7784)
www.PLessThan.com

Romain Francois

2009-Nov-13 14:24 UTC


[R] Help with complicated regular expression

Hello,

The function you are looking for is grepl. Something like this perhaps:

 > words <- c("WORD (  123    )", "WORD(1)", "WORD\t ( 21\t) ", "WORD\t ( 21\t) ")
 > grepl("[[:space:]]*[(][[:space:]]*[0-9]+[[:space:]]*[)]", words)
[1] TRUE TRUE TRUE TRUE

[[:space:]]*     : any number of whitespace characters such as spaces or tabs (including 0 times)
[(]              : a (
[0-9]+           : any number of digits, but at least one
[)]              : a )
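
The pattern above describes only the parenthesised integer, so it would also accept a string that has no word in front of it. As a rough sketch (the names `paren_only`, `with_word` and `candidates` are made up for illustration and are not from the original reply), one way to also require a word is to prepend a class that matches one or more characters that are neither whitespace nor parentheses:

```r
# The pattern from the reply above: whitespace, "(", digits, ")".
# Nothing here requires a word before the parenthesis.
paren_only <- "[[:space:]]*[(][[:space:]]*[0-9]+[[:space:]]*[)]"

# Hypothetical extension (an assumption, not from the thread): require at
# least one character that is not whitespace and not a parenthesis first.
with_word <- paste0("[^[:space:]()]+", paren_only)

candidates <- c("WORD (  123    )", "WORD(1 )", "WORD\t ( 21\t)",
                "( 7 )",        # no word in front
                "WORD ( x )")   # not an integer

grepl(paren_only, candidates)  # TRUE TRUE TRUE TRUE FALSE
grepl(with_word, candidates)   # TRUE TRUE TRUE FALSE FALSE
```

Whether "( 7 )" should count as a match depends on the real data, so this is only one possible tightening.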

Romain



-- 
Romain Francois
Professional R Enthusiast
+33(0) 6 28 91 30 30
http://romainfrancois.blog.free.fr
|- http://tr.im/EAD5 : LondonR slides
|- http://tr.im/BcPw : celebrating R commit #50000
`- http://tr.im/ztCu : RGG #158:161: examples of package IDPmisc

Tony Plate

2009-Nov-13 14:27 UTC


[R] Help with complicated regular expression

One of these should be a start.  If there can be no extra text at the beginning
or end, start with "^" and end with "$".
> x <- c("WORD (  123    )", "WORD(1 )", "WORD\t ( 21\t)", "WORD \t ( 1 \t   )", "decoy((2))", "more words in front(2)")
> grep("[[:alpha:]]+[ \t]*\\([ \t]*[0-9]+[ \t]*\\)", x)
[1] 1 2 3 4 6
> grep("^[[:alpha:]]+[ \t]*\\([ \t]*[0-9]+[ \t]*\\)", x)
[1] 1 2 3 4
-- Tony Plate
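
To see why element 6 matches only without the "^", it can help to look at the text that was actually matched. The following is a minimal sketch using base R's regexpr() and regmatches(); the variable names `pat`, `hits`, `matched_text` and `whole` are illustrative assumptions, and the fully anchored variant at the end is one reading of the "start with ^ and end with $" advice:

```r
x <- c("WORD (  123    )", "WORD(1 )", "WORD\t ( 21\t)", "WORD \t ( 1 \t   )",
       "decoy((2))", "more words in front(2)")
pat <- "[[:alpha:]]+[ \t]*\\([ \t]*[0-9]+[ \t]*\\)"

# regexpr() gives the position of the first match in each string (-1 if none);
# regmatches() returns the matched text for the strings that did match.
hits <- regexpr(pat, x)
matched_text <- regmatches(x, hits)
matched_text[5]   # "front(2)": the match starts in the middle of element 6

# Anchoring both ends: the whole string must be word, "(", integer, ")".
whole <- grepl(paste0("^", pat, "$"), x)
whole             # TRUE TRUE TRUE TRUE FALSE FALSE
```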


jim holtman

2009-Nov-13 14:27 UTC


[R] Help with complicated regular expression

try this:
> x <- c('WORD(12  )', 'WORD[123)', 'WORD   (   123   )', "WORD(xx)", "WORD(1)")
> grep("[[:alnum:]]+[[:space:]]*\\([[:space:]]*[[:digit:]]+[[:space:]]*\\)", x)
[1] 1 3 5



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?

Marc Schwartz

2009-Nov-13 14:33 UTC


[R] Help with complicated regular expression


How about this:

Lines <- c("WORD (  123    )", "WORD(1)", "WORD\t ( 21\t) ", "WORD\t ( 21\t) ")

 > Lines
[1] "WORD (  123    )" "WORD(1)"          "WORD\t ( 21\t) "
[4] "WORD\t ( 21\t) "

 > grep("^[A-Za-z]+.*\\(.*[0-9]+.*\\)", Lines)
[1] 1 2 3 4

You should test it on some real data to see if it works or needs to be  
tweaked further.

^[A-Za-z]+ finds one or more letters at the beginning of the line
.* finds zero or more characters after the word
\\( finds an open paren
.* finds zero or more characters after the open paren
[0-9]+ finds one or more digits
.* finds zero or more characters after the digits
\\) finds the close paren
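
Because .* accepts any characters, not just spaces and tabs, this pattern is fairly permissive, which is one reason to test it on real data. A small sketch of the difference, where `loose` is the pattern above, `strict` is the anchored pattern posted by Tony Plate earlier in this thread, and the `tests` vector is invented for illustration:

```r
loose  <- "^[A-Za-z]+.*\\(.*[0-9]+.*\\)"
strict <- "^[A-Za-z]+[ \t]*\\([ \t]*[0-9]+[ \t]*\\)"

# Made-up test strings (assumptions, not from the original posts).
tests <- c("WORD (  123    )",        # the intended form
           "WORD and text (see 5b)",  # extra words and non-integer content
           "WORD(1.5)")               # a number, but not an integer

grepl(loose, tests)   # TRUE TRUE TRUE
grepl(strict, tests)  # TRUE FALSE FALSE
```

If strings like the second and third should be rejected, replacing each .* with [ \t]* is probably the simplest adjustment.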


HTH,

Marc Schwartz

Gabor Grothendieck

2009-Nov-13 14:44 UTC


[R] Help with complicated regular expression

\w+ will match one or more word characters and \s* will match 0 or
more whitespace characters, so if the described text must be the
complete string then:

grepl("^\\w+\\s*\\(\\s*\\w+\\s*\\)$", x)

or, if it's OK for other text to appear before and after as long as the
indicated text is among it, then remove the ^ and $.  The above gives a
logical vector as a result; if we use grep rather than grepl we get a
vector of indexes instead.
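
One caveat: \w matches letters, digits and the underscore, so the \w+ between the parentheses also accepts non-numeric content. A minimal sketch, assuming a strict integer is wanted, swaps it for \d+; the names `as_posted`, `digits_only` and the sample vector `x` are illustrative only:

```r
# Made-up sample vector (an assumption, not from the original post).
x <- c("WORD (  123    )", "WORD(1 )", "WORD(xx)", "prefix WORD(2)")

as_posted   <- "^\\w+\\s*\\(\\s*\\w+\\s*\\)$"   # \w+ inside the parentheses
digits_only <- "^\\w+\\s*\\(\\s*\\d+\\s*\\)$"   # \d+ allows digits only

grepl(as_posted, x)     # TRUE TRUE TRUE FALSE
grepl(digits_only, x)   # TRUE TRUE FALSE FALSE

# grep() returns the indexes of the matching elements instead.
grep(digits_only, x)    # 1 2

# Without ^ and $, text before or after the target is allowed.
grepl("\\w+\\s*\\(\\s*\\d+\\s*\\)", x)   # TRUE TRUE FALSE TRUE
```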

