thr3ads.net - R help - [R] extracting a matched string using regexpr [May 2010]


steven mosher

2010-May-05 21:13 UTC

[R] extracting a matched string using regexpr

Given a piece of text, I want to be able to extract the part of it that
matches a regular expression.

This apparently works, but it is pretty ugly:

# some html
test <- "</tr><tr><th>88958</th><th>Abcdsef</th><th>67.8S</th><th>68.9\nW</th><th>26m</th>"
# a pattern to extract 5 digits
pattern <- "[0-9]{5}"
# regexpr returns a start point [1] and an attribute "match.length"
# get the substring from the start point to the stop point,
# where stop = start + length - 1
answer <- substr(test, regexpr(pattern, test)[1], regexpr(pattern, test)[1] + attr(regexpr(pattern, test), "match.length") - 1)
answer
# [1] "88958"
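The same substr approach can be sketched with the match computed only once, which removes most of the repetition. This is only a minimal sketch; the names m, start and stop_pos are illustrative and do not appear in the original post.

```r
test <- "</tr><tr><th>88958</th><th>Abcdsef</th><th>67.8S</th><th>68.9\nW</th><th>26m</th>"
pattern <- "[0-9]{5}"

# Call regexpr once; the result is the start position of the first match
# and carries the length of that match in the "match.length" attribute.
m <- regexpr(pattern, test)

start <- as.integer(m)                          # start of the match
stop_pos <- start + attr(m, "match.length") - 1 # last character of the match

answer <- substr(test, start, stop_pos)
answer
```

If the pattern is not found, regexpr returns -1, so a check such as `m > 0` before calling substr may be worth adding.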

I also tried using sub(pattern, replacement, x) with a regexp that captured
the group. I had found an example of this in the list mails, but it didn't
seem to work.


Gabor Grothendieck

2010-May-05 21:35 UTC


Here are two ways to extract 5 digits.

In the first one \\1 refers to the portion matched between the
parentheses in the regular expression.
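A small sketch of how the backreference behaves may help. It also shows one plausible reason a sub attempt can appear not to work; this is an assumption, since the attempted code was not posted. The string x below is a shortened, hypothetical stand-in for test.

```r
# Shortened, hypothetical stand-in for the HTML string in the question
x <- "<th>88958</th><th>Abcdsef</th>"

# Without .* around the group, sub replaces the five digits with themselves,
# so the string comes back unchanged.
unchanged <- sub("([0-9]{5})", "\\1", x)
identical(unchanged, x)

# With .* on both sides the pattern matches the whole string, and the whole
# string is replaced by the captured group alone.
extracted <- sub(".*([0-9]{5}).*", "\\1", x)
extracted
```

Note that sub returns its input untouched when the pattern does not match at all, so the result should not be read as a match without checking.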

In the second one, strapply is like apply: the object to be worked on
is the first argument (an array for apply, a string for strapply), the
second argument modifies it (which dimension for apply, the regular
expression for strapply), and the last is a function that acts on each
value (typically each row or column for apply, each match for strapply).
In this case we use c as our function to just return all the results.
They are returned in a list with one component per string, but here
test is just a single string, so we get a list of length one and ask
for the contents of its first component using [[1]].

# 1 - sub
sub(".*(\\d{5}).*", "\\1", test)

# 2 - strapply - see http://gsubfn.googlecode.com
library(gsubfn)
strapply(test, "\\d{5}", c)[[1]]
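The "list with one component per string" shape described above can be illustrated without installing gsubfn. The sketch below is not strapply itself; it uses base R's gregexpr together with regmatches, which was added to base R (in 2.14.0) after this thread was written. The input vector x is hypothetical.

```r
# Two hypothetical input strings, to show one list component per string
x <- c("<th>88958</th><th>26m</th>", "<th>12345</th><th>67890</th>")

# gregexpr finds every match in each string; regmatches extracts the
# matched substrings and returns them as a list parallel to x.
matches <- regmatches(x, gregexpr("[0-9]{5}", x))

matches[[1]]             # matches found in the first string
sapply(matches, length)  # number of matches per string
```

With a single input string, the result is a list of length one, which is why the first component is taken with [[1]].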

On Wed, May 5, 2010 at 5:13 PM, steven mosher <moshersteven at gmail.com> wrote:
> Given a text like
>
> I want to be able to extract a matched regular expression from a piece of
> text.
>
> this apparently works, but is pretty ugly
> # some html
> test<-"</tr><tr><th>88958</th><th>Abcdsef</th><th>67.8S</th><th>68.9\nW</th><th>26m</th>"
> # a pattern to extract 5 digits
> pattern<-"[0-9]{5}"
> # regexpr returns a start point[1] and an attribute "match.length"
> # get the substring from the start point to the stop point.. where stop =
> # start + length - 1
> answer<-substr(test,regexpr(pattern,test)[1],regexpr(pattern,test)[1]+attr(regexpr(pattern,test),"match.length")-1)
> answer
> [1] "88958"
>
> I tried using sub(pattern, replacement, x) with a regexp that captured the
> group. I'd found an example of this in the mails
> but it didnt seem to work..
