Hello,

Newbie question: how do you capture groups in a regexp in R?

Let's say I have txt="blah blah start=20080101 end=20090224". I'd like to get the two dates, start and end.

In Perl, one would say:

    my ($start,$end) = ($txt =~ /start=(\d{8}).*end=(\d{8})/);

I've tried:

    txt <- "blah blah start=20080101 end=20090224"
    m <- regexpr("start=(\\d{8}).*end=(\\d{8})", txt, perl=T);
    dates = substring(txt, m, m+attr(m,"match.length")-1);

but I get the whole matching substring...

Any idea?

~Pierre
I don't know if there is a direct, Perl-like way to capture the matches, but here is a solution:

> mdat <- gregexpr("[[:digit:]]{8}", txt)
> dates <- mapply(function(x, y) substr(txt, x, x + y - 1), mdat[[1]], attr(mdat[[1]], "match.length"))
> dates
[1] "20080101" "20090224"

-Christos

> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of pierre at demartines.com
> Sent: Tuesday, February 24, 2009 7:23 PM
> To: r-help at r-project.org
> Subject: [R] regexp capturing group in R
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
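For what it's worth, base R may offer something closer to the Perl idiom through regexec() together with regmatches(). The following is only a minimal sketch, assuming an R version recent enough to provide both functions; the names parts, dates, start_date and end_date are made up for illustration. The pattern uses [0-9] rather than \\d so that it works with the default (non-Perl) regex engine.

```r
txt <- "blah blah start=20080101 end=20090224"

# regexec() records the position of the whole match and of each
# parenthesised group; regmatches() turns those positions into strings.
m <- regexec("start=([0-9]{8}).*end=([0-9]{8})", txt)
parts <- regmatches(txt, m)[[1]]

# parts[1] is the whole match; the captured groups follow it.
dates <- parts[-1]
start_date <- dates[1]
end_date <- dates[2]
```

If the pattern does not match, parts comes back as an empty character vector, so start_date and end_date would be NA; a length check before indexing is probably wise.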
> txt <- "blah blah start=20080101 end=20090224"
> nums <- sub(".*start=(\\d+).*end=(\\d+).*", "\\1 \\2", txt, perl=TRUE)
> nums <- strsplit(sub(".*start=(\\d+).*end=(\\d+).*", "\\1 \\2", txt, perl=TRUE), ' ')
> nums[[1]]
[1] "20080101" "20090224"

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
Try this:

    library(gsubfn)
    strapply("blah blah start=20080101 end=20090224", "start=(\\d{8}) end=(\\d{8})", c, perl = TRUE)[[1]]

or perhaps just:

    strapply("blah blah start=20080101 end=20090224", "\\d{8}", perl = TRUE)[[1]]
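As a follow-up to the original regexpr() attempt: in current versions of R, regexpr() called with perl = TRUE also attaches "capture.start" and "capture.length" attributes to its result, so the groups can be cut out with substring() much as the question tried to do. This is a sketch under that assumption (those attributes did not exist in every R release), and cap_start, cap_len and captured are illustrative names only.

```r
txt <- "blah blah start=20080101 end=20090224"

# With perl = TRUE the result carries per-group start positions and
# lengths in addition to the usual whole-match position.
m <- regexpr("start=(\\d{8}).*end=(\\d{8})", txt, perl = TRUE)

# Both attributes are one-row matrices (one column per group);
# as.vector() flattens them for use with substring().
cap_start <- as.vector(attr(m, "capture.start"))
cap_len <- as.vector(attr(m, "capture.length"))

captured <- substring(txt, cap_start, cap_start + cap_len - 1)
```

Compared with the sub()/strsplit() route, this avoids building an intermediate string, at the cost of depending on attributes that older R installations may not provide.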