johannes rara
2010-Jan-02 15:08 UTC
[R] Regexp: extract first occurrence of date in string
I would like to extract the first date from a string:

> txt <- "first date is 05.12.2009. Second date is 06.12.2009."
> txt
[1] "first date is 05.12.2009. Second date is 06.12.2009."

I tried:

> sub("^.*?\\s(\\d{1,2}\\.\\d{1,2}\\.\\d{4})", "\\1", txt, extended=T, perl=T)
[1] "05.12.2009. Second date is 06.12.2009."

How should I modify this?

-J
Gabor Grothendieck
2010-Jan-02 15:35 UTC
[R] Regexp: extract first occurrence of date in string
Try this, which uses a slightly simpler regexp:

> library(gsubfn)
> strapply(txt, "(\\d{1,2}\\.\\d{1,2}\\.\\d{4}).*")[[1]]
[1] "05.12.2009"

or we could convert it to Date class at the same time, where we have assumed month.day.year:

> strapply(txt, "(\\d{1,2}\\.\\d{1,2}\\.\\d{4}).*", ~ as.Date(x, "%m.%d.%Y"))[[1]]
[1] "2009-05-12"

or this even simpler regexp, extracting all the dates and then picking off the first:

> strapply(txt, "\\d{1,2}\\.\\d{1,2}\\.\\d{4}")[[1]][1]
[1] "05.12.2009"
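The conversion above assumes month.day.year. A small base R sketch of how much that assumption matters: the same extracted string parses to two different dates depending on the format string. The variable names here are only illustrative, and which reading is right depends on where the data came from.

```r
# The string extracted from txt in the examples above
d <- "05.12.2009"

# Read as month.day.year (the assumption made in the reply above)
as_mdy <- as.Date(d, format = "%m.%d.%Y")

# Read as day.month.year (an alternative reading of the same text)
as_dmy <- as.Date(d, format = "%d.%m.%Y")

print(as_mdy)  # [1] "2009-05-12"
print(as_dmy)  # [1] "2009-12-05"
```

If the source data uses day.month.year, swapping the format string is the only change needed.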
johannes rara
2010-Jan-02 15:43 UTC
[R] Regexp: extract first occurrence of date in string
Thanks, is the same possible using basic gsub/sub/grep etc. functions?

-J
Gabor Grothendieck
2010-Jan-02 15:47 UTC
[R] Regexp: extract first occurrence of date in string
Use regexpr to get the offset into the string and its length, and then use substr to extract it.
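A minimal sketch of the regexpr/substr suggestion, using only base R and the txt string from the original question. The names pattern, m, first_date and first_date_sub are just illustrative. perl = TRUE is passed so that \\d is handled by the PCRE engine.

```r
txt <- "first date is 05.12.2009. Second date is 06.12.2009."
pattern <- "\\d{1,2}\\.\\d{1,2}\\.\\d{4}"

# regexpr returns the start of the first match, with its length
# stored in the "match.length" attribute (-1 if there is no match)
m <- regexpr(pattern, txt, perl = TRUE)

# substr cuts the matched span out of the string
first_date <- substr(txt, m, m + attr(m, "match.length") - 1)
print(first_date)  # [1] "05.12.2009"

# Alternative with sub: the pattern has to consume the whole string,
# because sub only replaces the part that matched. The original attempt
# stopped matching at the end of the first date, so the rest was kept.
first_date_sub <- sub(paste0("^.*?(", pattern, ").*$"), "\\1", txt, perl = TRUE)
print(first_date_sub)  # [1] "05.12.2009"
```

One difference worth keeping in mind: when the string contains no date, regexpr returns -1, which can be tested for explicitly, whereas sub silently returns the input unchanged.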