Dear all,

please, is there any way to extract a date from data like this:

....
"Date: Sat, 21 Feb 04 10:25:43 GMT"
"Date: 13 Feb 2004 13:54:22 -0600"
"Date: Fri, 20 Feb 2004 17:00:48 +0000"
"Date: Fri, 14 Jun 2002 16:22:27 -0400"
"Date: Wed, 18 Feb 2004 08:53:56 -0500"
"Date: 20 Feb 2004 02:18:58 -0600"
"Date: Sun, 15 Feb 2004 16:01:19 +0800"
....

I used

strptime(paste(substr(x,12,13), substr(x,15,17), substr(x,19,22), sep="-"), format="%d-%b-%Y")

which works for lines 3:5 and 7 (this form is the most common in my dataset) but obviously does not work for the other lines, where the day, month and year sit at different character positions.

If there is no straightforward solution I can live with what I use now, but some automagical function like

give.me.date.from.my.string.regardles.of.formating(x)

would be great.

Thank you.
Petr Pikal
petr.pikal at precheza.cz

On Tue, 5 Apr 2005, Petr Pikal wrote:

> Dear all,
>
> please, is there any way to extract a date from data
> like this:

Yes, if you delimit all the possibilities.

> ....
> "Date: Sat, 21 Feb 04 10:25:43 GMT"
> "Date: 13 Feb 2004 13:54:22 -0600"
> "Date: Fri, 20 Feb 2004 17:00:48 +0000"
> "Date: Fri, 14 Jun 2002 16:22:27 -0400"
> "Date: Wed, 18 Feb 2004 08:53:56 -0500"
> "Date: 20 Feb 2004 02:18:58 -0600"
> "Date: Sun, 15 Feb 2004 16:01:19 +0800"
> ....
>
> I used
>
> strptime(paste(substr(x,12,13), substr(x,15,17), substr(x,19,22),
> sep="-"), format="%d-%b-%Y")
>
> which works for lines 3:5 and 7 (this form is the most common in my
> dataset) but obviously does not work for the other lines.

For those examples, in character vector 'dates' (without quotes):

> nd <- gsub("^[^0-9]*([0-9]+) ([A-Za-z]+) ([0-9]+).*","\\1 \\2 \\3", dates)
> strptime(nd, "%d %b %y")
[1] "2004-02-21" "2020-02-13" "2020-02-20" "2020-06-14" "2020-02-18"
[6] "2020-02-20" "2020-02-15"

You should be able to amend the regexp for a wider range of forms, but your first line is ambiguous (2004 or 2021?) so there are limits.

> If there is no straightforward solution I can live with what I use now,
> but some automagical function like
>
> give.me.date.from.my.string.regardles.of.formating(x)
>
> would be great.

It would be impossible: when Americans write 07/04/2004 they do not mean April 7th.

--
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
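
A note on the output shown in this reply: the lines with four-digit years come back as 2020 because `%y` reads only two digits, so `2004` is consumed as `20` and the trailing `04` is ignored. The following is a minimal sketch of one way to handle both year widths, not the method either poster settled on. It assumes the data contain only the two shapes seen in the examples, that two-digit years should follow strptime's documented rule (00 to 68 become 20xx), and that month abbreviations are English (hence the "C" locale). The names `four` and `out` are illustrative.

```r
# Assumption: month abbreviations are English, so use the "C" time locale.
invisible(Sys.setlocale("LC_TIME", "C"))

dates <- c(
  "Date: Sat, 21 Feb 04 10:25:43 GMT",
  "Date: 13 Feb 2004 13:54:22 -0600",
  "Date: Fri, 20 Feb 2004 17:00:48 +0000",
  "Date: Fri, 14 Jun 2002 16:22:27 -0400",
  "Date: Wed, 18 Feb 2004 08:53:56 -0500",
  "Date: 20 Feb 2004 02:18:58 -0600",
  "Date: Sun, 15 Feb 2004 16:01:19 +0800"
)

# Same pattern as in the reply: keep only "day month year".
nd <- sub("^[^0-9]*([0-9]+) ([A-Za-z]+) ([0-9]+).*", "\\1 \\2 \\3", dates)

# Which entries end in a four-digit year?
four <- grepl("[0-9]{4}$", nd)

# Parse everything with the two-digit format first, then overwrite the
# four-digit cases with the result of the four-digit format.
out <- as.Date(strptime(nd, "%d %b %y"))
out[four] <- as.Date(strptime(nd[four], "%d %b %Y"))

format(out)
```

The two-pass approach keeps the ambiguity Ripley points out visible: the two-digit case is resolved by a stated rule rather than guessed per line.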

Dear Prof. Ripley,

Thank you for your answer. After some trial and error I ended up with a suitable extraction function, which gives me a substantial increase in positive results.

Nevertheless, I definitely need more practice with regular expressions; from the help page I can grasp only the easy things. Is there any "Regular expressions for dummies" available?

Best regards
Petr Pikal

On 5 Apr 2005 at 10:23, Prof Brian Ripley wrote:

> [...]
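
For readers in the same position, the pattern from Ripley's reply can be read piece by piece. The sketch below rebuilds it from commented fragments and applies it to the first example line; the variable names `pattern` and `x` are illustrative and not from the thread.

```r
# The pattern from the reply, assembled from commented pieces.
pattern <- paste0(
  "^[^0-9]*",      # from the start, skip every non-digit ("Date: Sat, ")
  "([0-9]+) ",     # group 1: one or more digits (the day), then a space
  "([A-Za-z]+) ",  # group 2: one or more letters (the month), then a space
  "([0-9]+)",      # group 3: one or more digits (the year, 2 or 4 digits)
  ".*"             # the rest of the line, matched so that it gets replaced too
)

x <- "Date: Sat, 21 Feb 04 10:25:43 GMT"

# "\\1 \\2 \\3" rebuilds the string from the three captured groups.
sub(pattern, "\\1 \\2 \\3", x)

# regexec() returns the full match followed by each group; drop the full match.
regmatches(x, regexec(pattern, x))[[1]][-1]
```

Because the whole line is matched, from `^` through `.*`, the replacement discards everything except the three groups.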