On Wed, 10 Jun 2020, Jeff Newmiller wrote:

> Fix your format specification?
> ?strptime
>
>> I have been trying to convert European short dates formatted as dd/mm/yy
>> into the ISO 8601 but the function as.Dates interprets them as American
>> ones (mm/dd/yy), thus I get:

Look at Hadley Wickham's 'tidyverse' collection as described in R for Data
Science. There are date, datetime, and time functions that will do just what
you want.

Rich
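A minimal sketch of what Jeff's "fix your format specification" could look like in base R. The sample values are hypothetical (the original poster's data is not shown in the thread), and the assumption is that every value really is dd/mm/yy.

```r
# Hypothetical sample values, assumed to be in dd/mm/yy form
eu <- c("10/06/20", "01/02/03", "23/11/99")

# State the input format explicitly: day, month, two-digit year
d <- as.Date(eu, format = "%d/%m/%y")

# Write the dates back out in ISO 8601 form (YYYY-MM-DD)
iso <- format(d, "%Y-%m-%d")
print(iso)
```

Note that `%y` has to pick a century for two-digit years; `?strptime` documents the rule (00 to 68 map to 20xx, 69 to 99 to 19xx), which is one more piece of knowledge about the data that the analyst has to supply.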
Martin Maechler
2020-Jun-11 07:17 UTC
[R] How to convert European short dates to ISO format?
>>>>> Rich Shepard
>>>>>     on Wed, 10 Jun 2020 07:44:49 -0700 writes:

    > On Wed, 10 Jun 2020, Jeff Newmiller wrote:
    >> Fix your format specification? ?strptime

    >>> I have been trying to convert European short dates
    >>> formatted as dd/mm/yy into the ISO 8601 but the function
    >>> as.Dates interprets them as American ones (mm/dd/yy),
    >>> thus I get:

    > Look at Hadley Wickham's 'tidyverse' collection as
    > described in R for Data Science. There are date, datetime,
    > and time functions that will do just what you want.

    > Rich

I strongly disagree that automatic guessing of date format is a
good idea:

If you have dates such as 01/02/03, 10/11/12, ...
you cannot have software (nor a human) *guess* for
you what they mean. You have to *know*, or get that knowledge "exogenously",
i.e., from context (say "meta data" if you want) that you as
data analyst must have before you can reliably work with that
data.

There is a global standard (ISO) for dates, e.g. 2020-06-11 for today.
Such dates have the huge advantage that alphabetical ordering is
equivalent to time ordering ... and honestly I don't see why
smart people (such as most? R users) do not all use them much
more often, notably when it comes to data.

But as long as most people in the world don't use that format,
and practically all default formats for dates (e.g. in
spreadsheets and computer locales) do not use the ISO
standard but rather regional conventions, one must add meta
data to have a 100% guarantee of using the correct format.

Of course, you can often guess correctly with very high
(subjective) probability, e.g., 11/23/99 is highly probably
the 23rd of Nov, 1999 ... and indeed if you have more than a few
dates, that often helps to guess correctly. But there's no
guarantee.

No, I state that it is much better to ask the data analyst
to use their brains a little bit and enter the date format
explicitly than to use software that guesses it for them
correctly most of the time. How should they find out at all in
the rare cases where the automatic guess is wrong?

Martin Maechler
ETH Zurich and R Core team
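Two of the points above can be checked in a few lines of base R. This is only a sketch with made-up values: it shows that a wrong format is caught (as `NA`) only when a day exceeds 12, while an ambiguous date parses silently either way, and that ISO strings sort chronologically whereas dd/mm/yy strings do not.

```r
# Hypothetical values, assumed to be dd/mm/yy
x <- c("23/11/99", "01/02/03")

# Wrong format: "23" is not a valid month, so the first value becomes NA,
# but "01/02/03" parses without complaint (as 2 January 2003)
wrong <- as.Date(x, format = "%m/%d/%y")

# Correct format, supplied from knowledge about the data
right <- as.Date(x, format = "%d/%m/%y")

# ISO 8601 strings: alphabetical order equals time order
iso <- format(right)                        # default Date format is %Y-%m-%d
iso_sorted_as_text  <- sort(iso)
iso_sorted_as_dates <- format(sort(right))

# The original strings sorted as text are NOT in time order
raw_sorted_as_text  <- sort(x)
raw_sorted_as_dates <- x[order(right)]

print(wrong)
print(identical(iso_sorted_as_text, iso_sorted_as_dates))
print(identical(raw_sorted_as_text, raw_sorted_as_dates))
```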
Richard O'Keefe
2020-Jun-11 09:31 UTC
[R] How to convert European short dates to ISO format?
I would add to this that in an important data set I was working with,
most of the dates were dd/mm/yy but some of them were mm/dd/yy, and that
led to the realisation that I couldn't *tell* for about 40% of the dates
which they were.

If they were all one or the other, no worries, but when you have people
from mixed backgrounds writing in mixed formats, you have a problem.
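One way to estimate how much of a column is undecidable, in the spirit of the 40% figure above, is to count the values whose first two fields are both at most 12. This is a sketch on made-up values, not the data set described in the message; the variable names are hypothetical.

```r
# Hypothetical mixed-format values; not the data set from the message
dates <- c("01/02/03", "10/11/12", "23/11/99", "11/23/99", "05/06/20")

# Split on "/" and pull out the first two fields as integers
parts  <- strsplit(dates, "/", fixed = TRUE)
first  <- as.integer(vapply(parts, `[`, character(1), 1))
second <- as.integer(vapply(parts, `[`, character(1), 2))

# Both fields could be a month, and swapping them gives a different date
ambiguous <- first <= 12 & second <= 12 & first != second

# Only one reading is possible when one field exceeds 12
only_dmy <- first > 12 & second <= 12
only_mdy <- second > 12 & first <= 12

# Share of dates that cannot be decided from the value alone
print(mean(ambiguous))
```

The unambiguous rows can hint at which convention dominates, but as the thread points out, they cannot settle the ambiguous rows when both conventions are present in the same column.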
On Thu, 11 Jun 2020, Martin Maechler wrote:

> > Look at Hadley Wickham's 'tidyverse' collection as
> > described in R for Data Science. There are date, datetime,
> > and time functions that will do just what you want.
>
> I strongly disagree that automatic guessing of date format is a
> good idea:

Martin,

I think either you misunderstood what I wrote or I was not sufficiently
explicit in my brief response. I did not mean to imply there was any
automatic guessing involved. Specifying input and output formats is required.

Reading Hadley's book I was impressed that one could specify the format of
dates in the dataset and convert them all to the ISO-8601 format. Before
learning this I'd use emacs regex to do the reformatting I needed (or,
sometimes, awk).

Regards,

Rich