Dear R-people!

I am using R 1.8.0 under Windows XP. While using ISOdate() and strptime(), I noticed the following behaviour when "wrong" arguments (e.g., months > 12) are given to these functions:

> ISOdate(year=2003,month=2,day=20) #ok
[1] "2003-02-20 13:00:00 Westeuropäische Normalzeit"
> ISOdate(year=2003,month=2,day=30) #wrong day, but returns a value
[1] "2003-03-02 13:00:00 Westeuropäische Normalzeit"
> ISOdate(year=2003,month=2,day=35) #wrong day, and returns NA
[1] NA
> ISOdate(year=2003,month=2,day=40) #wrong day, but returns a value
[1] "2003-02-04 01:12:00 Westeuropäische Normalzeit"
> ISOdate(year=2003,month=22,day=20) #wrong month, but returns a value
[1] "2003-02-02 21:12:00 Westeuropäische Normalzeit"

And almost the same with strptime():

> strptime("2003-02-20", format="%Y-%m-%d")
[1] "2003-02-20"
> strptime("2003-02-30", format="%Y-%m-%d")
[1] "2003-03-02"
> strptime("2003-02-35", format="%Y-%m-%d")
[1] NA
> strptime("2003-02-40", format="%Y-%m-%d")
[1] "2003-02-04"
> strptime("2003-22-20", format="%Y-%m-%d")
[1] NA

Is this considered to be a user error ("If you put garbage in, expect to get garbage out"), or would it be safer to generally return NAs, as in ISOdate(year=2003,month=2,day=35)?

-Heinrich.
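For readers who want the "always NA for impossible dates" behaviour asked about above, one option is to check the year, month and day arithmetically before handing them to ISOdate(), so that nothing depends on the OS. The following is only a minimal sketch: the helper name is_valid_ymd() is hypothetical (it is not part of R), and it assumes Gregorian calendar rules and arguments of equal length.

```r
# Hypothetical helper (not part of base R): TRUE only for dates that exist
# in the Gregorian calendar. Assumes year, month, day have equal length.
is_valid_ymd <- function(year, month, day) {
  # Gregorian leap-year rule
  leap <- (year %% 4 == 0 & year %% 100 != 0) | (year %% 400 == 0)
  days_in_month <- c(31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)

  month_ok <- !is.na(month) & month >= 1 & month <= 12
  # Look up the month length only where the month itself is valid
  max_day <- ifelse(month_ok, days_in_month[ifelse(month_ok, month, 1)], NA)
  max_day <- ifelse(month_ok & month == 2 & leap, 29, max_day)

  month_ok & !is.na(day) & day >= 1 & day <= max_day
}

# The argument combinations from the message above
ok <- is_valid_ymd(year  = c(2003, 2003, 2003, 2003, 2003),
                   month = c(2, 2, 2, 2, 22),
                   day   = c(20, 30, 35, 40, 20))
print(ok)
```

A caller could then pass only the rows where the check is TRUE to ISOdate() and set the rest to NA explicitly.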
On Fri, 14 Nov 2003, RINNER Heinrich wrote:

> Is this considered to be a user error ("If you put garbage in, expect to get
> garbage out"), or would it be safer to generally return NAs, as in
> ISOdate(year=2003,month=2,day=35)?

Expect to get the best guess at what you intended, and expect this to depend on your OS.

--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
People who don't like this behaviour (and particularly those who dislike it as much as I do) should consider as.date() from the date package as an alternative. It gives you an NA if the specified date is impossible (at least in all the examples given earlier).

Is the behaviour of ISOdate() and strptime() determined by an ISO or POSIX standard? It seems not to fit R's "no nannying" policy at all. Or maybe it's the future: in version 1.9 will I be able to type glm() and have R take a best guess at the model specification I had in mind?

> -----Original Message-----
> From: Prof Brian Ripley [mailto:ripley at stats.ox.ac.uk]
>
> Expect to get the best guess at what you intended, and expect this to
> depend on your OS.
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://www.stat.math.ethz.ch/mailman/listinfo/r-help

Simon Fear
Senior Statistician
Syne qua non Ltd
Tel: +44 (0) 1379 644449
Fax: +44 (0) 1379 644445
email: Simon.Fear at synequanon.com
web: http://www.synequanon.com
I think I do understand how difficult dates are. All I'm saying is that by adopting a "standard" that is OS dependent (and hence, almost by definition, OS varying) you make R behave differently on different OSs - and that is NOT "making R portable across multiple OSs". This is a theoretical whinge. I'm not going to program it!

Please don't let me make too much of this anyway. For one thing, although it is not guaranteed, it seems that many OSs DO in fact behave identically. Also, it is only incomplete or erroneous dates that might be handled differently - and in most applications, one needs to pre-process incomplete date-times in R, rather than leave them to any default interpretation (even if that default were strictly fixed).

> -----Original Message-----
> From: Jason Turner [mailto:jasont at indigoindustrial.co.nz]
> Sent: 15 November 2003 06:17
> To: Simon Fear
> Cc: r-help at stat.math.ethz.ch
> Subject: Re: [R] ISOdate() and strptime()
>
> Thomas Lumley wrote:
>
> > On Fri, 14 Nov 2003, Simon Fear wrote:
> >
> >> Is the behaviour of ISOdate() and strptime() determined by ISO
> >> or POSIX standard? Seems not to fit R's "no nannying" policy
> >> at all.
> >
> > It's determined by your operating system, so you're complaining to the
> > wrong people.
>
> And since R is written to be portable across multiple OSs, you might get
> an idea how tricky this becomes. Hence the "iron fist" approach to date
> handling. Believe me, I've programmed date handling - it's always a
> terrible, nasty, messy business when international locales and different
> operating systems clash. I'm stunned it's as good as it is, subtle
> traps and all.
>
> Cheers
>
> Jason

Simon Fear
I have followed with interest the discussion on date handling. I am no expert in these things; all I want to do is convert a character vector that has been read into R (and which may contain some erroneous dates) to a "date format", and then do some work with it [e.g., use it in a plot]. Classes "POSIXlt" and "POSIXct" seem fine to me - for example, they have very nice and useful "seq" and "plot" methods.

So now I have two more questions:

1. Is it only incomplete or erroneous dates that might be handled "differently" by ISOdate() or strptime()? Do correct specifications of year, month and day always give the same results, no matter where or who I am?

2. Can someone point me to a reference that helps me understand why R's (or the operating system's?) "best guess at what I intended" turns out to be the results in the examples I posted in my earlier mail?

Regards,
Heinrich.

> -----Original Message-----
> From: RINNER Heinrich [mailto:H.RINNER at tirol.gv.at]
> Sent: Friday, 14 November 2003 11:13
> To: 'r-help at stat.math.ethz.ch'
> Subject: [R] ISOdate() and strptime()
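For the use case described above (a character vector that may contain erroneous dates), one OS-independent safeguard is a round-trip check: parse each string, format the result back with the same format, and keep only values that reproduce the input exactly. The sketch below is an illustration under stated assumptions, not an established recipe: parse_dates_strict() is a hypothetical name, and it relies on as.Date(), which is available in current versions of R but not in R 1.8.0.

```r
# Hypothetical helper: parse a character vector and return NA wherever the
# parsed date does not format back to exactly the original string.
# Assumes a current R in which as.Date() is available.
parse_dates_strict <- function(x, fmt = "%Y-%m-%d") {
  parsed <- as.Date(x, format = fmt)
  roundtrip <- format(parsed, fmt)
  # Either the parse failed outright, or the platform "guessed" another date
  bad <- is.na(parsed) | roundtrip != x
  parsed[bad] <- NA
  parsed
}

x <- c("2003-02-20", "2003-02-30", "2003-02-35", "2003-02-40", "2003-22-20")
d <- parse_dates_strict(x)
print(data.frame(input = x, parsed = d))
```

Because the comparison is on the exact string, a silently shifted result such as "2003-03-02" for "2003-02-30" is rejected on any platform. The price is strictness: an unpadded but otherwise valid input such as "2003-2-20" is also turned into NA, so the input should be normalised first if that matters.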
Confirmation that this *is* an OS-specific problem:

A professional implementation of the POSIX standard (Solaris) gets all of these correct.

Your so-called OS lacks any implementation of strptime, so we borrowed one from glibc. Unfortunately, that is buggy, even to the extent that

  unclass(strptime("2003-22-20", format="%Y-%m-%d"))
  unclass(strptime("2003 22 20", format="%Y %m %d"))

give different answers! (And RH8.0 gives the same answers as the substitute code used on R for Windows.)

I believe Simon Fear owes the R-developers a public apology for his (not properly referenced in the archives) reply to this thread.

BDR

On Fri, 14 Nov 2003, Prof Brian Ripley wrote:

> On Fri, 14 Nov 2003, RINNER Heinrich wrote:
>
> > Is this considered to be a user error ("If you put garbage in, expect to get
> > garbage out"), or would it be safer to generally return NAs, as in
> > ISOdate(year=2003,month=2,day=35)?
>
> Expect to get the best guess at what you intended, and expect this to
> depend on your OS.
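The comparison of the two unclass(strptime(...)) calls mentioned above can be made field by field, which shows at a glance whether a given platform parses the two spellings of the same invalid date consistently. This is a small sketch only; the choice of fields and the variable names are illustrative, and the result depends on the R version and OS, so no expected output is shown.

```r
# Parse the same invalid date with two different separators and compare
# the broken-down time fields; results are platform and version dependent.
fields <- c("year", "mon", "mday")

a <- unclass(strptime("2003-22-20", format = "%Y-%m-%d"))
b <- unclass(strptime("2003 22 20", format = "%Y %m %d"))

# TRUE for each field on which the two parses agree
same <- vapply(fields, function(f) identical(a[[f]], b[[f]]), logical(1))
print(same)
```

If any entry is FALSE, the strptime implementation in use treats the two inputs differently, which is the inconsistency reported in the message above.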