Mark Wardle
2007-Jan-07 12:01 UTC
[R] as.Date() results depend on order of data within vector?
Dear all,

The as.Date() function appears to give different results depending on
the order of the vector passed into it.

d1 = c("1900-01-01", "2007-01-01","","2001-05-03")
d2 = c("", "1900-01-01", "2007-01-01","2001-05-03")
as.Date(d1)   # gives correct results
as.Date(d2)   # fails with error (* see below)

This problem does not arise if the dates are NA rather than an empty
string, but my data is coming via RODBC and I still don't have NAs
passed across properly.

I might add that I initially noticed this behaviour when using RODBC's
sqlQuery() function call, and I initially had difficulty explaining why
one column of dates was passed correctly, but another failed. The
failing column was a "date of death" column where it was NA ("") for
most patients.

I've come up with two workarounds that work. The first is to sort the
data at the SQL level, ensuring the initial record is not null. The
second is to use sqlQuery() with as.is=T option, and then do the sorting
and conversion afterwards.

Is the behaviour of as.Date() shown above as expected/designed?

Many thanks,

Mark

(*) "Error in fromchar(x) : character string is not in a standard
unambiguous format"

sessionInfo():
R version 2.4.0 (2006-10-03)
powerpc-apple-darwin8.7.0

locale:
C/en_GB.UTF-8/C/C/C/C

attached base packages:
[1] "methods"   "stats"     "graphics"  "grDevices" "utils"
    "datasets"  "base"

other attached packages:
rcompletion       RODBC
   "0.0-12"     "1.1-7"
Gavin Simpson
2007-Jan-07 12:35 UTC
[R] as.Date() results depend on order of data within vector?
On Sun, 2007-01-07 at 12:01 +0000, Mark Wardle wrote:
> Dear all,
>
> The as.Date() function appears to give different results depending on
> the order of the vector passed into it.
>
> d1 = c("1900-01-01", "2007-01-01","","2001-05-03")
> d2 = c("", "1900-01-01", "2007-01-01","2001-05-03")
> as.Date(d1)   # gives correct results
> as.Date(d2)   # fails with error (* see below)
>
> This problem does not arise if the dates are NA rather than an empty
> string, but my data is coming via RODBC and I still don't have NAs
> passed across properly.
>
> I might add that I initially noticed this behaviour when using RODBC's
> sqlQuery() function call, and I initially had difficulty explaining why
> one column of dates was passed correctly, but another failed. The
> failing column was a "date of death" column where it was NA ("") for
> most patients.
>
> I've come up with two workarounds that work. The first is to sort the
> data at the SQL level, ensuring the initial record is not null. The
> second is to use sqlQuery() with as.is=T option, and then do the sorting
> and conversion afterwards.

Why not just tell R what format the dates are in, using the "format"
argument to as.Date?

> d1 = c("1900-01-01", "2007-01-01","","2001-05-03")
> d2 = c("", "1900-01-01", "2007-01-01","2001-05-03")
> as.Date(d1, "%Y-%m-%d")
[1] "1900-01-01" "2007-01-01" NA           "2001-05-03"
> as.Date(d2, "%Y-%m-%d")
[1] NA           "1900-01-01" "2007-01-01" "2001-05-03"

>
> Is the behaviour of as.Date() shown above as expected/designed?

I don't know about expected/designed, but I would have thought
explicitly stating the date format would be the most fool-proof way of
making sure R did what you wanted, and the easiest way to work around
your "problem".
HTH

G

-- 
Gavin Simpson                 [t] +44 (0)20 7679 0522
ECRC                          [f] +44 (0)20 7679 0565
UCL Department of Geography
Pearson Building              [e] gavin.simpsonATNOSPAMucl.ac.uk
Gower Street
London, UK                    [w] http://www.ucl.ac.uk/~ucfagls/
WC1E 6BT                      [w] http://www.freshwaters.org.uk/
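As a small check on the suggestion above, the following sketch converts both vectors from the original post with an explicit format and compares the results. The variable names (fmt, r1, r2, same_dates, n_missing, bad) are made up for illustration. It also shows one caveat worth knowing: with an explicit format, any string that does not match is turned into NA without complaint, so it may be worth counting the NAs afterwards.

```r
# Explicit format, as suggested above: the result no longer depends on
# which element happens to come first.
fmt <- "%Y-%m-%d"
d1 <- c("1900-01-01", "2007-01-01", "", "2001-05-03")
d2 <- c("", "1900-01-01", "2007-01-01", "2001-05-03")

r1 <- as.Date(d1, format = fmt)
r2 <- as.Date(d2, format = fmt)

# Same set of dates in both cases (sort() drops the NA by default) ...
same_dates <- identical(sort(r1), sort(r2))

# ... and exactly one NA each, coming from the empty string.
n_missing <- c(d1 = sum(is.na(r1)), d2 = sum(is.na(r2)))

# Caveat: a string in a different layout also becomes NA, silently.
bad <- as.Date("03/05/2001", format = fmt)
is.na(bad)
```

Comparing sum(is.na(result)) with the number of blanks in the input is one cheap way to notice malformed dates that were converted to NA.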
Prof Brian Ripley
2007-Jan-07 12:36 UTC
[R] as.Date() results depend on order of data within vector?
On Sun, 7 Jan 2007, Mark Wardle wrote:

> Dear all,
>
> The as.Date() function appears to give different results depending on
> the order of the vector passed into it.
>
> d1 = c("1900-01-01", "2007-01-01","","2001-05-03")
> d2 = c("", "1900-01-01", "2007-01-01","2001-05-03")
> as.Date(d1)   # gives correct results
> as.Date(d2)   # fails with error (* see below)
>
> This problem does not arise if the dates are NA rather than an empty
> string, but my data is coming via RODBC and I still don't have NAs
> passed across properly.
>
> I might add that I initially noticed this behaviour when using RODBC's
> sqlQuery() function call, and I initially had difficulty explaining why
> one column of dates was passed correctly, but another failed. The
> failing column was a "date of death" column where it was NA ("") for
> most patients.
>
> I've come up with two workarounds that work. The first is to sort the
> data at the SQL level, ensuring the initial record is not null. The
> second is to use sqlQuery() with as.is=T option, and then do the sorting
> and conversion afterwards.
>
> Is the behaviour of as.Date() shown above as expected/designed?

Yes. It uses the first non-NA string to choose the format *if you do
not specify it*.

The correct work-around is to get non-valid strings returned as NA, not
"". That is argument 'na.strings' in RODBC (and elsewhere: read.table
behaves in the same way).

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
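The rule described above can be sketched in a few lines. This is an illustration only, not the source of as.Date(): the helpers first_non_na() and pick_format() are hypothetical, and the two candidate formats are an assumption based on the behaviour reported in this thread for R 2.4.0. Other versions of R may treat empty strings differently. The second half shows the na.strings route using read.csv() on an inline, made-up CSV, since an RODBC example would need a live database connection.

```r
# Hypothetical helper: first element that is not NA, or NA if there is none.
first_non_na <- function(x) {
  x <- x[!is.na(x)]
  if (length(x)) x[1] else NA_character_
}

# Hypothetical sketch of the rule: probe ONE string, then use the format
# that parses it for the whole vector.
pick_format <- function(x, candidates = c("%Y-%m-%d", "%Y/%m/%d")) {
  probe <- first_non_na(x)
  if (is.na(probe)) return(candidates[1])   # nothing to probe
  for (f in candidates) {
    if (!is.na(strptime(probe, f, tz = "GMT"))) return(f)
  }
  NA_character_   # the point at which the reported error would be raised
}

d1 <- c("1900-01-01", "2007-01-01", "", "2001-05-03")
d2 <- c("", "1900-01-01", "2007-01-01", "2001-05-03")
pick_format(d1)   # the probe is "1900-01-01"
pick_format(d2)   # the probe is "", which no candidate format parses

# Reading blanks as NA at import time, here with read.csv() on inline text.
csv <- "id,dod\n1,\n2,1900-01-01\n3,2007-01-01\n"
pts <- read.csv(text = csv, na.strings = "",
                colClasses = c("integer", "character"))
dod <- as.Date(pts$dod)   # the leading NA is skipped when probing

# With RODBC the analogous call (not run here; 'channel' and 'query' are
# placeholders) would pass the same argument to sqlQuery():
#   sqlQuery(channel, query, na.strings = "")
```

Because "" is not NA, it is a legitimate candidate for the probe, which is why its position in the vector matters; once blanks arrive as NA they are skipped.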
Patrick Connolly
2007-Jan-07 19:42 UTC
[R] as.Date() results depend on order of data within vector?
On Sun, 07-Jan-2007 at 12:01PM +0000, Mark Wardle wrote:

|> Dear all,
|>
|> The as.Date() function appears to give different results depending on
|> the order of the vector passed into it.
|>
|> d1 = c("1900-01-01", "2007-01-01","","2001-05-03")
|> d2 = c("", "1900-01-01", "2007-01-01","2001-05-03")
|> as.Date(d1)   # gives correct results
|> as.Date(d2)   # fails with error (* see below)
|>
|> This problem does not arise if the dates are NA rather than an empty
|> string, but my data is coming via RODBC and I still don't have NAs
|> passed across properly.
|>
|> I might add that I initially noticed this behaviour when using RODBC's
|> sqlQuery() function call, and I initially had difficulty explaining why
|> one column of dates was passed correctly, but another failed. The
|> failing column was a "date of death" column where it was NA ("") for
|> most patients.
|>
|> I've come up with two workarounds that work. The first is to sort the
|> data at the SQL level, ensuring the initial record is not null. The
|> second is to use sqlQuery() with as.is=T option, and then do the sorting
|> and conversion afterwards.

Simpler, I think, is to add one line

d2[d2 == ""] <- NA

I've not tested the idea extensively, so there might be occasions
where it falls down. If you're working with a dataframe, you can use
one of the apply functions to affect all columns.

HTH

-- 
~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.
   ___    Patrick Connolly
 {~._.~}                  Great minds discuss ideas
 _( Y )_                  Middle minds discuss events
(:_~*~_:)                 Small minds discuss people
 (_)-(_)                             ..... Anon
~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.~.
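For the data frame case mentioned above, a minimal sketch might look like the following. The data frame pts, its column names, and the helper blank_to_na() are all invented for illustration; they stand in for a query result fetched as character data. The extra !is.na() guard is an addition to the one-liner so that columns already containing NA are handled explicitly.

```r
# Hypothetical stand-in for a query result with dates held as strings.
pts <- data.frame(
  id  = 1:3,
  dob = c("1950-02-01", "", "1961-07-15"),
  dod = c("", "", "2006-11-30"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: turn "" into NA in character columns, leave
# every other column untouched.
blank_to_na <- function(col) {
  if (is.character(col)) col[!is.na(col) & col == ""] <- NA
  col
}

# Apply to all columns; assigning into pts[] keeps it a data frame.
pts[] <- lapply(pts, blank_to_na)

# The date columns now convert regardless of where the blanks were.
date_cols <- c("dob", "dod")
pts[date_cols] <- lapply(pts[date_cols], as.Date)
str(pts)
```

lapply() is used rather than apply() because apply() would first coerce the whole data frame to a character matrix, losing the integer id column.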