Grant Farnsworth
2012-Jan-23 03:59 UTC
[R] sqldf + Date class. Ordering and summary statistics appear to be incorrect.
I've been using sqldf heavily lately but have encountered problems with ordering of observations or calculating statistics such as max() and min() when the variable used is of class Date. For example, if I run the following code: =============== begin code ================library(sqldf) A<-data.frame(Dates=as.Date(c("1994-02-14","1977-02-23","2001-09-18","2009-08-01")),Ret=rnorm(4)) OrderedA<-sqldf('select * from A order by Dates') MaxA<-sqldf('select max(Dates) as Dates from A')[1,1] MinA<-sqldf('select min(Dates) as Dates from A')[1,1] =============== end code ================ Then the result is this:> ADates Ret 1 1994-02-14 1.2414706 2 1977-02-23 -0.7728146 3 2001-09-18 1.2551331 4 2009-08-01 -0.2538359> OrderedADates Ret 1 2001-09-18 1.2551331 2 2009-08-01 -0.2538359 3 1977-02-23 -0.7728146 4 1994-02-14 1.2414706> MaxA[1] "1994-02-14"> MinA[1] "2001-09-18" Completely wrong order, no warnings issued, and the summary stats are wrong as well (but consistent with the ordering). According to the sqldf manual found at the following URL http://code.google.com/p/sqldf/#4._How_does_sqldf_work_with_%22Date%22_class_variables? this type of query should work correctly. Any clue why it is not doing so? User error or bug? =================== debug info ================> sessionInfo() R version 2.13.1 (2011-07-08) Platform: x86_64-pc-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] tcltk stats graphics grDevices utils datasets methods [8] base other attached packages: [1] sqldf_0.4-2 chron_2.3-42 gsubfn_0.5-7 [4] proto_0.3-9.2 RSQLite.extfuns_0.0.1 RSQLite_0.10.0 [7] DBI_0.2-5 ================ end debug info =================
Gabor Grothendieck
2012-Jan-23 05:46 UTC
[R] sqldf + Date class. Ordering and summary statistics appear to be incorrect.
On Sun, Jan 22, 2012 at 10:59 PM, Grant Farnsworth <gvfarns at gmail.com> wrote:> I've been using sqldf heavily lately but have encountered problems > with ordering of observations or calculating statistics such as max() > and min() when the variable used is of class Date. > > For example, if I run the following code: > > =============== begin code ================> library(sqldf) > A<-data.frame(Dates=as.Date(c("1994-02-14","1977-02-23","2001-09-18","2009-08-01")),Ret=rnorm(4)) > OrderedA<-sqldf('select * from A order by Dates') > MaxA<-sqldf('select max(Dates) as Dates from A')[1,1] > MinA<-sqldf('select min(Dates) as Dates from A')[1,1] > =============== end code ================> > Then the result is this: > >> A > ? ? ? Dates ? ? ? ?Ret > 1 1994-02-14 ?1.2414706 > 2 1977-02-23 -0.7728146 > 3 2001-09-18 ?1.2551331 > 4 2009-08-01 -0.2538359 > >> OrderedA > ? ? ? Dates ? ? ? ?Ret > 1 2001-09-18 ?1.2551331 > 2 2009-08-01 -0.2538359 > 3 1977-02-23 -0.7728146 > 4 1994-02-14 ?1.2414706 > >> MaxA > [1] "1994-02-14" > >> MinA > [1] "2001-09-18" > > Completely wrong order, no warnings issued, and the summary stats are > wrong as well (but consistent with the ordering). > > According to the sqldf manual found at the following URL > > http://code.google.com/p/sqldf/#4._How_does_sqldf_work_with_%22Date%22_class_variables? > > this type of query should work correctly. ?Any clue why it is not > doing so? ?User error or bug? >You are using an old version. Update to latest version of R and sqldf. -- Statistics & Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com