Hi,

I don't understand the following. When I create a small artificial set of date information in class POSIXct, I can calculate the mean and the median:

a = as.POSIXct(Sys.time())
a = a + 60*0:10; a
 [1] "2009-03-10 11:30:16 EDT" "2009-03-10 11:31:16 EDT" "2009-03-10 11:32:16 EDT"
 [4] "2009-03-10 11:33:16 EDT" "2009-03-10 11:34:16 EDT" "2009-03-10 11:35:16 EDT"
 [7] "2009-03-10 11:36:16 EDT" "2009-03-10 11:37:16 EDT" "2009-03-10 11:38:16 EDT"
[10] "2009-03-10 11:39:16 EDT" "2009-03-10 11:40:16 EDT"

median(a)
[1] "2009-03-10 11:35:16 EDT"
mean(a)
[1] "2009-03-10 11:35:16 EDT"

But for real data (for this post, a short subset is in object c) that I have converted into a POSIXct object, I cannot calculate the median with median(), though I do get it with summary():

c
 [1] "2009-02-24 14:51:18 EST" "2009-02-24 14:51:19 EST" "2009-02-24 14:51:19 EST"
 [4] "2009-02-24 14:51:20 EST" "2009-02-24 14:51:20 EST" "2009-02-24 14:51:21 EST"
 [7] "2009-02-24 14:51:21 EST" "2009-02-24 14:51:22 EST" "2009-02-24 14:51:22 EST"
[10] "2009-02-24 14:51:22 EST"

class(c)
[1] "POSIXt"  "POSIXct"

median(c)
Error in Summary.POSIXct(c(1235505080.6, 1235505081.1), na.rm = FALSE) :
  'sum' not defined for "POSIXt" objects

One difference is that in my own date-time series, some events are repeated (the original data contained fractions of seconds). But then, why can I get a median through summary()?

summary(c)
                     Min.                   1st Qu.                    Median
"2009-02-24 14:51:18 EST" "2009-02-24 14:51:19 EST" "2009-02-24 14:51:20 EST"
                     Mean                   3rd Qu.                      Max.
"2009-02-24 14:51:20 EST" "2009-02-24 14:51:21 EST" "2009-02-24 14:51:22 EST"

Thanks in advance,

Denis Chabot

sessionInfo()
R version 2.8.1 Patched (2009-01-19 r47650)
i386-apple-darwin9.6.0

locale:
fr_CA.UTF-8/fr_CA.UTF-8/C/C/fr_CA.UTF-8/fr_CA.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] doBy_3.7     chron_2.3-30

loaded via a namespace (and not attached):
[1] Hmisc_3.5-2     cluster_1.11.12 grid_2.8.1      lattice_0.17-20 tools_2.8.1

Your problem arises in R 2.8.1 (and 2.9.0-devel, but not 2.7.0) when length(POSIXct object) is even, because median(POSIXct object) passes a POSIXct object to median.default, which calls sum() in the even-length case.

> median( as.POSIXct(Sys.time()))
[1] "2009-03-10 10:28:46 PDT"
> median( as.POSIXct(rep(Sys.time(),2)))
Error in Summary.POSIXct(c(1236706132.54740, 1236706132.54740), na.rm = FALSE) :
  'sum' not defined for "POSIXt" objects
> traceback()
4: stop(gettextf("'%s' not defined for \"POSIXt\" objects", .Generic),
       domain = NA)
3: Summary.POSIXct(c(1236706132.54740, 1236706132.54740), na.rm = FALSE)
2: median.default(as.POSIXct(rep(Sys.time(), 2)))
1: median(as.POSIXct(rep(Sys.time(), 2)))
> version
               _
platform       i686-pc-linux-gnu
arch           i686
os             linux-gnu
system         i686, linux-gnu
status
major          2
minor          8.1
year           2008
month          12
day            22
svn rev        47281
language       R
version.string R version 2.8.1 (2008-12-22)

Bill Dunlap
TIBCO Software Inc - Spotfire Division
wdunlap tibco.com

------------------------------------------------------------
[R] puzzled by math on date-time objects
Denis Chabot chabotd at globetrotter.net
Tue Mar 10 16:44:07 CET 2009
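
To make the odd/even behaviour concrete, here is a minimal sketch of the logic described above. The helper median_via_sum() is a hypothetical, simplified reconstruction written for illustration; it is not the actual base R source, and the fixed timestamps are made up for the example.

```r
# Hypothetical helper: a simplified reconstruction of the even/odd logic,
# not the real median.default.
median_via_sum <- function(x) {
  n <- length(x)
  half <- (n + 1L) %/% 2L
  s <- sort(x)
  if (n %% 2L == 1L) {
    s[half]                    # odd length: just pick the middle element
  } else {
    sum(s[half + 0L:1L]) / 2   # even length: needs sum(), which POSIXct rejects
  }
}

t0   <- as.POSIXct("2009-03-10 10:00:00", tz = "UTC")
odd  <- t0 + 60 * 0:10   # 11 values
even <- t0 + 60 * 0:9    # 10 values

# Odd length never calls sum(), so the class is preserved.
odd_result <- median_via_sum(odd)

# Even length reaches sum() and fails; capture the message instead of stopping.
even_result <- tryCatch(median_via_sum(even),
                        error = function(e) conditionMessage(e))
```

With the odd-length vector no arithmetic is needed at all, which is why the artificial 11-element series in the original post did not show the problem.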

median.default was changed between 2.7.1 and 2.8.1 to call sum(...)/2 instead of mean(...), and that causes the problem for POSIXct objects (sum fails but mean works for them).

Bill Dunlap
TIBCO Software Inc - Spotfire Division
wdunlap tibco.com

> -----Original Message-----
> From: William Dunlap
> Sent: Tuesday, March 10, 2009 11:37 AM
> To: 'r-help at r-project.org'
> Subject: Re: [R] puzzled by math on date-time objects
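
The difference between sum() and mean() on date-times can be checked directly. The sketch below uses two made-up timestamps one minute apart; the variable names are only illustrative.

```r
# Two illustrative timestamps, 60 seconds apart.
x <- as.POSIXct("2009-03-10 10:00:00", tz = "UTC") + 60 * 0:1

# sum() is rejected for POSIXct; capture the error message rather than stopping.
sum_error <- tryCatch({ sum(x); NA_character_ },
                      error = function(e) conditionMessage(e))

# mean() has a POSIXct method: it averages the underlying seconds
# and returns a POSIXct value.
mid <- mean(x)
```

So an even-length median written as mean(two middle values) keeps working for POSIXct, while sum(two middle values)/2 does not.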

It does seem sensible that median and quantile would work for the POSIXct, Date, and other classes for which they are logically well-defined, but strangely enough, they do not (except for odd-length input). The summary function has a special case (summary.POSIXct) which does the straightforward, obvious thing here. I do not understand why this is not done by median and quantile themselves. Conversely, for the complex class, median and quantile do not give an error but instead produce meaningless results (since the complexes do not form an ordered field).

There are other operations as well that don't do the natural, useful thing on a variety of classes, including POSIXct, Date, and difftime:

> dates <- c(as.Date('2000-01-01'), as.Date('2000-01-05'), as.Date('1990-08-09'), as.Date('2008-02-29'))
> median(dates)
Error in Summary.Date(c(10957, 10961), na.rm = FALSE) :
  sum not defined for Date objects
> quantile(dates,.5)
Error in Ops.Date((1 - h), qs[i]) : * not defined for Date objects
> gaps <- diff(sort(dates))
> gaps
Time differences in days
[1] 3432    4 2977
> cumsum(gaps)
Error in Math.difftime(gaps) : 'cumsum' not defined for "difftime" objects
> rle(dates)
Error in rle(dates) : 'x' must be an atomic vector
> rle(gaps)
Error in rle(gaps) : 'x' must be an atomic vector

I have documented some of these issues in my email "Semantics of sequences in R" (22/02/2009 3:42 PM) and other emails, and I have proposed to write code to resolve them, but have not received a warm reception.

All I can suggest at this point is that you write your own versions of these functions. Or perhaps one of us will write a CRAN package, though this really seems like core behavior.

-s

On Tue, Mar 10, 2009 at 11:44 AM, Denis Chabot <chabotd at globetrotter.net> wrote:
> I don't understand the following. When I create a small artificial set of
> date information in class POSIXct, I can calculate the mean and the median:
> ...But for real data (for this post, a short subset is in object c) that I
> have converted into a POSIXct object, I cannot calculate the median with
> median(), though I do get it with summary():
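
As a rough sketch of what "your own versions" could look like for the Date and difftime cases above: convert to plain numbers, do the arithmetic, and convert back. The variable names are illustrative, and the explicit origin "1970-01-01" is the standard epoch for the Date class.

```r
dates <- as.Date(c("2000-01-01", "2000-01-05", "1990-08-09", "2008-02-29"))

# Median: work on the underlying day counts, then rebuild a Date.
med <- as.Date(median(as.numeric(dates)), origin = "1970-01-01")

# Cumulative gaps: cumsum() on the numeric values, keeping the original units.
gaps <- diff(sort(dates))
cum_gaps <- as.difftime(cumsum(as.numeric(gaps)), units = units(gaps))

# Run lengths: rle() on the numeric values, then convert the run values back.
runs <- rle(as.numeric(dates))
run_values <- as.Date(runs$values, origin = "1970-01-01")
```

The same pattern applies to POSIXct, with seconds instead of days.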

Hi Phil,

Thank you very much for this detailed explanation. It will help me when summarizing information over periods of time using either summarize (Hmisc) or summaryBy (doBy). Until now, doing so resulted in the "mean" time for each "group" being transformed into a number of seconds, as you explain below. But neither of these functions puts it back into a POSIX date-time object. I tried to do so using as.POSIXct(), but this failed because I did not provide a reference (origin). From now on I'll try the structure command you used below.

Denis

On 09-03-10 at 19:04, Phil Spector wrote:

> Denis -
>   If you look inside of summary.POSIXct, you'll see the
> following:
>
>    x <- summary.default(unclass(object), digits = digits, ...)[1:6]
>
> In other words, summary accepts the POSIX object, unclasses it
> (resulting in a numeric value representing the number of seconds
> since January 1, 1970), performs the operation, and then reassigns
> the class. You can do this basic trick yourself. Suppose we have a
> vector of dates and want the median:
>
> > dates = as.POSIXct(c('2009-3-15','2009-2-19','2009-3-20','2009-2-18'))
> > median(dates)
> Error in Summary.POSIXct(c(1235030400, 1237100400), na.rm = FALSE) :
>   'sum' not defined for "POSIXt" objects
> > res = median(as.numeric(dates))
> > structure(res,class='POSIXct')
> [1] "2009-03-02 23:30:00 PST"
>
> I think it's clear that you can do any arithmetic operation on
> dates this way, even if it doesn't make sense:
>
> > sum(dates)
> Error in Summary.POSIXct(c(1237100400, 1235030400, 1237532400,
> 1234944000 :
>   'sum' not defined for "POSIXt" objects
> > res = sum(as.numeric(dates))
> > structure(res,class='POSIXct')
> [1] "2126-09-08 23:00:00 PDT"
>
> I'm quite certain that median.POSIXct will be fixed pretty quickly,
> but you can always unclass and reclass to do what you need.
>
>                                        - Phil
>
> On Tue, 10 Mar 2009, Denis Chabot wrote:
>
>> Thanks Phil,
>>
>> but how does summary() find the median of the same type of object?
>> I would have thought the algorithm used when the vector length is even
>> would also require the SUM of the POSIX vector. I am glad of the
>> solution you propose, but still puzzled a bit!
>>
>> Denis
>>
>> On 09-03-10 at 12:39, Phil Spector wrote:
>>
>>> Denis -
>>>   There is no median method for POSIX objects, although
>>> there is a summary method. Thus, when you pass a POSIX
>>> object to median, it uses median.default, which contains
>>> the following code:
>>>
>>>    if (n%%2L == 1L)
>>>        sort(x, partial = half)[half]
>>>    else sum(sort(x, partial = half + 0L:1L)[half + 0L:1L])/2
>>>
>>> So when the length of your POSIX vector is odd, it works, but if
>>> it's even, it would need to take the sum of a POSIX
>>> object. Of course, there is no sum method for POSIX objects,
>>> since it doesn't make sense.
>>>   Right now, it looks like your best bet for a summary of POSIX
>>> objects is
>>>
>>>    summary(a)['Median']
>>>
>>>                                        - Phil Spector
>>>                                          Statistical Computing Facility
>>>                                          Department of Statistics
>>>                                          UC Berkeley
>>>                                          spector at stat.berkeley.edu
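
For the grouped-summary case mentioned at the top of this message, one possible approach is to compute per-group means on plain seconds and then convert back with an explicit origin, which is the "reference" that as.POSIXct() needs for numeric input. This is only a sketch: it uses base tapply() as a stand-in for summarize/summaryBy, and the grouping vector and the UTC time zone are made up for the example.

```r
# Dates from the example above, parsed in UTC so the result does not depend
# on the local time zone; 'group' is a made-up grouping variable.
times <- as.POSIXct(c("2009-03-15", "2009-02-19", "2009-03-20", "2009-02-18"),
                    tz = "UTC")
group <- c("a", "b", "a", "b")

# Per-group means on plain seconds since 1970-01-01 (the class is dropped here).
secs <- tapply(as.numeric(times), group, mean)

# Convert back to POSIXct, giving the origin explicitly.
group_means <- as.POSIXct(as.numeric(secs), origin = "1970-01-01", tz = "UTC")
names(group_means) <- names(secs)
```

Compared with structure(res, class = 'POSIXct'), the as.POSIXct() call also lets you state the time zone in which the result should be displayed.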