Martin Maechler
2020-Apr-04 09:49 UTC
[Rd] Help useRs to use R's own Time/Date objects more efficiently
This is mostly a RFC [but *not* about the many extra packages, please..]:

Noticing to my chagrin how my students work in a project,
googling for R code and cut'n'pasting stuff together, accumulating
this and that package on the way all just for simple daily time series
(though with partly missing parts),
using chron, zoo, lubridate, ... all for things that are very
easy in base R *IF* you read help pages and start thinking on
your own (...), I've noted once more that the above "if" is a
very strong one, and seems to happen rarely nowadays by typical R users...
(yes, I stop whining for now).

In this case, I propose to slightly improve the situation ...
by adding a few more lines to one help page [[how could that
help in the age where "google"+"cut'n'paste" has replaced thinking ? .. ]] :

On R's own ?Dates help page (and also on ?DateTimeClasses )
we have pointers, notably

  See Also:

     ...............
     ...............

     'weekdays' for convenience extraction functions.

So people must find that and follow the pointer
(instead of installing one of the dozen helper packages).

Then on that page, one sees weekdays(), months() .. julian()
in the usage ... which don't seem directly helpful for a person
who needs more. If that person is diligent and patient (as good useRs are ;-),
she finds

  Note:

     Other components such as the day of the month or the year are very
     easy to compute: just use 'as.POSIXlt' and extract the relevant
     component. Alternatively (especially if the components are
     desired as character strings), use 'strftime'.

But then, nowadays, the POSIXlt class is not so transparent to the
non-expert anymore (as it behaves very much like POSIXct, and
not like a list for good reasons) .. and so 97% of R users will
not find this "very easy".

For this reason, I propose to add the following to the
'Examples:' section of the help file ...
and I hope that also readers of R-devel who have not been
aware of how to do this nicely, will now remember (or remember
where to look?).

I at least will tell my students in the future to use these or
write versions of these simple utility functions.

------------------------------------------------

## Show how easily you get month, day, year, day (of {month, week, yr}), ... :
## (remember to count from 0 (!): mon = 0..11, wday = 0..6, etc !!)

##' Transform (Time-)Date vector to convenient data frame :
dt2df <- function(dt, dName = deparse(substitute(dt)), stringsAsFactors = FALSE) {
    DF <- as.data.frame(unclass(as.POSIXlt( dt )), stringsAsFactors=stringsAsFactors)
    `names<-`(cbind(dt, DF, deparse.level=0L), c(dName, names(DF)))
}
dt2df(.leap.seconds)    # date+time
dt2df(Sys.Date() + 0:9) # date

##' Even simpler: Date -> Matrix:
d2mat <- function(x) simplify2array(unclass(as.POSIXlt(x)))
d2mat(seq(as.Date("2000-02-02"), by=1, length.out=30)) # has R 1.0.0's release date

------------------------------------------------------------

In the distant past / one of the last times I touched on people
using (base) R's Date / Time-Date objects, I had started
thinking if we should not provide some simple utilities to "base R"
(not in the 'base' pkg, but rather 'utils') for "extracting" from
{POSIX(ct), Date} objects ... and we may have discussed that
within R Core 20 years ago, and had always thought that this
shouldn't be hard for useRs themselves to see how to do...

But then I see that "everybody" uses extension packages instead,
even in the many situations where there's no gain doing so,
but rather increases the dependency-complexity of the data analysis
unnecessarily.

Martin Maechler
ETH Zurich and R Core Team.
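
For readers who want only the single components that the quoted help-page Note refers to, here is a minimal base-R sketch. It is not part of the proposed help-page text. The example dates and the variable names `d` and `lt` are chosen purely for illustration. It shows the offsets that the comment in the code above warns about.

```r
# Illustrative sketch (not from the original post): integer components
# of a Date via POSIXlt, using base R only.
d  <- as.Date(c("2000-02-29", "2020-04-04"))   # example dates (assumed)
lt <- as.POSIXlt(d)

lt$year + 1900L   # calendar year: 'year' is stored as years since 1900
lt$mon  + 1L      # month 1..12: 'mon' is stored as 0..11
lt$mday           # day of the month, 1..31 (no offset)
lt$wday           # day of the week, 0..6 with 0 = Sunday
lt$yday + 1L      # day of the year 1..366: 'yday' is stored as 0..365

# The same information as character strings, via format() / strftime():
format(d, "%Y")   # four-digit year
format(d, "%m")   # two-digit month
format(d, "%d")   # two-digit day of the month

# A POSIXlt object reports its length like a vector of time points ...
length(lt)
# ... while unclass() reveals the underlying list of components.
names(unclass(lt))
```

The `$` extraction works because a POSIXlt object is still a list internally, even though `length()` and printing treat it as a vector of time points.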
Iñaki Ucar
2020-Apr-04 10:35 UTC
[Rd] Help useRs to use R's own Time/Date objects more efficiently
On Sat, 4 Apr 2020 at 11:51, Martin Maechler <maechler at stat.math.ethz.ch> wrote:
>
> This is mostly a RFC [but *not* about the many extra packages, please..]:
>
> Noticing to my chagrin how my students work in a project,
> googling for R code and cut'n'pasting stuff together, accumulating
> this and that package on the way all just for simple daily time series
> (though with partly missing parts),
> using chron, zoo, lubridate, ... all for things that are very
> easy in base R *IF* you read help pages and start thinking on
> your own (...), I've noted once more that the above "if" is a
> very strong one, and seems to happen rarely nowadays by typical R users...
> (yes, I stop whining for now).

It's not my intention to sound harsh here, but just to provide
constructive criticism (I clarify this beforehand because, you know,
this is an email).

It's too easy to whine about this every now and then, and blame the
useRs for not being diligent enough, not patient enough and not
reading enough manual pages. But did you consider that maybe it's the
usability of this stuff in base R that leaves much to be desired, and
the lack of good and intuitive helpers that triggered the development
of so many related packages?

> In this case, I propose to slightly improve the situation ...
> by adding a few more lines to one help page [[how could that
> help in the age where "google"+"cut'n'paste" has replaced thinking ? .. ]] :

Google + cut'n'paste hasn't replaced thinking, but struggling. So no,
I don't think that more documentation (which I do think is already
great) improves the situation.

...snip...

> In the distant past / one of the last times I touched on people
> using (base) R's Date / Time-Date objects, I had started
> thinking if we should not provide some simple utilities to "base R"
> (not in the 'base' pkg, but rather 'utils') for "extracting" from
> {POSIX(ct), Date} objects ... and we may have discussed that
> within R Core 20 years ago, and had always thought that this
> shouldn't be hard for useRs themselves to see how to do...

Never too late to change your mind.

> But then I see that "everybody" uses extension packages instead,
> even in the many situations where there's no gain doing so,
> but rather increases the dependency-complexity of the data analysis
> unnecessarily.

I do think there's gain. Again, it's not poor silly useRs not doing
their homework, it's a handful of developers that invested many many
hours of their time for years producing extension packages for a
functionality that is perfectly covered in base R. Maybe it's time to
think that it's not that well covered?

--
Iñaki Úcar
J C Nash
2020-Apr-04 13:51 UTC
[Rd] Help useRs to use R's own Time/Date objects more efficiently
As with many areas of R usage, my view is that the concern is one of
making it easier to find appropriate information quickly. The difficulty
is that different users have different needs. So if one wants to know
(most of) what is available, the Time Series Task View is helpful. If one
is a novice, it may now be rather daunting, while I've found, as a
long-time user of different software, that I have to dig to find what I need.

In optimization I have tried -- and have had several false starts -- to
unify several packages. That could be helpful for time and date functions.

Another possibility could be to put the "see" and "see also" information
at the TOP of the documentation rather than lower down, and also to refer
to Task Views and possibly other -- eventually R-project -- documentation
objects. I happen to favour wiki-like approaches, but there has not been
much movement towards that yet.

We R people are quite strong individualists, but perhaps more team-minded
thinking would help. Some of us are getting beyond our best-before date.

However, I support Martin's intent, and hope there will be attempts in
these directions.

Best, John Nash

______________________________________________
R-devel at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Abby Spurdle
2020-Apr-05 20:57 UTC
[Rd] Help useRs to use R's own Time/Date objects more efficiently
I think POSIXct and POSIXlt are badly-chosen names.
The name "POSIX" implies UNIX.
(i.e. XYZix operating system is mostly POSIX compliant... Woo-Hoo!).
My assumption is that most people modelling industrial/econometric
data etc, or data imported from databases, don't want system
references everywhere.

Historically, I've used the principle that:
If programming language A uses functionality from programming language
B, then bindings should be as close as possible to whatever is in
programming language B. Any additional functionality in programming
language A should be distinct from the bindings.
R hasn't done this here, where POSIX-bindings have added in additional
R functionality and semantics.
Possibly introducing problems at an early stage.

The help file entitled DateTimeClasses only covers a small subset of
information on date and time classes, with no obvious information
about how to construct date and time objects, except for what's in the
examples. The Date class has a similar problem, omitting information
about how to construct Date objects.

The "convenience extraction functions" aren't necessarily convenient
because they return text rather than integers, requiring many users to
still use the POSIXlt class.

I don't think your example is simple.
And I suspect it may discourage some people from using base packages.
Having the opposite effect to what's intended.

It's probably too late to change the functions, but here's what I would suggest:

(1) Create a top-level help page with a title like "Date and Time
Classes" to give a brief but general overview. This would mean the
existing DateTimeClasses would need a new title.
(2) Create another function the same as as.POSIXlt, but with a more
user-friendly name, which would increase readability.
(3) If help files for describing classes are separate from the help
files for creating/coercing objects (e.g. Date vs as.Date), then I
think they should cross reference each other in the description field,
not just the details or seealso fields.
(4) Reference relevant extraction/formatting functions, in most
date/time help files, even if there's some (small) duplication in the
help files.
(5) Focus on keeping the examples simple rather than comprehensive.

Expanding on suggestion (4), if you read the help file for as.Date
(which seems like an obvious starting point, because that's where I
started reading...), there's no reference at all to getting the month,
or the day of the week, etc. To make it worse it doesn't mention
coercion to POSIXlt objects either (but does mention coercion from
POSIXlt to Date objects). This could give the wrong impression to many
readers...

In its defense, it does link to Date, which links to weekdays, which
links to as.POSIXlt.

Of course the note and seealso fields are near the bottom, and there's
an implicit (possibly false) assumption that the reader will read all
the help file*s*, and follow the links at the bottom, at least three
times over.
And a new-ish R user is likely to have to read more than four help files.
Unless they Google it, read stack exchange, or read some fancy
(apparently modern) textbook on data science...

Reinforcing the need for the help files to be clear about what the
functions (collectively) can do and specifically what
extraction/formatting functionality is available...

My guess is that the most common tasks with date and time objects are:
(1) Reading a character vector representing dates/times.
(2) Formatting a date/time (i.e. Object to character vector, or
character vector to another character vector).
(3) Extracting information such as month, weekday, etc, either as an
integer or as text.

So, in short, these should be easy (to do, and find out how to do)...
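
The three common tasks listed above can be sketched in base R roughly as follows. This is an illustration only, not code from the thread: the input strings and their day/month/year layout are assumed, and `date_parts()` is a hypothetical helper name, not an existing base R function. It stands in for the kind of friendlier-named wrapper that suggestion (2) has in mind.

```r
# (1) Read a character vector of dates; the layout is given explicitly.
txt <- c("04/04/2020", "05/04/2020")      # assumed day/month/year strings
d   <- as.Date(txt, format = "%d/%m/%Y")

# (2) Format the dates back to text in a different layout.
format(d, "%Y-%m-%d")

# (3a) Extract components as text (the names depend on the locale).
weekdays(d)
months(d)

# (3b) Extract components as integers.
# date_parts() is a hypothetical helper, NOT part of base R.
date_parts <- function(x) {
  lt <- as.POSIXlt(x)
  data.frame(year  = lt$year + 1900L,  # stored as years since 1900
             month = lt$mon + 1L,      # stored as 0..11
             day   = lt$mday,          # 1..31, no offset
             wday  = lt$wday)          # 0..6, 0 = Sunday
}
date_parts(d)
```

Returning a data frame keeps one row per input date, so the result can be bound to the original data with `cbind()`. A list or matrix would work just as well.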
Mark Leeds
2020-Apr-05 21:08 UTC
[Rd] Help useRs to use R's own Time/Date objects more efficiently
Hi All: I've been following this thread and just want to add one pointer.
For those who aren't interested in using new packages that try to make
dates-times easier but also find the base R tools confusing, below is a
link to an extremely well written document from over 15 years ago. It's
probably already known by quite a few people but, for people with less
than 10 years of R experience, it could very well be unknown. It's the
clearest explanation of base R tools for dates and times ( note: one needs
to consider chron a base package but who's counting ) that I know of.

https://www.researchgate.net/publication/229087103_R_Help_Desk_Date_and_time_classes_in_R
Abby Spurdle
2020-Apr-06 01:13 UTC
[Rd] Help useRs to use R's own Time/Date objects more efficiently
> (1) Create a top-level help page with a title like "Date and Time
> Classes" to give a brief but general overview. This would mean the
> existing DateTimeClasses would need a new title.

I wanted to modify my first suggestion.

Perhaps a better idea would be to reference an external document
giving an overview of the subject.
I couldn't find a discussion of POSIXct/POSIXlt objects in the R
manuals (unless I missed it somewhere), so perhaps "An Introduction to
R" could be updated to include this subject, and then the help files
could reference that?

Mark Leeds has already mentioned one possible (unofficial) source.
And I suspect that there are others.