This has some nice properties:

1) It self-documents the input expectations in a similar manner to colClasses.
2) The implementation could eventually "push down" the coercion, e.g., calling it on each chunk of an iterative read operation.

The implementation needs work though, and I'm not convinced that coercion failures should fall back gracefully to the default.

Feature requests fall under a "bug" in Bugzilla terminology, so please submit this there. I think I've made you an account.

Thanks,
Michael

On Wed, Mar 27, 2019 at 1:19 PM Kurt Van Dijck <dev.kurt at vandijck-laurijssen.be> wrote:

> Thank you for your answers.
> I would rather not file a new bug, since what I coded isn't really a bug.
>
> The problem I (my colleagues) have today is very stupid:
> We read .csv files with a lot of columns, of which most contain
> date-time stamps, coded as DD/MM/YYYY HH:MM.
> This is not exotic, but the base library's read.table (and derivatives)
> only accept date-times in a limited number of possible formats (which I
> understand very well).
>
> We could specify a format for each column individually, but that
> syntax is rather complicated and difficult to maintain.
>
> My solution to this specific problem became a trivial, yet generic,
> extension to read.table.
> Rather than relying on the built-in type detection, I added a parameter
> that takes a function, which is called for each to-be-type-probed
> column, so I can overrule the built-in limited default.
> If the function returns nothing, the built-in default is still used.
>
> This way, I could construct a type-probing function that is
> straightforward, not hard to code, and makes reading my .csv files
> acceptable in terms of code (read.table parameters).
>
> I'm sure I'm not the only one dealing with such needs, especially
> since date-time formats exist in enormous variety, but I want to stress
> here that my approach is agnostic to my specific problem.
>
> For those asking to 'show me the code', I redirect to my 2nd patch,
> where the tests have been extended with my specific problem.
>
> What are your opinions about this?
>
> Kind regards,
> Kurt
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
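For readers without the patch at hand, the type-probing idea described above can be approximated in released base R by converting columns after the read. This is only a rough sketch under stated assumptions: the thread does not show the signature of the proposed read.table parameter, so `probe_datetime` and the sample data below are hypothetical illustrations and not the patch's interface.

```r
# Hypothetical helper, not part of base R or of the patch: try to parse a
# character column as DD/MM/YYYY HH:MM; return NULL when it does not match.
probe_datetime <- function(x, format = "%d/%m/%Y %H:%M", tz = "UTC") {
  if (!is.character(x)) return(NULL)
  parsed <- as.POSIXct(x, format = format, tz = tz)
  # Accept the column only if every non-missing value parsed successfully.
  ok <- !is.na(parsed) | is.na(x)
  if (all(ok) && any(!is.na(parsed))) parsed else NULL
}

# Made-up sample data standing in for the .csv files mentioned above.
csv <- "id,start,stop,label
1,27/03/2019 13:19,27/03/2019 14:28,a
2,28/03/2019 09:00,28/03/2019 10:30,b"

df <- read.csv(text = csv, stringsAsFactors = FALSE)

# Probe every column; keep the built-in result when the probe returns NULL.
df[] <- lapply(df, function(col) {
  converted <- probe_datetime(col)
  if (is.null(converted)) col else converted
})

str(df)
```

Doing the conversion after the read keeps the sketch within existing base R functions; the proposal in the thread would instead hook the function into read.table's own type detection.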
Just to clarify/amplify: on the bug tracking system there's a drop-down menu to specify severity, and "enhancement" is one of the choices, so you don't have to worry that you're misrepresenting your patch as fixing a bug.

The fact that an R-core member (Michael Lawrence) thinks this is worth looking at is very encouraging (and somewhat unusual for feature/enhancement suggestions)!

Ben Bolker

On Wed, Mar 27, 2019 at 5:29 PM Michael Lawrence via R-devel <r-devel at r-project.org> wrote:

> Feature requests fall under a "bug" in bugzilla terminology, so please
> submit this there. I think I've made you an account.
Hey,

In the meantime, I submitted a bug. Thanks for the assistance on that.

On Wed, 27 Mar 2019 14:28:25 -0700, Michael Lawrence wrote:

> and I'm not convinced that
> coercion failures should fallback gracefully to the default.

The graceful fallback:
- makes the code more complex
+ keeps colConvert implementations limited
+ requires the user to only implement what changed from the default
+ seemed to me the smallest overall effort

In my opinion, graceful fallback makes the thing better, but without it the colConvert parameter remains useful; it would still fill a gap.

> The implementation needs work though,

Other than to remove the graceful fallback?

Kind regards,
Kurt
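To make the fallback behaviour under discussion concrete, here is a minimal sketch of one way it could work. It assumes the converter signals "not mine" by returning NULL (or by raising an error), in which case the default conversion is used. The names `convert_with_fallback` and `to_datetime` are hypothetical and are not taken from the patch; `utils::type.convert()` stands in for read.table's built-in default.

```r
# Hypothetical wrapper: use the user-supplied converter when it yields a
# result, otherwise fall back to the default type conversion.
convert_with_fallback <- function(x, converter) {
  result <- tryCatch(converter(x), error = function(e) NULL)
  if (is.null(result)) utils::type.convert(x, as.is = TRUE) else result
}

# Hypothetical converter: only claims columns that parse fully as
# DD/MM/YYYY HH:MM, and returns NULL for everything else.
to_datetime <- function(x) {
  parsed <- as.POSIXct(x, format = "%d/%m/%Y %H:%M", tz = "UTC")
  if (anyNA(parsed)) NULL else parsed
}

stamps  <- convert_with_fallback(c("27/03/2019 13:19", "28/03/2019 09:00"),
                                 to_datetime)
numbers <- convert_with_fallback(c("1", "2", "3"), to_datetime)

class(stamps)   # the converter's result is used
class(numbers)  # the default conversion is used
```

With this shape the converter only has to describe what differs from the default, which is the "+" side of the list above; the cost is that a failed conversion passes silently.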
Kurt,

Cool idea, and it's great to see new faces on here proposing things and engaging with R-core. Some comments on the issue of fallbacks below.

On Wed, Mar 27, 2019 at 10:33 PM Kurt Van Dijck <dev.kurt at vandijck-laurijssen.be> wrote:

> Hey,
>
> In the meantime, I submitted a bug. Thanks for the assistance on that.
>
> > and I'm not convinced that
> > coercion failures should fallback gracefully to the default.
>
> The graceful fallback:
> - makes the code more complex
> + keeps colConvert implementations limited
> + requires the user to only implement what changed from the default
> + seemed to me the smallest overall effort
>
> In my opinion, graceful fallback makes the thing better,
> but without it the colConvert parameter remains useful; it would still
> fill a gap.

Another way of viewing coercion failure, I think, is that either the user-supplied converter has a bug in it or it was mistakenly applied in a situation where it shouldn't have been. If that's the case, the fail-early-and-loud paradigm might ultimately be more helpful to users.

Another thought in the same vein is that if fallback occurs, the returned result will not be what the user asked for and is expecting. So either their code, which assumes (e.g.) that a column has been correctly parsed as a date, is going to break in ways that are mysterious to them, or they have to put a bunch of their own checking logic after the call to see whether their converters actually worked, in order to protect themselves from that.

Neither really seems ideal to me; I think an error would be better, myself. I'm more of a software developer than a script writer/analyst though, so it's possible others' opinions would differ (though I'd be a bit surprised by that in this particular case, given the danger).

Best,
~G
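As an illustration of the fail-early alternative argued for above, a strict wrapper might look roughly like the following. This is a sketch under assumptions, not the patch's behaviour: `convert_strict` and `parse_datetime` are hypothetical names, and "failure" is taken to mean either an error from the converter or a non-missing input that comes back as NA.

```r
# Hypothetical converter for DD/MM/YYYY HH:MM stamps.
parse_datetime <- function(x) {
  as.POSIXct(x, format = "%d/%m/%Y %H:%M", tz = "UTC")
}

# Hypothetical strict wrapper: stop with an informative message instead of
# silently falling back to the default conversion.
convert_strict <- function(x, converter, column = "<unnamed>") {
  result <- tryCatch(
    converter(x),
    error = function(e) {
      stop(sprintf("converter failed on column '%s': %s",
                   column, conditionMessage(e)), call. = FALSE)
    }
  )
  # A non-missing input that became NA counts as a coercion failure.
  bad <- is.na(result) & !is.na(x)
  if (any(bad)) {
    stop(sprintf("column '%s': %d value(s) could not be converted, e.g. '%s'",
                 column, sum(bad), x[bad][1]), call. = FALSE)
  }
  result
}

# Well-formed input (missing values are allowed through).
good <- convert_strict(c("27/03/2019 13:19", NA), parse_datetime, "start")

# Malformed input raises an error naming the column and an offending value.
msg <- tryCatch(
  {
    convert_strict(c("27/03/2019 13:19", "2019-03-28 09:00"),
                   parse_datetime, "stop")
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)
```

The error surfaces at read time, so downstream code never sees a column that quietly kept its default type.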