Ben Bolker
2019-Mar-26 20:26 UTC
[Rd] [PATCH 1/2] readtable: add hook for type conversions per column
You need admin assistance; someone will probably see your request here and fulfill it.

It might be helpful to read this question/answer on StackOverflow, which discusses the context of proposing patches to base R functionality:

https://stackoverflow.com/questions/8065835/proposing-feature-requests-to-the-r-core-team

cheers
  Ben Bolker

On 2019-03-26 4:20 p.m., Kurt Van Dijck wrote:
> On Tue, 26 Mar 2019 12:48:12 -0700, Michael Lawrence wrote:
>> Please file a bug on bugzilla so we can discuss this further.
>
> All fine.
> I didn't find a way to create an account on bugs.r-project.org.
> Did I just not see it? Or do I need administrator assistance?
>
> Kind regards,
> Kurt
Thank you for your answers.
I would rather not file a new bug, since what I coded isn't really a bug.

The problem I (and my colleagues) have today is very stupid: we read .csv files with a lot of columns, most of which contain date-time stamps coded as DD/MM/YYYY HH:MM. This is not exotic, but the base library's read.table (and derivatives) only accept date-times in a limited number of formats (which I understand very well).

We could specify the format for each column individually, but that syntax is rather complicated and difficult to maintain.

My solution to this specific problem became a trivial yet generic extension to read.table. Rather than relying on the built-in type detection, I added a parameter that takes a function, which is called for each to-be-type-probed column so I can overrule the limited built-in default. If the function returns nothing, the built-in default is still used.

This way, I could construct a type-probing function that is straightforward, not hard to code, and keeps the code for reading my .csv files (the read.table parameters) acceptable.

I'm sure I'm not the only one dealing with such needs, especially since date-time formats exist in enormous numbers, but I want to stress that my approach is agnostic to my specific problem.

For those asking to 'show me the code', I refer to my 2nd patch, where the tests have been extended with my specific problem.

What are your opinions about this?

Kind regards,
Kurt
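As a rough illustration of the idea described above, the hook can be emulated in user code without touching read.table itself. This is only a sketch under stated assumptions: the names `read_csv_with_hook`, `convert_hook` and `dmy_hm_hook` are hypothetical and are not the interface of the patch. Every column is read as character, the hook is offered each column, and a NULL result falls back to the built-in detection via type.convert().

```r
# Hypothetical helper: emulates a per-column conversion hook on top of
# read.csv(). It is NOT the interface of the proposed patch.
read_csv_with_hook <- function(..., convert_hook = function(x) NULL) {
  # Read every column as character so that nothing is type-probed yet.
  raw <- utils::read.csv(..., colClasses = "character")
  raw[] <- lapply(raw, function(col) {
    converted <- convert_hook(col)
    # A NULL result means "no opinion": use the built-in detection instead.
    if (is.null(converted)) utils::type.convert(col, as.is = TRUE) else converted
  })
  raw
}

# Example hook (hypothetical): recognise DD/MM/YYYY HH:MM stamps and
# decline (return NULL) for anything else.
dmy_hm_hook <- function(x) {
  pattern <- "^[0-9]{2}/[0-9]{2}/[0-9]{4} [0-9]{2}:[0-9]{2}$"
  filled <- !is.na(x) & nzchar(x)
  if (!any(filled) || !all(grepl(pattern, x[filled]))) return(NULL)
  as.POSIXct(x, format = "%d/%m/%Y %H:%M", tz = "UTC")
}

csv_text <- "when,value\n26/03/2019 20:26,1.5\n27/03/2019 13:19,2.5\n"
df <- read_csv_with_hook(text = csv_text, convert_hook = dmy_hm_hook)
str(df)
```

The fixed "UTC" time zone is an assumption made for reproducibility; real data would need whatever zone the stamps were recorded in.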
This has some nice properties:
1) It self-documents the input expectations in a similar manner to colClasses.
2) The implementation could eventually "push down" the coercion, e.g., calling it on each chunk of an iterative read operation.

The implementation needs work though, and I'm not convinced that coercion failures should fall back gracefully to the default.

Feature requests fall under a "bug" in bugzilla terminology, so please submit this there. I think I've made you an account.

Thanks,
Michael

On Wed, Mar 27, 2019 at 1:19 PM Kurt Van Dijck <dev.kurt at vandijck-laurijssen.be> wrote:
> Thank you for your answers.
> I would rather not file a new bug, since what I coded isn't really a bug.
>
> The problem I (and my colleagues) have today is very stupid:
> we read .csv files with a lot of columns, most of which contain
> date-time stamps coded as DD/MM/YYYY HH:MM.
> This is not exotic, but the base library's read.table (and derivatives)
> only accept date-times in a limited number of formats (which I
> understand very well).
>
> We could specify the format for each column individually, but that
> syntax is rather complicated and difficult to maintain.
>
> My solution to this specific problem became a trivial yet generic
> extension to read.table.
> Rather than relying on the built-in type detection, I added a parameter
> that takes a function, which is called for each to-be-type-probed
> column so I can overrule the limited built-in default.
> If the function returns nothing, the built-in default is still used.
>
> This way, I could construct a type-probing function that is
> straightforward, not hard to code, and keeps the code for reading my
> .csv files (the read.table parameters) acceptable.
>
> I'm sure I'm not the only one dealing with such needs, especially
> since date-time formats exist in enormous numbers, but I want to
> stress that my approach is agnostic to my specific problem.
>
> For those asking to 'show me the code', I refer to my 2nd patch,
> where the tests have been extended with my specific problem.
>
> What are your opinions about this?
>
> Kind regards,
> Kurt
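For comparison with the colClasses mechanism mentioned in the reply, one existing way to get a custom date-time format through read.table is to register a coercion with setAs() and name the resulting class in colClasses. The sketch below assumes this is roughly the per-column route referred to earlier in the thread; the class name `dmyDateTime` and the sample data are made up for illustration.

```r
library(methods)

# Hypothetical marker class; it exists only so colClasses can name it.
setClass("dmyDateTime")

# Register how a character column is coerced to that class.
# The "UTC" time zone is an assumption made for reproducibility.
setAs("character", "dmyDateTime", function(from) {
  as.POSIXct(from, format = "%d/%m/%Y %H:%M", tz = "UTC")
})

csv_text <- "when,value\n26/03/2019 20:26,1.5\n27/03/2019 13:19,2.5\n"

# Every date-time column has to be listed explicitly by name.
df2 <- read.csv(text = csv_text,
                colClasses = c(when = "dmyDateTime", value = "numeric"))
str(df2)
```

Unlike a hook that can decline, this route needs every affected column to be listed by hand, which is where the maintenance burden with many date-time columns comes from.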