Jens Oehlschlägel-Akiyoshi
2000-Jan-12 17:03 UTC
functions for flat file import/export + utilities
Dear R-Developers,

please find attached a set of draft functions for flat file import and export, partly extending existing functions and partly written from scratch. I thought you might be interested in these functions and the accompanying utilities for padding and trimming.

Main features are:
 - supports several formats, i.e. fixed width and CSV (with one exception)
 - supports templates describing relevant features of the flat file data, including attributes of data.frame columns, storage.mode etc.
 - special handling of factors and logicals
 - handling of special characters such as the decimal separator and escaped quotation marks
 - no perl and no separator needed for fixed width format (and no problems with quotation marks)
 - reading fixed width format is based on a simple function, read.char.cols(), which could be implemented more memory-efficiently in C

However, these functions are written from the perspective of a non-R-developer; of course one could do it better. Also, I don't know whether it is a good idea to try to keep some compatibility (and naming) with read.table() and write.table(), or whether it is better to optimize new solutions with new naming.

I have done some testing, but of course there may be bugs in it. Having said this, I would appreciate any feedback, criticism, suggestions for improvements, ...

Regards

--
Dr. Jens Oehlschlägel-Akiyoshi
MD FACTORY GmbH
Bayerstrasse 21
80335 München
Tel.: 089 545 28-27
Fax.: 089 545 28-10
http://www.mdfactory.de

Standard Disclaimers: Opinions expressed here are personal and are not otherwise represented.
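The attached functions are not reproduced in this thread, so the following is only a rough sketch of the idea behind reading fixed width data without a separator: cut each line at known character positions and trim the padding. The function name read_fixed_sketch() and its arguments are hypothetical and are not the read.char.cols() mentioned above.

```r
# Hypothetical sketch, not the attached code: split fixed-width lines
# into character columns. `widths` is the number of characters per column.
read_fixed_sketch <- function(lines, widths,
                              col.names = paste0("V", seq_along(widths))) {
  last  <- cumsum(widths)        # last character position of each column
  first <- last - widths + 1     # first character position of each column
  cols <- lapply(seq_along(widths), function(i) {
    # remove the blanks that pad each field to its fixed width
    trimws(substring(lines, first[i], last[i]))
  })
  names(cols) <- col.names
  as.data.frame(cols, stringsAsFactors = FALSE)
}

lines <- c("Anna  23 1.50",
           "Bob    7 2.25")
d <- read_fixed_sketch(lines, widths = c(5, 3, 5),
                       col.names = c("name", "n", "x"))
d$n <- as.integer(d$n)   # storage modes are set after the split
d$x <- as.numeric(d$x)
print(d)
```

Every column comes back as character, and the caller decides the storage mode afterwards; a template describing the columns, as in the feature list, could drive that conversion step.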
Dear R-Developers,

Would it be possible / is it desirable to alter tabulate so that the lower boundary can be set in addition to the upper boundary? e.g.,

  tabulate(bin, low=min(1,bin), nbin=max(1,bin))

would silently ignore elements outside the range low..nbin

It is easy to get around this at present, of course, but it seems natural to have a more flexible function.
[or should I be using another function?]

David.

--
David Wooff, Director, Statistics and Mathematics Consultancy Unit,
Department of Mathematical Sciences, University of Durham.
Science Laboratories, South Road, Durham, DH1 3LE, UK.
Tel. 0191 374 4531, Fax 0191 374 7388.

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !) To: r-devel-request@stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
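The workaround alluded to above might look roughly like the following. This is a sketch only: tabulate_range() and its arguments low and high are hypothetical names, and the wrapper simply shifts the values so that the existing tabulate() can be used unchanged.

```r
# Hypothetical wrapper: count the integers in low..high and ignore
# anything outside that range.
tabulate_range <- function(bin, low = min(bin), high = max(bin)) {
  # drop missing and out-of-range values explicitly
  bin <- bin[!is.na(bin) & bin >= low & bin <= high]
  # shift so that `low` falls into bin 1, then use the ordinary tabulate()
  counts <- tabulate(bin - low + 1L, nbins = high - low + 1L)
  names(counts) <- low:high
  counts
}

x <- c(-1L, 0L, 0L, 2L, 5L)
res <- tabulate_range(x, low = 0L, high = 3L)
print(res)
```

Here -1 and 5 are dropped, and the bins for 1 and 3 are reported with a count of zero.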
> Dear R-Developers,
>
> Would it be possible / is it desirable to alter tabulate
> so that the lower boundary
> can be set in addition to the upper boundary? eg,
>
> tabulate(bin, low=min(1,bin), nbin=max(1,bin))
>
> would silently ignore elements outside the range low..nbin
>
> It is easy to get around this at present, of course,
> but seems natural to have a more flexible function.
> [or should I be using another function?]

I would oppose it on the grounds that tabulate() is a low-level function and making a change like this is likely to slow it down (if marginally) and to hinder porting software from R to S, which I regard as an important consideration, if not an over-riding one.

table() should be the usual function for everyday use (and using this makes the change to tabulate() unnecessary), but on R the difference in speed between table() and tabulate() can be dramatic. I would suggest improving the efficiency of table() should be a fairly high priority.

Bill.

--
-----------------------------------------------------------------
Bill Venables, Statistician, CMIS Environmetrics Project.

Physical address:            Postal address:
CSIRO Marine Laboratories,   PO Box 120,
233 Middle St,               Cleveland, Qld, 4163
Cleveland, Qld, 4163         AUSTRALIA
AUSTRALIA

Tel: +61 7 3826 7251         Email: Bill.Venables@cmis.csiro.au
Fax: +61 7 3826 7304         http://www.cmis.csiro.au/bill.venables/
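One way to check the speed difference on a given installation is a rough timing such as the one below. It is a sketch under assumed conditions (the vector length and the range 1..10 are arbitrary choices), and the timings will vary with machine and R version, so no figures are quoted here.

```r
set.seed(1)
x <- sample.int(10L, 1e5, replace = TRUE)   # integers in 1..10

# time both functions on the same data
t_tabulate <- system.time(a <- tabulate(x, nbins = 10L))
t_table    <- system.time(b <- table(x))

# both produce the same counts; only the elapsed time differs
same_counts <- all(a == as.vector(b))
print(same_counts)
print(c(tabulate = t_tabulate[["elapsed"]], table = t_table[["elapsed"]]))
```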
Bill Venables wrote:
>
> > Dear R-Developers,
> >
> > Would it be possible / is it desirable to alter tabulate
> > so that the lower boundary
> > can be set in addition to the upper boundary? eg,
> >
> > tabulate(bin, low=min(1,bin), nbin=max(1,bin))
> >
> > would silently ignore elements outside the range low..nbin
> >
> > It is easy to get around this at present, of course,
> > but seems natural to have a more flexible function.
> > [or should I be using another function?]
>
> I would oppose it on the grounds that tabulate() is a low-level
> function and making a change like this is likely to slow it down
> (if marginally) and to hinder porting software from R to S, which
> I regard as an important consideration, if not an over-riding one.
>
> table() should be the usual function for everyday use (and using
> this makes the change to tabulate() unnecessary), but on R the
> difference in speed between table() and tabulate() can be
> dramatic. I would suggest improving the efficiency of table()
> should be a fairly high priority.
>

As far as I know, tabulate appears to have the advantage [for some of my tasks] of not excluding bins with zero counts. Otherwise, I would use table.

David.

--
David Wooff, Director, Statistics and Mathematics Consultancy Unit,
Department of Mathematical Sciences, University of Durham.
Science Laboratories, South Road, Durham, DH1 3LE, UK.
Tel. 0191 374 4531, Fax 0191 374 7388.
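The point about zero counts can be seen in a small example. The last line shows one possible way to get the same behaviour from table(), by declaring the full set of levels through factor(); this is offered as an illustration and was not suggested in the messages above.

```r
x <- c(1L, 1L, 4L)

by_tabulate <- tabulate(x, nbins = 5L)          # counts for 1..5, zeros included
by_table    <- table(x)                         # only the values that occur (1 and 4)
by_factor   <- table(factor(x, levels = 1:5))   # zeros kept once the levels are declared

print(by_tabulate)
print(by_table)
print(by_factor)
```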