There is a recurring need for importing large csv files quickly. David Baird's dataload is a standalone program that will directly create .rda files from .csv (it also handles many other conversions). Unfortunately dataload is no longer publicly available because of some kind of relationship with Stat/Transfer. The idea is a good one, though. I wonder if anyone would volunteer to replicate the csv->rda standalone functionality or to provide some Perl or Python tools for making creation of .rda files somewhat easy outside of R.

As an aside, I routinely see 30-fold reductions in file sizes for .rda files (made with save(..., compress=TRUE)) compared with the size of SAS binary datasets. And load() times are fast.

It's been a great year for R. Let me take this opportunity to thank the R leaders for a fantastic job that gives immeasurable benefits to the community.

--
Frank E Harrell Jr
Professor and Chair
School of Medicine
Department of Biostatistics
Vanderbilt University
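[Editor's note: for illustration only, a minimal R sketch of the csv -> compressed .rda round trip described above; the file and object names are placeholders, not part of the original message.]

    dat <- read.csv("big.csv")                    # read the raw CSV into a data frame
    save(dat, file = "big.rda", compress = TRUE)  # write a compressed .rda image
    ## later, in a fresh R session:
    load("big.rda")                               # restores 'dat' into the workspace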
I think we need to know what you mean by `large' and why read.table is not fast enough (and hence if some of the planned improvements might be all that is needed). Could you make some examples available for profiling?

It seems to me that there are some delicate licensing issues in distributing a product that writes .rda format except under GPL. See, for example, the GPL FAQ.

On Thu, 23 Dec 2004, Frank E Harrell Jr wrote:
> [quoted in full above]

--
Brian D. Ripley, ripley@stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road,        +44 1865 272866 (PA)
Oxford OX1 3TG, UK    Fax: +44 1865 272595
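[Editor's note: one commonly cited way to get more speed out of read.table, shown here only as a hedged illustration; the column types, row count, and file name below are made up and would need to match the real data.]

    ## hypothetical column types for a large CSV; adjust to the actual file
    cls <- c("integer", "numeric", "character", "factor")
    dat <- read.table("big.csv", header = TRUE, sep = ",",
                      colClasses = cls,    # avoid per-column type guessing
                      nrows = 1000000,     # a mild over-estimate of the row count helps allocation
                      comment.char = "")   # disable comment scanning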
> > My understanding is that David is not distributing dataload any more, though
> > I would not like to discourage commercial vendors (such as providers of
> > Stat/Transfer and DBMSCOPY) from providing .rda output as an option. I
> > assume that new code written under GPL would not be a problem. -Frank
>
> I said `except under GPL'. I am not trying to discourage anyone, just
> pointing out that GPL has far-ranging implications that are often
> over-looked.

One way to encourage other software to provide .rda interfaces would be to document (or make more visible, if such a document already exists) the C routines that read and write .rda files.
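[Editor's note: short of documenting the C serialization routines, other software can already obtain a csv -> .rda interface by shelling out to R itself. A minimal sketch under that assumption; the script name csv2rda.R and the calling convention are purely illustrative.]

    ## csv2rda.R -- convert a CSV file to a compressed .rda containing one data frame
    ## invoked, e.g., as:  R --vanilla --slave --args in.csv out.rda < csv2rda.R
    args  <- commandArgs()
    files <- args[(which(args == "--args") + 1):length(args)]
    dat <- read.csv(files[1])                    # input CSV path
    save(dat, file = files[2], compress = TRUE)  # output .rda path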