Hello everyone,

I currently run R code that has to read 100 or more large csv files (>= 100 MB each), and usually write csv files too. My colleagues and I like R very much, but we are a little astonished by how slow these functions are. We have looked at every argument of these functions, and while specifying some parameters helps a bit, it is still too slow. I am sure a lot of people have the same problem, so I thought one of you would know a trick or a package that would speed this up a lot.

(We work on Linux Red Hat with R 2.10.0, but I guess that does not matter for this problem.)

Thanks for reading this.
Have a nice weekend.
On Sun, Sep 26, 2010 at 8:38 AM, statquant2 <statquant at gmail.com> wrote:
> Hello everyone,
> I currently run R code that has to read 100 or more large csv files (>= 100 MB),
> and usually write csv too.
> [...]
> Thanks for reading this.
> Have a nice weekend.

You could try read.csv.sql in the sqldf package:

http://code.google.com/p/sqldf/#Example_13._read.csv.sql_and_read.csv2.sql

See ?read.csv.sql in sqldf. It uses RSQLite and SQLite to read the file into an SQLite database (which it sets up for you), completely bypassing R's own csv parsing, and from there grabs the data into R, removing the database it created at the end (a short usage sketch is appended below the signature).

There are also CSVREAD and CSVWRITE SQL functions in the H2 database, which is also supported by sqldf, although I have never checked their speed:

http://code.google.com/p/sqldf/#10.__What_are_some_of_the_differences_between_using_SQLite_and_H

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
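A minimal sketch of the sqldf approach described above, purely illustrative: the file name "myfile.csv" and the column name "value" are assumptions, not taken from the thread.

library(sqldf)

## read.csv.sql loads the csv into a temporary SQLite database that it
## creates for you, pulls the result back into an R data frame, and then
## removes the database again.
DF <- read.csv.sql("myfile.csv")

## A filter can be pushed into SQL so that only matching rows ever reach R
## (the column name "value" is hypothetical):
DF2 <- read.csv.sql("myfile.csv",
                    sql = "select * from file where value > 0")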
On 26.09.2010 14:38, statquant2 wrote:
> Hello everyone,
> I currently run R code that has to read 100 or more large csv files (>= 100 MB),
> and usually write csv too.
> [...]
> Thanks for reading this.
> Have a nice weekend.

Most of us read the csv file and write an .RData file at once (see ?save). Then we can read the data in much more quickly after it has been imported once with read.csv and friends (a short sketch is appended below).

Uwe Ligges
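A minimal sketch of this import-once workflow; the file names are illustrative.

## First run: parse the csv once, then store the data frame in R's
## binary format.
DF <- read.csv("myfile.csv")
save(DF, file = "myfile.RData")

## Later runs: load() restores the object DF directly, which is much
## faster than re-parsing the csv.
load("myfile.RData")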
Hi, after testing:

R) system.time(read.csv("myfile.csv"))
   user  system elapsed
  1.126   0.038   1.177

R) system.time(read.csv.sql("myfile.csv"))
   user  system elapsed
  1.405   0.025   1.439
Warning messages:
1: closing unused connection 4 ()
2: closing unused connection 3 ()

It seems that the function is less efficient than the base one ... so ...
On Tue, Sep 28, 2010 at 1:24 PM, statquant2 <statquant at gmail.com> wrote:
> Hi, after testing:
> R) system.time(read.csv("myfile.csv"))
>    user  system elapsed
>   1.126   0.038   1.177
> [...]
> It seems that the function is less efficient than the base one ... so ...

The benefit comes with larger files. With small files there is not much point in speeding things up, since the absolute time is already small. I suggest you look at the benchmarks on the sqldf home page, where a couple of users benchmarked larger files (a rough sketch for reproducing such a comparison is appended below the signature). Since sqldf was intended for convenience rather than performance, I was as surprised as anyone when several users independently noticed that sqldf ran several times faster than unoptimized R code.

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
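A rough sketch of how such a comparison on a larger file could be reproduced; the row count and column layout below are assumptions, not figures from the sqldf benchmarks.

library(sqldf)

## Write a larger csv (about a million rows) to benchmark against.
n <- 1e6
DF <- data.frame(x = rnorm(n), y = rnorm(n),
                 g = sample(letters, n, replace = TRUE))
write.csv(DF, "big.csv", row.names = FALSE, quote = FALSE)

## Compare the base reader with the SQLite-backed one.
system.time(a <- read.csv("big.csv"))
system.time(b <- read.csv.sql("big.csv"))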