Check out read.csv.sql in the sqldf package. It reads the file directly
into an SQLite database without passing the data through R, and then
reads it from the database into R. It sets up the database and the
table layout for you and also deletes the database when finished, so
reading the file is just a matter of one line of R code. It can also
read any portion of the file that can be specified in SQL. See the
examples on the home page:
http://sqldf.googlecode.com
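
A minimal sketch of the idea, assuming a comma-separated file with a
header row. The sample file, its column names (id, value) and the
filter are made up for illustration; in the SQL statement the table
name "file" refers to the file being read.

```r
library(sqldf)

# Hypothetical sample file standing in for the real 40mb file.
csv_path <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:1000, value = (1:1000) * 2),
          csv_path, row.names = FALSE, quote = FALSE)

# Only the rows selected by the SQL statement are brought into R.
subset_df <- read.csv.sql(csv_path,
                          sql = "select * from file where id <= 10")

unlink(csv_path)
```

Leaving out the sql argument reads the whole file.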
On Tue, Mar 16, 2010 at 12:51 PM, Joe Calderon <calderon.joe at gmail.com> wrote:
> hello *, im running into two major bottlenecks in an R script.
>
> 1. going through a 40mb file and reading it in via readLines() 1 line at
> a time is almost an order of magnitude slower than the equivalent in
> python, im wondering if there are alternatives to readLines(), doing
> more lines at a time helps a bit
>
> 2. generating date sequences takes a long time, im basically doing
> something like seq.Date(Sys.Date(), length.out = 300, by = 'day') a lot
> while digging into it, i strace'd the running process and it seems the
> bulk of the time is spent checking for /etc/localtime
>
> stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0
>
>
> strace -cp 2964
> Process 2964 attached - interrupt to quit
> ^CProcess 2964 detached
> % time     seconds  usecs/call     calls    errors syscall
> ------ ----------- ----------- --------- --------- ----------------
>  94.61    0.006387           0     55872           stat
>   2.58    0.000174           0       568           read
>   1.42    0.000096           0       285           write
>   1.39    0.000094           1       137           brk
> ------ ----------- ----------- --------- --------- ----------------
> 100.00    0.006751                 56862           total
>
>
>
> has anybody run into similar problems?
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>