Greetings.

I've got some analysis problems I'm trying to solve, the raw data for
which are accumulated in a bunch of time-and-date-based files:

  /some/path/2005-01-02-00-00-02

etc.

The best 'read all these files' method I've seen in the r-help archives
comes down to

  for (df in my_list_of_filenames) {
      dat <- rbind(dat, my_read_function(df))
  }

which, unpleasantly, is O(N^2) w.r.t. the number of files.

I'm fiddling with other idioms to accomplish the same goal. Best I've
come up with so far, after extensive reference to the mailing list
archives, is

  my_read_function.many <- function(filenames) {
      filenames <- filenames[file.exists(filenames)]
      rv <- do.call("rbind", lapply(filenames, my_read_function))
      row.names(rv) <- c(1:length(row.names(rv)))
      rv
  }

I'd love to have some stupid omission pointed out.

- Allen S. Rout
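A minimal sketch of building my_list_of_filenames for a directory laid
out as above; the path and the date-stamp pattern are assumptions based
on the single example filename, not part of the original post:

  ## collect full paths to files named like 2005-01-02-00-00-02,
  ## assuming they all sit directly under /some/path
  my_list_of_filenames <- list.files(
      "/some/path",
      pattern    = "^[0-9]{4}(-[0-9]{2}){5}$",
      full.names = TRUE
  )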
In my experience, using 'do.call("rbind", ...)' after storing all the
data files in a list is much better than 'rbind'-ing on the fly.

-roger

asr at ufl.edu wrote:
> I've got some analysis problems I'm trying to solve, the raw data for
> which are accumulated in a bunch of time-and-date-based files.
> [...]
> I'd love to have some stupid omission pointed out.

-- 
Roger D. Peng
http://www.biostat.jhsph.edu/~rpeng/
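A rough sketch of the difference Roger describes: the growing-rbind
loop re-copies everything accumulated so far on every pass, while the
list-then-do.call version binds once. make_chunk here is a hypothetical
stand-in for any per-file reader, and the sizes are arbitrary:

  ## fake "one file's worth" of data
  make_chunk <- function(i) data.frame(id = i, x = rnorm(100))
  n <- 200

  ## rbind-ing on the fly: O(N^2) copying
  system.time({
      dat <- NULL
      for (i in seq_len(n)) dat <- rbind(dat, make_chunk(i))
  })

  ## collect the pieces in a list, then bind once
  system.time({
      dat2 <- do.call("rbind", lapply(seq_len(n), make_chunk))
  })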
asr at ufl.edu writes:

> [...]
> my_read_function.many <- function(filenames) {
>     filenames <- filenames[file.exists(filenames)]
>     rv <- do.call("rbind", lapply(filenames, my_read_function))
>     row.names(rv) <- c(1:length(row.names(rv)))
>     rv
> }
>
> I'd love to have some stupid omission pointed out.

Why? It's pretty much what I would suggest, except for the superfluous
c().

-- 
   O__  ---- Peter Dalgaard             Blegdamsvej 3
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N
 (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907
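A minimal sketch of the simplification Peter is pointing at, assuming
rv is a data frame as in the function above; the NULL form is an
equivalent shortcut, not something from the thread:

  ## drop the c(): 1:nrow(rv) is already the vector of new row names
  row.names(rv) <- 1:nrow(rv)

  ## for a data frame, assigning NULL also resets row names to 1..nrow
  row.names(rv) <- NULL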