Greetings.

I've got some analysis problems I'm trying to solve, the raw data for
which are accumulated in a bunch of time-and-date-based files:

  /some/path/2005-01-02-00-00-02

etc.

The best 'read all these files' method I've seen in the r-help archives
comes down to

  for (df in my_list_of_filenames) {
      dat <- rbind(dat, my_read_function(df))
  }

which, unpleasantly, is O(N^2) w.r.t. the number of files.

I'm fiddling with other idioms to accomplish the same goal. Best I've
come up with so far, after extensive reference to the mailing list
archives, is

  my_read_function.many <- function(filenames) {
      filenames <- filenames[file.exists(filenames)]
      rv <- do.call("rbind", lapply(filenames, my_read_function))
      row.names(rv) <- c(1:length(row.names(rv)))
      rv
  }

I'd love to have some stupid omission pointed out.

- Allen S. Rout
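A minimal sketch of building my_list_of_filenames for a directory laid
out as above; the path and the date-stamp pattern are assumptions based
on the single example filename, not part of the original post:

  ## collect full paths to files named like 2005-01-02-00-00-02,
  ## assuming they all sit directly under /some/path
  my_list_of_filenames <- list.files(
      "/some/path",
      pattern    = "^[0-9]{4}(-[0-9]{2}){5}$",
      full.names = TRUE
  )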
In my experience, using 'do.call("rbind", ...)' after storing all the
data files in a list is much better than 'rbind'-ing on the fly.

-roger

asr at ufl.edu wrote:
> I've got some analysis problems I'm trying to solve, the raw data for
> which are accumulated in a bunch of time-and-date-based files.
> [...]
> I'd love to have some stupid omission pointed out.

-- 
Roger D. Peng
http://www.biostat.jhsph.edu/~rpeng/
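A rough sketch of the difference Roger describes: the growing-rbind
loop re-copies everything accumulated so far on every pass, while the
list-then-do.call version binds once. make_chunk here is a hypothetical
stand-in for any per-file reader, and the sizes are arbitrary:

  ## fake "one file's worth" of data
  make_chunk <- function(i) data.frame(id = i, x = rnorm(100))
  n <- 200

  ## rbind-ing on the fly: O(N^2) copying
  system.time({
      dat <- NULL
      for (i in seq_len(n)) dat <- rbind(dat, make_chunk(i))
  })

  ## collect the pieces in a list, then bind once
  system.time({
      dat2 <- do.call("rbind", lapply(seq_len(n), make_chunk))
  })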
asr at ufl.edu writes:

> [...]
> my_read_function.many <- function(filenames) {
>     filenames <- filenames[file.exists(filenames)]
>     rv <- do.call("rbind", lapply(filenames, my_read_function))
>     row.names(rv) <- c(1:length(row.names(rv)))
>     rv
> }
>
> I'd love to have some stupid omission pointed out.

Why? It's pretty much what I would suggest, except for the superfluous
c().

-- 
   O__  ---- Peter Dalgaard             Blegdamsvej 3
  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N
 (*) \(*) -- University of Copenhagen   Denmark      Ph:  (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk)             FAX: (+45) 35327907
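A minimal sketch of the simplification Peter is pointing at, assuming
rv is a data frame as in the function above; the NULL form is an
equivalent shortcut, not something from the thread:

  ## drop the c(): 1:nrow(rv) is already the vector of new row names
  row.names(rv) <- 1:nrow(rv)

  ## for a data frame, assigning NULL also resets row names to 1..nrow
  row.names(rv) <- NULL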