When writing tables with a large number of columns, write.table() seems to take way too much time - e.g. a table with ~80 rows and ~6000 columns takes ~30 min cpu on my 900 MHz pc. I would appreciate any explainations or advice. Thanks, Cole -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.- r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html Send "info", "help", or "[un]subscribe" (in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
Hi, I think the problem lies in the code of write.table(). It is essentially a paste() function, which pastes all the data in the table into a long character string and thereafter writes the string into file. I was not able to write a dataset of 7500 obs times 1200 variables at all, I had to split it up into smaller units and write those separately. In addition caused it much swapping on my 128MB system. I think (I have not tried) it could work faster in your case if you just save the observatons separately into separate files and thereafter merge the files (but it is worth of doing only if you have to write the table repeatedly, of course). In long run I think a rewrite of the write.table() in C in such a way that it do not store the whole file in memory may be a solution. Regards, Ott Toomet ------------------------------- On Wed, 5 Dec 2001, Cole Harris wrote:> When writing tables with a large number of columns, write.table() seems to take way too much time - e.g. a table with ~80 rows and ~6000 columns takes ~30 min cpu on my 900 MHz pc. > I would appreciate any explainations or advice. > > Thanks, > Cole > -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.- > r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html > Send "info", "help", or "[un]subscribe" > (in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch > _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._ >-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.- r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html Send "info", "help", or "[un]subscribe" (in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
Cole Harris <coleh at quasarintl.com> writes:> When writing tables with a large number of columns, write.table() seems to > take way too much time...I tackled this problem once in S-Plus, but I have not tested the following code thoroughly in R. Please give it a try and let me know if it helps! It mimics the behavior of: write.table(tbl, file, quote=F, sep="\t", row.names=T) but writes the output in "blocks", where the block size (in rows) is set by parameter "bsize". Try bsize=1 to write one row at a time, and set verbose=T to watch its progress. g.output <- function(tbl, file="", append=F, hdr=T, sep="\t", digits=NULL, verbose=F, bsize=7e4/length(tbl)) { if (is.numeric(digits)) digits <- structure(as.list(rep(digits, , length(tbl))), names=names(tbl)) for (i in names(digits)) if (is.numeric(tbl[[i]])) tbl[[i]] <- as.character(round(tbl[[i]], digits[[i]])) if (!append) unlink(file) if (hdr && (!append || !file.exists(file))) # Header line cat(paste(names(tbl), collapse=sep), sep="\n", file=file) if (!(nt <- length(tbl[[1]]))) return(invisible()) ix <- c(seq(1, nt, by=round(bsize)), nt+1) cfun <- function(tbl, i1, i2, nt, file, sep, verbose) { if (verbose) cat("From", i1, "to", i2, date(), "\n") if (i1 != 1 || i2 != nt) tbl <- g.subset(tbl, i1:i2) # g.subset is below y <- do.call("paste", c(tbl, list(sep=sep))) cat(y, sep="\n", file=file, append=(file != "")) } for (i in seq(ix)[-1]) cfun(tbl, ix[i-1], ix[i]-1, nt, file, sep, verbose) } g.subset <- function(x, q=T, reverse=F) { y <- list() test <- is.na(seq(along=x[[1]])[q]) # give "" for NA subsets of char vectors f <- function(z) if (is.character(z)) ifelse(test,"",z[q]) else z[q] for (j in seq(x)) y[[j]] <- if (reverse) rev(f(x[[j]])) else f(x[[j]]) names(y) <- names(x) if (is.data.frame(x)) data.frame(y) else y } -- -- David Brahm (brahm at alum.mit.edu) -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.- r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html Send "info", "help", or "[un]subscribe" (in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
Thanks to the responders, I found that cat is suitable for my purposes - the following function is ~100x faster than write.table for my particular problem - writing gene expression csv files. makecsv<-function(nms,cls,incl,dat,file=""){ nrow<-length(cls) for(i in 1:nrow){ cat(nms[i],cls[i],incl[i],dat[i,],sep=", ",append=TRUE,file=file) write("",file=file,append=TRUE) print(i)} } Cole>>> David Brahm <brahm at alum.mit.edu> 12/06/01 08:15AM >>>Cole Harris <coleh at quasarintl.com> writes:> When writing tables with a large number of columns, write.table() seems to > take way too much time...I tackled this problem once in S-Plus, but I have not tested the following code thoroughly in R. Please give it a try and let me know if it helps! It mimics the behavior of: write.table(tbl, file, quote=F, sep="\t", row.names=T) but writes the output in "blocks", where the block size (in rows) is set by parameter "bsize". Try bsize=1 to write one row at a time, and set verbose=T to watch its progress. g.output <- function(tbl, file="", append=F, hdr=T, sep="\t", digits=NULL, verbose=F, bsize=7e4/length(tbl)) { if (is.numeric(digits)) digits <- structure(as.list(rep(digits, , length(tbl))), names=names(tbl)) for (i in names(digits)) if (is.numeric(tbl[[i]])) tbl[[i]] <- as.character(round(tbl[[i]], digits[[i]])) if (!append) unlink(file) if (hdr && (!append || !file.exists(file))) # Header line cat(paste(names(tbl), collapse=sep), sep="\n", file=file) if (!(nt <- length(tbl[[1]]))) return(invisible()) ix <- c(seq(1, nt, by=round(bsize)), nt+1) cfun <- function(tbl, i1, i2, nt, file, sep, verbose) { if (verbose) cat("From", i1, "to", i2, date(), "\n") if (i1 != 1 || i2 != nt) tbl <- g.subset(tbl, i1:i2) # g.subset is below y <- do.call("paste", c(tbl, list(sep=sep))) cat(y, sep="\n", file=file, append=(file != "")) } for (i in seq(ix)[-1]) cfun(tbl, ix[i-1], ix[i]-1, nt, file, sep, verbose) } g.subset <- function(x, q=T, reverse=F) { y <- list() test <- is.na(seq(along=x[[1]])[q]) # give "" for NA subsets of char vectors f <- function(z) if (is.character(z)) ifelse(test,"",z[q]) else z[q] for (j in seq(x)) y[[j]] <- if (reverse) rev(f(x[[j]])) else f(x[[j]]) names(y) <- names(x) if (is.data.frame(x)) data.frame(y) else y } -- -- David Brahm (brahm at alum.mit.edu) -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.- r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html Send "info", "help", or "[un]subscribe" (in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._ -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.- r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html Send "info", "help", or "[un]subscribe" (in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._