thr3ads.net - R help - [R] write.table very slow [Dec 2001]

Cole Harris

2001-Dec-05 22:58 UTC

[R] write.table very slow

When writing tables with a large number of columns, write.table() seems to take
way too much time - e.g. a table with ~80 rows and ~6000 columns takes ~30 min
of CPU time on my 900 MHz PC.
I would appreciate any explanations or advice.

Thanks,
Cole
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at
stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._

Ott Toomet

2001-Dec-06 07:31 UTC

[R] write.table very slow

Hi,

I think the problem lies in the code of write.table().  It is essentially a
paste() call, which pastes all the data in the table into one long
character string and only then writes that string to the file.  I was not able
to write a dataset of 7500 observations by 1200 variables at all; I had to
split it into smaller units and write those separately.  It also caused heavy
swapping on my 128 MB system.
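A minimal sketch of the "split it into smaller units" workaround, assuming the data are in a data frame; the helper name `write_in_chunks` and the chunk size are hypothetical. Each chunk goes through write.table() with `append=TRUE`, so only one chunk's text has to be built in memory at a time, and the column names are written only with the first chunk.

```r
# Hypothetical helper (not part of base R): write `df` to `path` in row
# chunks, so write.table() only has to format one chunk at a time.
write_in_chunks <- function(df, path, chunk_rows = 500, sep = "\t") {
  n <- nrow(df)
  if (n == 0) {
    # Nothing to split: write the header only.
    write.table(df, path, sep = sep, quote = FALSE, row.names = FALSE)
    return(invisible(path))
  }
  starts <- seq(1, n, by = chunk_rows)
  for (k in seq_along(starts)) {
    rows <- starts[k]:min(starts[k] + chunk_rows - 1, n)
    write.table(df[rows, , drop = FALSE], file = path, sep = sep,
                quote = FALSE, row.names = FALSE,
                col.names = (k == 1),  # header only with the first chunk
                append = (k > 1))      # first chunk overwrites, later ones append
  }
  invisible(path)
}

# Example: 10 rows written in chunks of 3 rows (4 calls to write.table).
df <- as.data.frame(matrix(1:20, nrow = 10))
tmp <- tempfile(fileext = ".txt")
write_in_chunks(df, tmp, chunk_rows = 3)
back <- read.table(tmp, header = TRUE, sep = "\t")
```

The chunk size trades memory for the number of write.table() calls; it is a tuning knob, not a value taken from this thread.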

I think (I have not tried it) it could be faster in your case to save the
observations into separate files and merge the files afterwards (but that is
only worth doing if you have to write the table repeatedly, of course).  In the
long run, I think rewriting write.table() in C so that it does not store the
whole file in memory may be the solution.
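One possible reading of the untested "separate files, then merge" idea, sketched under assumptions: the helper name `write_then_merge` is hypothetical, each block of rows is written to its own temporary file, and the pieces are concatenated onto the target with base R's file.append(), which recycles its first argument so the pieces are appended in order.

```r
# Hypothetical helper (not part of base R): write each block of rows to its
# own temporary file, then concatenate the pieces onto `path`.
write_then_merge <- function(df, path, block_rows = 500, sep = "\t") {
  n <- nrow(df)
  starts <- if (n > 0) seq(1, n, by = block_rows) else numeric(0)
  pieces <- character(length(starts))
  for (k in seq_along(starts)) {
    rows <- starts[k]:min(starts[k] + block_rows - 1, n)
    pieces[k] <- tempfile(fileext = ".part")
    write.table(df[rows, , drop = FALSE], pieces[k], sep = sep,
                quote = FALSE, row.names = FALSE, col.names = FALSE)
  }
  # The target starts with the header line; the pieces are appended in order.
  writeLines(paste(names(df), collapse = sep), path)
  if (length(pieces) > 0) {
    ok <- file.append(path, pieces)
    unlink(pieces)
    if (!all(ok)) stop("could not append all pieces")
  }
  invisible(path)
}

df <- data.frame(a = 1:7, b = seq(0.5, 3.5, by = 0.5))
tmp <- tempfile(fileext = ".txt")
write_then_merge(df, tmp, block_rows = 3)
lines <- readLines(tmp)
```

As noted above, this mainly pays off if the per-block files can be reused across repeated writes; for a one-off write it adds an extra copy of the data on disk.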

Regards,

Ott Toomet

David Brahm

2001-Dec-06 15:15 UTC

[R] write.table very slow

Cole Harris <coleh at quasarintl.com> writes:
> When writing tables with a large number of columns, write.table() seems to
> take way too much time...

I tackled this problem once in S-Plus, but I have not tested the following code
thoroughly in R.  Please give it a try and let me know if it helps!  It mimics
the behavior of:
      write.table(tbl, file, quote=F, sep="\t", row.names=T)
but writes the output in "blocks", where the block size (in rows) is set by
the parameter "bsize".  Try bsize=1 to write one row at a time, and set
verbose=T to watch its progress.
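To make the block logic concrete, here is a small worked example of how g.output turns "bsize" into block boundaries (`ix <- c(seq(1, nt, by=round(bsize)), nt+1)`). The sizes below are made up for illustration; only the last line uses the ~80 x 6000 table from the original question.

```r
# Illustrative values, not from the thread: nt rows split into blocks of bsize.
nt <- 10
bsize <- 4
ix <- c(seq(1, nt, by = bsize), nt + 1)   # block start rows, plus a sentinel nt+1
print(ix)                                  # [1]  1  5  9 11
# Block i (i >= 2) covers rows ix[i-1] .. ix[i]-1
blocks <- lapply(seq_along(ix)[-1], function(i) ix[i - 1]:(ix[i] - 1))

# With the default bsize = 7e4/length(tbl), a ~6000-column table gets
# round(7e4/6000) = 12 rows per block, i.e. roughly 7e4 cells per block.
n_blocks_80 <- length(seq(1, 80, by = round(7e4 / 6000)))   # 7 blocks for 80 rows
```

So the default appears to aim for a roughly constant number of cells per paste() call, whatever the table's shape.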


g.output <- function(tbl, file="", append=FALSE, hdr=TRUE, sep="\t",
                     digits=NULL, verbose=FALSE, bsize=7e4/length(tbl)) {
  # Optionally round numeric columns; a single 'digits' value is recycled
  if (is.numeric(digits))
    digits <- structure(as.list(rep(digits, length.out=length(tbl))),
                        names=names(tbl))
  for (i in names(digits)) if (is.numeric(tbl[[i]]))
    tbl[[i]] <- as.character(round(tbl[[i]], digits[[i]]))
  if (!append) unlink(file)
  if (hdr && (!append || !file.exists(file)))          # Header line
    cat(paste(names(tbl), collapse=sep), sep="\n", file=file)

  if (!(nt <- length(tbl[[1]]))) return(invisible())
  # Start row of each block, plus a sentinel nt+1; at least 1 row per block
  ix <- c(seq(1, nt, by=max(1, round(bsize))), nt+1)
  cfun <- function(tbl, i1, i2, nt, file, sep, verbose) {
    if (verbose) cat("From", i1, "to", i2, date(), "\n")
    if (i1 != 1 || i2 != nt) tbl <- g.subset(tbl, i1:i2)  # g.subset is below
    y <- do.call("paste", c(tbl, list(sep=sep)))          # one string per row
    cat(y, sep="\n", file=file, append=(file != ""))
  }
  for (i in seq_along(ix)[-1]) cfun(tbl, ix[i-1], ix[i]-1, nt, file, sep, verbose)
}

g.subset <- function(x, q=TRUE, reverse=FALSE) {
  y <- list()
  test <- is.na(seq_along(x[[1]])[q])  # give "" for NA subsets of char vectors
  f <- function(z) if (is.character(z)) ifelse(test, "", z[q]) else z[q]
  for (j in seq_along(x)) y[[j]] <- if (reverse) rev(f(x[[j]])) else f(x[[j]])
  names(y) <- names(x)
  if (is.data.frame(x)) data.frame(y) else y
}
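For comparison, a compact sketch of the same block-wise idea in current R, assuming a plain data frame and no quoting (like quote=F above). The name `write_blocks` is hypothetical. Unlike g.output, it keeps one connection open for the whole write and uses writeLines(), which terminates every line with a newline.

```r
# Hypothetical helper (not base R): block-wise writing in the spirit of
# g.output, but through a single connection that stays open throughout.
write_blocks <- function(df, path, block_rows = 1000, sep = "\t") {
  con <- file(path, open = "w")   # opened once; truncates an existing file
  on.exit(close(con))
  writeLines(paste(names(df), collapse = sep), con)   # header line
  n <- nrow(df)
  if (n == 0) return(invisible(path))
  for (s in seq(1, n, by = block_rows)) {
    block <- df[s:min(s + block_rows - 1, n), , drop = FALSE]
    # paste() is vectorised over rows: one string per row, fields joined by sep.
    # unname() keeps column names such as "sep" from clashing with paste()'s args.
    lines <- do.call(paste, c(unname(as.list(block)), sep = sep))
    writeLines(lines, con)         # every line is terminated with "\n"
  }
  invisible(path)
}

df <- data.frame(id = 1:5, x = c(0.5, 1.5, 2.5, 3.5, 4.5), g = letters[1:5])
tmp <- tempfile(fileext = ".tsv")
write_blocks(df, tmp, block_rows = 2)
out <- readLines(tmp)
```

Fields are not quoted or escaped, so this is only safe when no value contains the separator or a newline.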

-- 
                              -- David Brahm (brahm at alum.mit.edu)

Cole Harris

2001-Dec-06 17:49 UTC

[R] write.table very slow

Thanks to the responders,

I found that cat is suitable for my purposes - the following function is ~100x
faster than write.table for my particular problem - writing gene expression csv
files.

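makecsv <- function(nms, cls, incl, dat, file="") {
  # One output line per row: name, class, inclusion flag, then the data row
  for (i in seq_along(cls)) {
    cat(nms[i], cls[i], incl[i], dat[i,], sep=", ", append=TRUE, file=file)
    write("", file=file, append=TRUE)   # terminate the line
    print(i)                            # progress report
  }
}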
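A possible refinement of makecsv, sketched as a hypothetical variant `makecsv2` (not from the thread): it produces the same ", "-separated line layout but opens the output file once, instead of reopening it twice per row through cat(append=TRUE) and write(). Unlike makecsv, it truncates an existing file rather than appending to it, and it drops the per-row print().

```r
# Hypothetical variant of makecsv (not from the thread): the same line
# layout, but the output file is opened once instead of twice per row.
makecsv2 <- function(nms, cls, incl, dat, path) {
  con <- file(path, open = "w")    # truncates any existing file
  on.exit(close(con))
  for (i in seq_len(nrow(dat))) {
    # name, class, inclusion flag, then row i of the data, joined by ", "
    cat(nms[i], cls[i], incl[i], dat[i, ], sep = ", ", file = con)
    cat("\n", file = con)          # end the line
  }
  invisible(path)
}

dat <- matrix(c(0.5, 1.5, 2.5, 3.5), nrow = 2)   # 2 genes x 2 samples
tmp <- tempfile(fileext = ".csv")
makecsv2(c("g1", "g2"), c("A", "B"), c(TRUE, FALSE), dat, tmp)
lines <- readLines(tmp)
```

Like makecsv, this formats numbers the way cat() does (up to 7 significant digits by default), which may differ from write.table() output.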

Cole