Hi R-users, I want to convert a large dataset (from stata format) to an ascii table. The resulting table should be a human-readable table (like CSV, or tab-separated file or something similar). R reads the stata-file quite easily (with some problems which are discussed here earlier), but so long I have not found a suitable way to write it in ascii format. Sure, there exists write.table, which is intended for that purpose. But the problem is that the table is quite a large (7500 obs x 1200 vars) and after half-an-hout processing on my PII wiht 128 MB memory, write.table stopped with a message that it could not allocate 35 MB more. dput() and save( ..., ascii=TRUE) work well but the result is not such a table I am looking for. I tried also print() with option( width=very wide) but the largest allowed with is 10000, the table should be around 30000 characters wide. When I tried to write.table() line-by line, I got quite soon a message that R was killed. I know the write.table() pastes together all the variables and writes then the resulting string to the file. Aren't there any more efficient way of doing it, I think saving the dataframe variable-by-variable using a C-function should be quite a fast and not memory hungry at all. I have R 1.3.1 on RH7.1 linux Thanks in advance, Ott Toomet -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.- r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html Send "info", "help", or "[un]subscribe" (in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
Hello, One alternative way to do that is by processing portions of your big table. You can select, consecutive numbers of colums and rows and save them in separate files that after finishing you can recompose on Linux with cat and paste commands. It is clearly more tedious but effective and manageable. Hope it helps. Carlos Ortega. -------------------------------------------------------------------- Hi R-users, I want to convert a large dataset (from stata format) to an ascii table. The resulting table should be a human-readable table (like CSV, or tab-separated file or something similar). R reads the stata-file quite easily (with some problems which are discussed here earlier), but so long I have not found a suitable way to write it in ascii format. Sure, there exists write.table, which is intended for that purpose. But the problem is that the table is quite a large (7500 obs x 1200 vars) and after half-an-hout processing on my PII wiht 128 MB memory, write.table stopped with a message that it could not allocate 35 MB more. dput() and save( ..., ascii=TRUE) work well but the result is not such a table I am looking for. I tried also print() with option( width=very wide) but the largest allowed with is 10000, the table should be around 30000 characters wide. When I tried to write.table() line-by line, I got quite soon a message that R was killed. I know the write.table() pastes together all the variables and writes then the resulting string to the file. Aren't there any more efficient way of doing it, I think saving the dataframe variable-by-variable using a C-function should be quite a fast and not memory hungry at all. I have R 1.3.1 on RH7.1 linux Thanks in advance, Ott Toomet -------------- next part -------------- An HTML attachment was scrubbed... URL: https://stat.ethz.ch/pipermail/r-help/attachments/20011011/a5e32bed/attachment.html
Ott Toomet wrote:> I want to convert a large dataset (from stata format) to an ascii table. > ... after half-an-hour processing on my PII wiht 128 MB memory, write.table > stopped with a message that it could not allocate 35 MB more.The following function has only been fully tested in S-Plus on Unix, but may be of help. It is somewhat analogous to write.table(tbl, file, quote=F, sep="\t", row.names=T) The main point is that it writes output in blocks of rows; you can choose the block size with argument "bsize". There is an also optional argument "digits", a list with the same names as "tbl", which determines how many decimal places each column is rounded to. ##### Begin code: ##### g.output <- function(tbl, file="", append=F, hdr=T, sep="\t", digits=NULL, verbose=F, bsize=7e4/length(tbl)) { if (is.numeric(digits)) digits <- structure(as.list(rep(digits, length(tbl))), names=names(tbl)) for (i in names(digits)) if (is.numeric(tbl[[i]])) tbl[[i]] <- as.character(round(tbl[[i]], digits[[i]])) if (!append) unlink(file) if (hdr && (!append || !file.exists(file))) # Header line cat(paste(names(tbl), collapse=sep), sep="\n", file=file) if (!(nt <- length(tbl[[1]]))) return(invisible()) ix <- c(seq(1, nt, by=round(bsize)), nt+1) cfun <- function(tbl, i1, i2, nt, file, sep, verbose) { if (verbose) cat("From", i1, "to", i2, date(), "\n") if (i1 != 1 || i2 != nt) tbl <- g.subset(tbl, i1:i2) y <- do.call("paste", c(tbl, list(sep=sep))) cat(y, sep="\n", file=file, append=(file != "")) } for (i in seq(ix)[-1]) cfun(tbl, ix[i-1], ix[i]-1, nt, file, sep, verbose) } g.subset <- function(x, q=T, reverse=F) { y <- list() test <- is.na(seq(along=x[[1]])[q]) f <- function(z) if (is.character(z)) ifelse(test,"",z[q]) else z[q] for (j in seq(x)) y[[j]] <- if (reverse) rev(f(x[[j]])) else f(x[[j]]) names(y) <- names(x) if (is.data.frame(x)) data.frame(y) else y } ##### End code ##### -- David Brahm (brahm at alum.mit.edu) -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.- r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html Send "info", "help", or "[un]subscribe" (in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._