I have two one-dimensional lists of elements and want to cbind them and then write the result to a file. Both lists have more than a million entries, and R is taking a long time over this operation. Is there an alternate way to perform the cbind?

x = table1[1:1000000, 1]
y = table2[1:1000000, 5]

z = cbind(x, y)   # hangs the machine
write.table(z, 'out.txt')

--
-------------
Mary Kindall
Yorktown Heights, NY
USA
You could break the data into chunks, so you cbind and save 50,000 observations at a time. That should be less taxing on your machine and memory.

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Mary Kindall
Sent: Friday, January 06, 2012 12:43 PM
To: r-help at r-project.org
Subject: [R] cbind alternate

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
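A minimal sketch of that chunked approach (the chunk size and file name are arbitrary, and `x` and `y` here stand in for the two extracted columns):

```r
# Sketch: cbind and write 50,000 rows at a time, appending each block
# to an open connection, so the full two-column object is never built.
x <- rnorm(1e6)   # stand-in for table1[1:1000000, 1]
y <- rnorm(1e6)   # stand-in for table2[1:1000000, 5]

chunk <- 50000
con <- file("out.txt", "wt")
for (start in seq(1, length(x), by = chunk)) {
    end <- min(start + chunk - 1, length(x))
    write.table(cbind(x[start:end], y[start:end]), file = con,
                quote = FALSE, row.names = FALSE, col.names = FALSE)
}
close(con)
```

Because the connection stays open, each write.table() call appends after the previous block, giving one row of output per input element.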
On Jan 6, 2012, at 11:43 AM, Mary Kindall wrote:

> z = cbind(x, y)   # hangs the machine
> write.table(z, 'out.txt')

The issue is not the use of cbind(), but that write.table() can be slow with data frames, where each column may be a different class (data type) and requires separate formatting for output. This is referenced in the Note section of ?write.table:

  write.table can be slow for data frames with large numbers (hundreds
  or more) of columns: this is inevitable as each column could be of a
  different class and so must be handled separately. If they are all of
  the same class, consider using a matrix instead.

I suspect in this case, while you don't have a large number of columns, you do have a large number of rows, so there is a tradeoff. If all of the columns in your source tables are of the same type (e.g. all numeric), coerce 'z' to a matrix and then try write.table():

z <- matrix(rnorm(1000000 * 6), ncol = 6)

> str(z)
 num [1:1000000, 1:6] -0.713 0.79 -0.538 0.945 1.621 ...

> system.time(write.table(z, file = "test.txt"))
   user  system elapsed
 12.664   0.292  13.029

The resultant file is about 118 Mb on my system.

HTH,

Marc Schwartz
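A minimal sketch of that suggestion applied to the original two-column case (data made up here): cbind() of two numeric vectors already returns a numeric matrix, so no data frame need ever be built before write.table():

```r
# Hypothetical data: two numeric columns of the same length.
x <- rnorm(1e5)
y <- rnorm(1e5)

z <- cbind(x, y)   # a numeric matrix, not a data frame
stopifnot(is.matrix(z), is.numeric(z))

# write.table() can then format every cell the same way.
write.table(z, file = "zmat.txt", quote = FALSE, row.names = FALSE)
```

If either source column were character while the other is numeric, cbind() would silently coerce everything to character, so checking the result with str() or is.numeric() first is worthwhile.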
On Jan 6, 2012, at 12:43 PM, Mary Kindall wrote:

> x = table1[1:1000000,1]
> y = table2[1:1000000,5]
>
> z = cbind(x,y)   # hangs the machine

You should have been able to bypass the intermediate steps with just:

z = cbind(table1[1:1000000, 1],
          table2[1:1000000, 5])

Whether you have sufficient contiguous memory for that object at the moment, or even after rm(x); rm(y), is in doubt, but had you not created the unneeded x and y, you _might_ have succeeded in your limited environment. (Real answer: buy more RAM.) I speculate that you are on Windows, and so refer you to the R for Windows FAQ for further reading about memory limits.

> write.table(z, 'out.txt')

I do not know of a way to bypass the requirement of a named object to pass to write.table, but testing suggests that you could try:

write(t(cbind(table1[1:1000000, 1],
              table2[1:1000000, 5])),
      "test.txt", 2)

write() does not require a named object but is less inquisitive than write.table, and by default it will give you a transposed matrix with 5 columns, which would really mess things up; so you need to transpose and specify the number of columns. (And that may not save any space over creating a "z" object.)

There is another thread today in which master R programmer Bill Dunlap offered this strategy (with minor modifications to your situation by me):

###
f1 <- function(n, fileName) {
    unlink(fileName)
    system.time({
        fileConn <- file(fileName, "wt")
        on.exit(close(fileConn))
        for (i in seq_len(n))
            cat(table1[i, 1], " ", table2[i, 5], "\n",
                file = fileConn)
    })
}
f1(1000000, 'out.txt')
#------------

--
David Winsemius, MD
West Hartford, CT
Hello,

I believe this function can handle a problem of that size, or bigger. It does NOT create the full matrix, it just writes it to a file, a certain number of lines at a time.

write.big.matrix <- function(x, y, outfile, nmax = 1000){
    if(file.exists(outfile)) unlink(outfile)
    testf <- file(outfile, "at")  # or "wt" - "write text"
    on.exit(close(testf))
    step <- nmax                            # how many at a time
    inx <- seq(1, length(x), by = step)     # index into 'x' and 'y'
    mat <- matrix(0, nrow = step, ncol = 2) # create a work matrix
    # do it 'nmax' rows per iteration
    for(i in inx){
        mat <- cbind(x[i:(i + step - 1)], y[i:(i + step - 1)])
        write.table(mat, file = testf, quote = FALSE,
                    row.names = FALSE, col.names = FALSE)
    }
    # and now the remainder
    mat <- NULL
    mat <- cbind(x[(i + 1):length(x)], y[(i + 1):length(y)])
    write.table(mat, file = testf, quote = FALSE,
                row.names = FALSE, col.names = FALSE)
    # return the output filename
    outfile
}

x <- 1:1e6                                 # a numeric vector
y <- sample(letters, 1e6, replace = TRUE)  # and a character vector
length(x); length(y)                       # of the same length

fl <- "test.txt"  # output file
system.time(write.big.matrix(x, y, outfile = fl))

On my system it takes (sample output)

   user  system elapsed
   1.59    0.04    1.65

and it can handle different types of data - in the example, numeric and character.

If you also need the matrix, try to use 'cbind' first, without writing to a file. If it's still slow, adapt the code above to keep inserting chunks into an output matrix.

Rui Barradas
Sorry Mary,

My function would write the remainder twice; I had only tested it with multiples of the chunk size (and without looking at the lengthy output correctly). Now checked:

write.big.matrix <- function(x, y, outfile, nmax = 1000){
    if(file.exists(outfile)) unlink(outfile)
    testf <- file(outfile, "at")  # or "wt" - "write text"
    on.exit(close(testf))
    step <- nmax                               # how many at a time
    inx <- seq(1, length(x) - step, by = step) # index into 'x' and 'y'
    mat <- matrix(0, nrow = step, ncol = 2)    # create a work matrix
    # do it 'nmax' rows per iteration
    for(i in inx){
        mat <- cbind(x[i:(i + step - 1)], y[i:(i + step - 1)])
        write.table(mat, file = testf, quote = FALSE,
                    row.names = FALSE, col.names = FALSE)
    }
    # and now the remainder
    if(i + step <= length(x)){
        mat <- NULL
        mat <- cbind(x[(i + step):length(x)], y[(i + step):length(y)])
        write.table(mat, file = testf, quote = FALSE,
                    row.names = FALSE, col.names = FALSE)
    }
    # return the output filename
    outfile
}

x <- 1:(1e6 + 1234)                                 # a numeric vector
y <- sample(letters, 1e6 + 1234, replace = TRUE)    # and a character vector
length(x); length(y)                                # of the same length

fl <- "test.txt"  # output file

system.time(write.big.matrix(x, y, outfile = fl, nmax = 100))
   user  system elapsed
   3.04    0.06    3.09

system.time(write.big.matrix(x, y, outfile = fl))
   user  system elapsed
   1.64    0.12    1.76

Rui Barradas
What is it you want to do with the data after you save it? Are you just going to read it back into R? If so, consider using save/load.

On Fri, Jan 6, 2012 at 12:43 PM, Mary Kindall <mary.kindall at gmail.com> wrote:

> z = cbind(x, y)   # hangs the machine
> write.table(z, 'out.txt')

--
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.
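A minimal sketch of that save/load route (object and file names are made up); the binary format skips the per-cell text formatting that makes write.table() slow:

```r
# Hypothetical data standing in for the two extracted columns.
x <- rnorm(1e5)
y <- rnorm(1e5)
z <- cbind(x, y)

save(z, file = "z.RData")  # compact binary image, fast to write
rm(z)                      # pretend we start a fresh session
load("z.RData")            # restores 'z' under its original name

stopifnot(is.matrix(z), nrow(z) == 1e5)
```

The trade-off is that the file is only readable by R; if the output must be consumed by another program, one of the text-based approaches above is still needed.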