I have two one-dimensional lists of elements and want to perform cbind and
then write the result into a file. There are more than a million entries in
both lists, and R is taking a lot of time performing this operation.

Is there any alternate way to perform cbind?

x = table1[1:1000000, 1]
y = table2[1:1000000, 5]

z = cbind(x, y)    # hanging the machine

write.table(z, 'out.txt')

--
-------------
Mary Kindall
Yorktown Heights, NY
USA
You could break the data into chunks, so that you cbind and save 50,000
observations at a time. That should be less taxing on your machine and
its memory.
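A minimal sketch of that chunked approach (toy sizes, with plain vectors
standing in for the two table columns):

```r
# Chunked cbind-and-append: never builds the full two-column object at once
x <- seq_len(200000)              # stand-in for table1[, 1]
y <- rnorm(200000)                # stand-in for table2[, 5]
chunk <- 50000
con <- file("out.txt", "wt")
for (start in seq(1, length(x), by = chunk)) {
    end <- min(start + chunk - 1, length(x))
    # writing to an already-open connection appends each chunk in turn
    write.table(cbind(x[start:end], y[start:end]), file = con,
                quote = FALSE, row.names = FALSE, col.names = FALSE)
}
close(con)
```

Because every chunk goes to the same open connection, the file ends up the
same as if the full cbind() result had been written in one call.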
-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
On Behalf Of Mary Kindall
Sent: Friday, January 06, 2012 12:43 PM
To: r-help at r-project.org
Subject: [R] cbind alternate
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
On Jan 6, 2012, at 11:43 AM, Mary Kindall wrote:

> Is there any alternate way to perform cbind?
>
> x = table1[1:1000000,1]
> y = table2[1:1000000,5]
>
> z = cbind(x,y)    # hanging the machine
>
> write.table(z, 'out.txt')

The issue is not the use of cbind(), but that write.table() can be slow
with data frames, where each column may be a different class (data type)
and requires separate formatting for output. This is referenced in the
Note section of ?write.table:

  write.table can be slow for data frames with large numbers (hundreds or
  more) of columns: this is inevitable as each column could be of a
  different class and so must be handled separately. If they are all of
  the same class, consider using a matrix instead.

I suspect that in this case, while you don't have a large number of
columns, you do have a large number of rows, so there is a tradeoff. If
all of the columns in your source tables are of the same type (e.g. all
numeric), coerce 'z' to a matrix and then try using write.table().

z <- matrix(rnorm(1000000 * 6), ncol = 6)

> str(z)
 num [1:1000000, 1:6] -0.713 0.79 -0.538 0.945 1.621 ...

> system.time(write.table(z, file = "test.txt"))
   user  system elapsed
 12.664   0.292  13.029

The resultant file is about 118 Mb on my system.

HTH,

Marc Schwartz
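The matrix-versus-data-frame point above can be checked at a smaller scale
(a sketch; absolute timings vary by machine):

```r
# Same numbers written twice: once as a matrix, once as a data frame.
# write.table() can take a simpler path when every column has one class.
z_mat <- matrix(rnorm(1e5 * 2), ncol = 2)
z_df  <- as.data.frame(z_mat)
t_mat <- system.time(write.table(z_mat, "z_mat.txt"))["elapsed"]
t_df  <- system.time(write.table(z_df,  "z_df.txt"))["elapsed"]
```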
On Jan 6, 2012, at 12:43 PM, Mary Kindall wrote:

> x = table1[1:1000000,1]
> y = table2[1:1000000,5]
>
> z = cbind(x,y)    # hanging the machine

You should have been able to bypass the intermediate steps with just:

z <- cbind(table1[1:1000000, 1],
           table2[1:1000000, 5])

Whether you will have sufficient contiguous memory for that object at the
moment, or even after rm(x); rm(y), is in doubt, but had you not created
the unneeded x and y you _might_ have succeeded in your limited
environment. (Real answer: buy more RAM.) I speculate that you are on
Windows, and so refer you to the R for Windows FAQ for further reading
about memory limits.

> write.table(z, 'out.txt')

I do not know of a way to bypass write.table's requirement of a named
object, but testing suggests that you could try:

write(t(cbind(table1[1:1000000, 1],
              table2[1:1000000, 5])),
      "test.txt", 2)

write() does not require a named object, but it is less inquisitive than
write.table() and by default produces a transposed matrix with 5 columns,
which would really mess things up, so you need to transpose and specify
the number of columns. (And that may not save any space over creating a
'z' object.)

There is another thread today in which master R programmer Bill Dunlap
offered this strategy (with minor modifications to your situation by me):

###
f1 <- function(n, fileName) {
    unlink(fileName)
    system.time({
        fileConn <- file(fileName, "wt")
        on.exit(close(fileConn))
        for (i in seq_len(n))
            cat(table1[i, 1], " ", table2[i, 5], "\n", file = fileConn)
    })
}
f1(1000000, 'out.txt')
#------------

--
David Winsemius, MD
West Hartford, CT
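The write()/t() behaviour is easy to check on toy vectors (a sketch; note
that the number of columns must be given explicitly):

```r
# write() streams a vector to file, 'ncolumns' values per line, so the
# two-column matrix must be transposed to keep each (a, b) pair on one line
a <- 1:5
b <- c(10, 20, 30, 40, 50)
write(t(cbind(a, b)), "pairs.txt", ncolumns = 2)
```

The first line of pairs.txt is `1 10`, the second `2 20`, and so on;
without the transpose the values would come out column by column rather
than pair by pair.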
Hello,
I believe this function can handle a problem of that size, or bigger.
It does NOT create the full matrix, just writes it to a file, a certain
number of lines at a time.
write.big.matrix <- function(x, y, outfile, nmax = 1000){
    if(file.exists(outfile)) unlink(outfile)
    testf <- file(outfile, "at")               # or "wt" - "write text"
    on.exit(close(testf))
    step <- nmax                               # how many at a time
    inx <- seq(1, length(x), by = step)        # index into 'x' and 'y'
    mat <- matrix(0, nrow = step, ncol = 2)    # create a work matrix
    # do it 'nmax' rows per iteration
    for(i in inx){
        mat <- cbind(x[i:(i + step - 1)], y[i:(i + step - 1)])
        write.table(mat, file = testf, quote = FALSE, row.names = FALSE,
                    col.names = FALSE)
    }
    # and now the remainder
    mat <- NULL
    mat <- cbind(x[(i + 1):length(x)], y[(i + 1):length(y)])
    write.table(mat, file = testf, quote = FALSE, row.names = FALSE,
                col.names = FALSE)
    # return the output filename
    outfile
}
x <- 1:1e6 # a numeric vector
y <- sample(letters, 1e6, replace=TRUE) # and a character vector
length(x);length(y) # of the same length
fl <- "test.txt" # output file
system.time(write.big.matrix(x, y, outfile=fl))
On my system it takes (sample output)
user system elapsed
1.59 0.04 1.65
and can handle different types of data - in the example, numeric and
character.

If you also need the matrix, try to use 'cbind' first, without writing to
a file. If it's still slow, adapt the code above to keep inserting chunks
in an output matrix.
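That last suggestion - filling an output matrix in chunks - could be
sketched like this (toy sizes, not code from the thread; it assumes both
columns are numeric, since mixing types would force a character matrix):

```r
# Preallocate the result, then copy 'step'-sized chunks into it
x <- 1:100000
y <- rnorm(100000)
step <- 10000
z <- matrix(NA_real_, nrow = length(x), ncol = 2)
for (i in seq(1, length(x), by = step)) {
    idx <- i:min(i + step - 1, length(x))   # min() guards the final chunk
    z[idx, 1] <- x[idx]
    z[idx, 2] <- y[idx]
}
```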
Rui Barradas
--
View this message in context:
http://r.789695.n4.nabble.com/cbind-alternate-tp4270188p4270444.html
Sent from the R help mailing list archive at Nabble.com.
Sorry Mary,

My function would write the remainder twice; I had only tested it with
multiples of the chunk size. (And without looking at the lengthy output
correctly.)
Now checked:
write.big.matrix <- function(x, y, outfile, nmax = 1000){
    if(file.exists(outfile)) unlink(outfile)
    testf <- file(outfile, "at")                 # or "wt" - "write text"
    on.exit(close(testf))
    step <- nmax                                 # how many at a time
    inx <- seq(1, length(x) - step, by = step)   # index into 'x' and 'y'
    mat <- matrix(0, nrow = step, ncol = 2)      # create a work matrix
    # do it 'nmax' rows per iteration
    for(i in inx){
        mat <- cbind(x[i:(i + step - 1)], y[i:(i + step - 1)])
        write.table(mat, file = testf, quote = FALSE, row.names = FALSE,
                    col.names = FALSE)
    }
    # and now the remainder ('<=' so a one-row remainder is not skipped)
    if(i + step <= length(x)){
        mat <- NULL
        mat <- cbind(x[(i + step):length(x)], y[(i + step):length(y)])
        write.table(mat, file = testf, quote = FALSE, row.names = FALSE,
                    col.names = FALSE)
    }
    # return the output filename
    outfile
}
x <- 1:(1e6 + 1234) # a numeric vector
y <- sample(letters, 1e6 + 1234, replace=TRUE) # and a character vector
length(x);length(y) # of the same length
fl <- "test.txt" # output file
system.time(write.big.matrix(x, y, outfile=fl, nmax=100))
user system elapsed
3.04 0.06 3.09
system.time(write.big.matrix(x, y, outfile=fl))
user system elapsed
1.64 0.12 1.76
Rui Barradas
--
View this message in context:
http://r.789695.n4.nabble.com/cbind-alternate-tp4270188p4270687.html
Sent from the R help mailing list archive at Nabble.com.
What is it you want to do with the data after you save it? Are you just
going to read it back into R? If so, consider using save/load.

On Fri, Jan 6, 2012 at 12:43 PM, Mary Kindall <mary.kindall at gmail.com> wrote:

> Is there any alternate way to perform cbind?
>
> x = table1[1:1000000,1]
> y = table2[1:1000000,5]
>
> z = cbind(x,y)    # hanging the machine
>
> write.table(z, 'out.txt')

--
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.
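A minimal sketch of that save/load round trip (toy sizes; worthwhile only
if the data is coming back into R rather than going to another program):

```r
# Binary serialization skips all text formatting on both write and read
x <- 1:100000
y <- rnorm(100000)
z <- cbind(x, y)
save(z, file = "out.RData")   # compact binary file
rm(z)
load("out.RData")             # restores 'z' by name into the workspace
```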