Hi
I rbind data frames cumulatively in a loop, and the performance
deteriorates very quickly.
My code looks like this:
for (k in 1:N) {
    filename <- paste("/tmp/myData_", as.character(k), ".txt", sep="")
    myDataTmp <- read.table(filename, header=TRUE, sep=",")
    if (k == 1) {
        myData <- myDataTmp
    } else {
        myData <- rbind(myData, myDataTmp)
    }
}
Some more details:
- each stored text file has about 100,000 rows and 50 columns
- for k=1: rbind takes 0.0004 seconds
- for k=2: rbind takes 13 seconds
- for k=3: rbind takes 30 seconds
- for k=4: rbind takes 36 seconds
etc
Any suggestions to improve speed?
Thanks
Zava
Read the data into a list and then call
do.call('rbind', myList)
at the end, so the rbind happens only once. In your loop, R has to
reallocate memory and copy the whole accumulated data frame on each
iteration, so it is no wonder it is slow.
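A minimal sketch of that suggestion, assuming comma-separated files with a
header line as in the original post. The helper name read_all, the demo
columns id and src, and the temporary files are made up for illustration;
they stand in for the real /tmp/myData_<k>.txt files.

```r
# Hypothetical helper: read every file into a list, then rbind once.
read_all <- function(filenames) {
  # One data frame per file; nothing is copied while the list is built.
  myList <- lapply(filenames, read.table, header = TRUE, sep = ",")
  # A single rbind call builds the combined data frame in one step.
  do.call("rbind", myList)
}

# Demo on three small temporary files (stand-ins for the real data).
demo_files <- file.path(tempdir(), paste("myData_", 1:3, ".txt", sep = ""))
for (k in seq_along(demo_files)) {
  write.table(data.frame(id = (1:2) + 2 * (k - 1), src = k),
              demo_files[k], sep = ",", row.names = FALSE, quote = FALSE)
}
myData <- read_all(demo_files)
```

With the real files, the call would presumably be something like
read_all(paste("/tmp/myData_", 1:N, ".txt", sep="")).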
On 7/17/07, Aydemir, Zava (FID) <Zava.Aydemir at morganstanley.com> wrote:
> Hi
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
--
Jim Holtman
Cincinnati, OH
+1 513 646 9390
What is the problem you are trying to solve?
As Jim points out, building up a data frame by rbinding in a loop can be
a slow way to do things in R.
Here's an example of how you can easily read data frames into a list:
> # Create 3 files
> invisible(lapply(1:3, function(i)
+     write.csv(file=paste("tmp",i,".csv",sep=""),
+               data.frame(i=2*i+(1:2),c=letters[2*i+(1:2)]))))
> # Read the files into a list of data frames
> list.of.dfs <-
+     lapply(paste("tmp",1:3,".csv",sep=""), read.csv, row.names=1)
> # rbind the data frames
> myData <- do.call("rbind", list.of.dfs)
> myData
  i c
1 3 c
2 4 d
3 5 e
4 6 f
5 7 g
6 8 h
(and of course, these last two expressions can be composed into a single
expression if you want)
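For instance, the read and rbind steps could be composed roughly as
follows. This is only a sketch: the three CSV files are recreated in a
temporary directory so the snippet runs on its own, and the variable name
files is introduced here for illustration.

```r
# Recreate three small CSV files in a temporary directory (illustrative only).
files <- file.path(tempdir(), paste("tmp", 1:3, ".csv", sep = ""))
for (i in 1:3) {
  write.csv(data.frame(i = 2 * i + (1:2), c = letters[2 * i + (1:2)]),
            file = files[i])
}

# Read and combine in one expression: lapply() builds the list of data
# frames and do.call() passes its elements to a single rbind() call.
myData <- do.call("rbind", lapply(files, read.csv, row.names = 1))
```

The intermediate list is never given a name, so it can be freed once
myData has been built.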
-- Tony Plate