Marcus Jellinghaus
2002-Oct-06 13:49 UTC
[R] Why are big data.frames slow? What can I do to get it faster?
Hello,
I'm quite new to this list.
I have a high-frequency dataset with more than 500,000 records.
I want to edit a data.frame "test". My small program runs fine with a small
part of the dataset (just 100 records), but it is very slow with a huge
dataset. Of course it gets slower with more records, but when I change just
the size of the frame and keep the number of edited records fixed, I see
that it also gets slower.
Here is my program:
print(dim(test)[1])
Sys.time()
for(i in 1:100) {
test[i,6] = paste(test[i,2],"-",test[i,3], sep = "")
}
Sys.time()
I join two currency symbols into a currency pair.
I always calculate only the first 100 lines.
When I load just 100 lines into the data.frame "test", it takes 1 second.
When I load 1,000 lines, editing 100 lines takes 2 seconds;
with 10,000 lines loaded, editing 100 lines takes 5 seconds;
with 100,000 lines loaded, editing 100 lines takes 31 seconds;
with 500,000 lines loaded, editing 100 lines takes 11 minutes(!!!).
My computer has 1 GB RAM, so that shouldn't be the reason.
Of course, I could work with many small data.frames instead of one big one, but
the program above is just the very first step, so I don't want to split it.
Is there a way to edit big data.frames without waiting for a long time?
Thanks a lot for your help,
Marcus
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !) To: r-help-request at
stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
Uwe Ligges
2002-Oct-06 17:57 UTC
[R] Why are big data.frames slow? What can I do to get it faster?
Marcus Jellinghaus wrote:
>
> Hello,
>
> I'm quite new to this list.
> I have a high-frequency dataset with more than 500,000 records.
> I want to edit a data.frame "test". My small program runs fine with a small
> part of the dataset (just 100 records), but it is very slow with a huge
> dataset. Of course it gets slower with more records, but when I change just
> the size of the frame and keep the number of edited records fixed, I see
> that it also gets slower.
>
> Here is my program:
>
> print(dim(test)[1])
> Sys.time()
> for(i in 1:100) {
> test[i,6] = paste(test[i,2],"-",test[i,3], sep = "")
> }
> Sys.time()
>
> I join two currency symbols into a currency pair.
> I always calculate only the first 100 lines.
> When I load just 100 lines into the data.frame "test", it takes 1 second.
> When I load 1,000 lines, editing 100 lines takes 2 seconds;
> with 10,000 lines loaded, editing 100 lines takes 5 seconds;
> with 100,000 lines loaded, editing 100 lines takes 31 seconds;
> with 500,000 lines loaded, editing 100 lines takes 11 minutes(!!!).
>
> My computer has 1 GB RAM, so that shouldn't be the reason.
>
> Of course, I could work with many small data.frames instead of one big one, but
> the program above is just the very first step, so I don't want to split it.
>
> Is there a way to edit big data.frames without waiting for a long time?

Well, I guess the point is that addressing elements in a large data.frame
reasonably takes much more time than in a small one.
It may be a good idea to use vectorized operations instead of the loop. That is
preferable if your computation is easily vectorizable without a big penalty
in memory consumption:

test[1:100, 6] <- paste(test[1:100, 2], "-", test[1:100, 3], sep = "")

or

test[ , 6] <- paste(test[ , 2], "-", test[ , 3], sep = "")

for the whole data.frame.

Uwe Ligges
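A minimal sketch of the vectorized replacement suggested above, checked against the original loop on a small synthetic data.frame. The column names (id, base, quote, bid, ask, pair), the symbol list, and the frame size are assumptions made up for illustration; only the positions of columns 2, 3, and 6 follow the original post.

```r
# Hypothetical stand-in for the poster's data: columns 2 and 3 hold currency
# symbols, column 6 receives the currency pair. All names are assumptions.
set.seed(1)
n <- 1000
symbols <- c("EUR", "USD", "JPY", "GBP")
test <- data.frame(
  id    = seq_len(n),
  base  = sample(symbols, n, replace = TRUE),
  quote = sample(symbols, n, replace = TRUE),
  bid   = runif(n),
  ask   = runif(n),
  pair  = NA_character_,
  stringsAsFactors = FALSE
)

# Loop version from the original post: one data.frame assignment per row.
looped <- test
for (i in 1:100) {
  looped[i, 6] <- paste(looped[i, 2], "-", looped[i, 3], sep = "")
}

# Vectorized version: a single indexed assignment for the same 100 rows.
vectorized <- test
rows <- 1:100
vectorized[rows, 6] <- paste(vectorized[rows, 2], vectorized[rows, 3], sep = "-")

# Both approaches should leave the data.frame in the same state.
stopifnot(identical(looped, vectorized))
```

The loop calls the data.frame assignment method 100 times, while the vectorized form calls it once, which is the point of the suggestion.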
Thomas Lumley
2002-Oct-06 21:21 UTC
[R] Why are big data.frames slow? What can I do to get it faster?
On Sun, 6 Oct 2002, Marcus Jellinghaus wrote:

> Hello,
>
> I'm quite new to this list.
> I have a high-frequency dataset with more than 500,000 records.
> I want to edit a data.frame "test". My small program runs fine with a small
> part of the dataset (just 100 records), but it is very slow with a huge
> dataset. Of course it gets slower with more records, but when I change just
> the size of the frame and keep the number of edited records fixed, I see
> that it also gets slower.
>
> Here is my program:
>
> print(dim(test)[1])
> Sys.time()
> for(i in 1:100) {
> test[i,6] = paste(test[i,2],"-",test[i,3], sep = "")
> }
> Sys.time()

R 1.6.0 has faster data frame indexing.

Also, there's no need to do this one line at a time:

i <- 1:100
test[i,6] <- paste(test[i,2], test[i,3], sep="-")

should be quite a bit faster.

-thomas

> I join two currency symbols into a currency pair.
> I always calculate only the first 100 lines.
> When I load just 100 lines into the data.frame "test", it takes 1 second.
> When I load 1,000 lines, editing 100 lines takes 2 seconds;
> with 10,000 lines loaded, editing 100 lines takes 5 seconds;
> with 100,000 lines loaded, editing 100 lines takes 31 seconds;
> with 500,000 lines loaded, editing 100 lines takes 11 minutes(!!!).
>
> My computer has 1 GB RAM, so that shouldn't be the reason.
>
> Of course, I could work with many small data.frames instead of one big one, but
> the program above is just the very first step, so I don't want to split it.
>
> Is there a way to edit big data.frames without waiting for a long time?
>
> Thanks a lot for your help,
>
> Marcus

Thomas Lumley                   Asst. Professor, Biostatistics
tlumley at u.washington.edu     University of Washington, Seattle
^^^^^^^^^^^^^^^^^^^^^^^^  - NOTE NEW EMAIL ADDRESS
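To see how much the single indexed assignment helps on a given machine and R version, the two forms can be timed with system.time(). This is only a rough sketch: the frame size, column names, and symbol values are assumptions for illustration, and the measured times will vary, so no particular numbers are claimed here.

```r
# Synthetic frame; size, column names, and values are assumptions.
n <- 100000
test <- data.frame(
  id    = seq_len(n),
  base  = rep(c("EUR", "USD"), length.out = n),
  quote = rep(c("JPY", "GBP", "CHF"), length.out = n),
  bid   = 0,
  ask   = 0,
  pair  = NA_character_,
  stringsAsFactors = FALSE
)

# Row-by-row editing of the first 100 rows, as in the original post.
loop_time <- system.time({
  for (k in 1:100) {
    test[k, 6] <- paste(test[k, 2], test[k, 3], sep = "-")
  }
})

# The same 100 rows edited with one indexed assignment.
i <- 1:100
vec_time <- system.time({
  test[i, 6] <- paste(test[i, 2], test[i, 3], sep = "-")
})

# Elapsed wall-clock seconds for each form.
print(loop_time["elapsed"])
print(vec_time["elapsed"])
```

Changing n while keeping the 100 edited rows fixed reproduces the kind of experiment described in the original question.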