Toby Hocking
2019-Oct-29 00:17 UTC
[Rd] stats::reshape quadratic in number of input columns
Hi R-core,

I have been performance testing R packages for wide-to-tall data reshaping, and for the most part I see that they differ only by constant factors. However, in one test, which involves converting into multiple output value columns, I see that stats::reshape is in fact quadratic in the number of input columns.

For example, take the iris data, which has 4 input columns to reshape; the desired output has columns named Species, Sepal, Petal, and dimension (where dimension is either Length or Width). Of course there is no performance issue with N=4 input columns as in the original iris data, but I made larger versions of this reshaping problem by making copies of the input columns. The results at https://github.com/tdhock/nc-article#28-oct-2019 show that the quadratic time complexity causes significant slowdowns after about N=10,000 input columns to reshape (e.g. several minutes for stats::reshape versus several seconds for data.table::melt).

For a fix, I would suggest looking into how the same operation is implemented in the data.table package, whose computation times in my test appear to be linear. Small sketches of the operation, and of one way to reproduce the difference in scaling, follow below.

Toby
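
For concreteness, here is a minimal sketch of the reshaping operation on the original 4-column iris data. The argument values are my reading of each package's documentation rather than the exact calls used in the benchmark, and the melt() call assumes a data.table version recent enough to provide the measure() helper.

    ## wide input: Sepal.Length, Sepal.Width, Petal.Length, Petal.Width, Species
    ## long output: Species, dimension (Length or Width), Sepal, Petal
    ## (stats::reshape also adds an id column)
    long.stats <- stats::reshape(
      iris,
      direction = "long",
      varying   = 1:4,   # the four measurement columns
      sep       = ".",   # split each name into a value column and a dimension
      timevar   = "dimension")
    head(long.stats)

    library(data.table)
    long.dt <- melt(
      as.data.table(iris),
      ## measure() splits each column name on "." into the output value
      ## column (Sepal, Petal) and the dimension value (Length, Width)
      measure.vars = measure(value.name, dimension, sep = "."))
    head(long.dt)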
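
And one way to grow the number of input columns and time both functions; the column names and the make.wide() helper are hypothetical, for illustration only, and not necessarily the construction used in the linked benchmark.

    ## Wide data frame with N.groups pairs of .Length/.Width columns, so the
    ## long output has N.groups value columns plus a dimension column.
    make.wide <- function(N.groups, N.rows = 100){
      wide <- data.frame(matrix(rnorm(N.rows * 2 * N.groups), nrow = N.rows))
      names(wide) <- paste0(
        rep(paste0("var", seq_len(N.groups)), each = 2),
        c(".Length", ".Width"))
      wide
    }
    wide <- make.wide(1000)   # 2,000 input columns to reshape

    library(data.table)
    ## time the same wide-to-long conversion in both packages
    system.time(stats::reshape(
      wide, direction = "long",
      varying = seq_along(wide), sep = ".", timevar = "dimension"))
    system.time(melt(
      as.data.table(wide),
      measure.vars = measure(value.name, dimension, sep = ".")))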