thr3ads.net - R devel - [Rd] stats::reshape quadratic in number of input columns [Oct 2019]


Toby Hocking

2019-Oct-29 00:17 UTC

[Rd] stats::reshape quadratic in number of input columns

Hi R-core,

I have been performance testing R packages for wide-to-tall data reshaping
and for the most part I see they differ by constant factors.

However, in one test, which involves reshaping into multiple output
columns, I see that the running time of stats::reshape is in fact quadratic
in the number of input columns. For example, take the iris data, which has
4 input columns to reshape; the desired output has columns named
Species, Sepal, Petal, dimension (where dimension is either Length or
Width). Of course there is no performance issue with N=4 input columns in
the original iris data, but I made larger versions of this reshaping
problem by making copies of the input columns. The results at
https://github.com/tdhock/nc-article#28-oct-2019 show that the quadratic
time complexity results in significant slowdowns after about N=10,000 input
columns to reshape (e.g. several minutes for stats::reshape versus several
seconds for data.table::melt).
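
To make the setup concrete, here is a minimal sketch of the kind of reshape described above, using only base R. The helper names make_wide and reshape_copies, the numeric column suffixes, and the small sizes in the timing loop are assumptions for illustration. They are not the benchmark code from the linked repository, and the sizes here are far too small to show the slowdown reported at N=10,000.

```r
# The N=4 case: iris reshaped into two output columns (Sepal, Petal),
# with "dimension" recording whether a value was a Length or a Width.
tall <- reshape(
  iris,
  direction = "long",
  varying = list(
    c("Sepal.Length", "Sepal.Width"),
    c("Petal.Length", "Petal.Width")),
  v.names = c("Sepal", "Petal"),
  timevar = "dimension",
  times = c("Length", "Width"))
str(tall)

# Hypothetical helper: widen the problem by copying the four measure
# columns n_copies times, with a numeric suffix to keep names unique.
make_wide <- function(n_copies) {
  measure <- iris[, 1:4]
  copies <- lapply(seq_len(n_copies), function(i) {
    setNames(measure, paste0(names(measure), i))  # e.g. Sepal.Length1
  })
  cbind(Species = iris$Species, do.call(cbind, copies))
}

# Hypothetical helper: the same reshape on the widened data.
reshape_copies <- function(wide) {
  sepal <- grep("^Sepal", names(wide), value = TRUE)
  petal <- grep("^Petal", names(wide), value = TRUE)
  reshape(
    wide,
    direction = "long",
    varying = list(sepal, petal),
    v.names = c("Sepal", "Petal"),
    timevar = "dimension",
    times = sub("^Sepal\\.", "", sepal))  # Length1, Width1, Length2, ...
}

# Time a few small sizes; N = 4 * n_copies input columns to reshape.
for (n_copies in c(5, 50)) {
  wide <- make_wide(n_copies)
  elapsed <- system.time(reshape_copies(wide))[["elapsed"]]
  cat("N =", 4 * n_copies, "columns:", elapsed, "seconds\n")
}
```

Increasing n_copies in the loop is one way to look for the growth pattern locally, though the exact timings will depend on the machine and R version.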

For a fix, I would suggest looking into how they implemented the same
operation in the data.table package, which in my test shows computation
times that seem to be linear.
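
For comparison, a sketch of the same N=4 reshape with data.table::melt, assuming the data.table package is installed. Note that with patterns() the dimension column comes back as a factor with levels 1 and 2 (the position of each column within its group) rather than the labels Length and Width, so a relabelling step would be needed to match the output described above exactly.

```r
library(data.table)

iris_dt <- as.data.table(iris)

# Each pattern selects the input columns for one output column.
tall_dt <- melt(
  iris_dt,
  measure.vars = patterns("^Sepal", "^Petal"),
  value.name = c("Sepal", "Petal"),
  variable.name = "dimension")

# dimension is 1 for the first column matched in each group
# (Length) and 2 for the second (Width).
print(head(tall_dt))
```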

Toby

