> From: Tomas Kalibera <tomas.kalibera at gmail.com>
>
> Thanks for the report, I am working on a patch that will address this.
>
> I confirm there is a lot of potential for speedup. On my system,
>
> 'N=200000; x <- substring(paste(rep("A", N), collapse=""), 1:N, 1:N)'
>
> spends 96% of its time checking whether the string is ASCII and 3% in
> strlen(); if we take advantage of the pre-computed value in the ASCII bit,
> the speedup is about 40x.
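As a rough sketch of the effect described above (the sizes below are my own
choices for illustration, not taken from the report): if the whole string is
re-scanned for every one of the N pieces, doubling N should roughly quadruple
the run time rather than double it.

    # time substring() at two sizes to see how the cost scales with N
    for (N in c(100000, 200000)) {
      r <- paste(rep("A", N), collapse = "")
      cat("N =", N, "\n")
      print(system.time(x <- substring(r, 1:N, 1:N)))
    }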
The latest version of pqR (at pqR-project.org) has changes that
considerably speed up both this and other string operations.
Here's a test (both interpreters compiled with gcc 8.2.0 with -O3, on a
Skylake processor).
R-3.5.2:

> N=200000; system.time(for (i in 1:100) r<-paste(rep("A",N),collapse=""))
   user  system elapsed
  1.548   0.023   1.572
> system.time(for (i in 1:10) x<-substring(r,1:N,1:N))
   user  system elapsed
  4.462   0.016   4.478
pqR-2019-02-19:

> N=200000; system.time(for (i in 1:100) r<-paste(rep("A",N),collapse=""))
   user  system elapsed
  0.318   0.071   0.388
> system.time(for (i in 1:10) x<-substring(r,1:N,1:N))
   user  system elapsed
  0.041   0.000   0.041
Some of this may be due to pqR's faster garbage collector - R Core
implementations have a particular GC problem with strings, as explained at
https://radfordneal.wordpress.com/2018/11/29/faster-garbage-collection-in-pqr/
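As a small check of how much collection activity the substring() call itself
triggers (again only a sketch, with a size chosen for illustration), one can
turn on GC reporting around the call:

    # print a line for every garbage collection while the N one-character
    # strings are being created
    N <- 200000
    r <- paste(rep("A", N), collapse = "")
    old <- gcinfo(TRUE)           # enable per-GC reporting
    x <- substring(r, 1:N, 1:N)   # allocates N CHARSXPs
    gcinfo(old)                   # restore the previous setting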
But there are also some specific improvements to string operations that
you might want to have a look at.
Radford Neal