Andreas Kersting
2021-Oct-07 07:33 UTC
[Rd] GC: improving the marking performance for STRSXPs
Hi all, in GC (in src/main/memory.c), FORWARD_CHILDREN() (called by PROCESS_NODES()) treats STRSXPs just like VECSXPs, i.e. it calls FORWARD_NODE() for all its children. I claim that this is unnecessarily inefficient since the children of a STRSXP can legitimately only be (atomic) CHARSXPs and could hence be marked directly in the call of FORWARD_CHILDREN() on the STRSXP. Attached patch (?atomic_CHARSXP.diff) implements this and gives the following performance improvements on my system compared to R devel (revision 81008): Elapsed time for two full gc in a session after x <- as.character(runif(5e7))[] 19sec -> 15sec. This is the best-case scenario for the patch: very many unique/unmarked CHARSXP in the STRSXP. For already marked CHARSXP there is no performance gain since FORWARD_NODE() is a no-op for them. The relative performance gain is even bigger if iterating through the STRSXP produces many cache misses, as e.g. after x <- as.character(runif(5e7))[] x <- sample(x, length(x)) Elapsed time for two full gc here: 83sec -> 52sec. This is because we have less cache misses per CHARSXP. This patch additionally also assumes that the ATTRIBs of a CHARSXP are not to be traced because they are just used for maintaining the CHARSXP hash chains. The second attached patch (?atomic_CHARSXP_safe_unlikely.diff) checks both assumptions and calls gc_error() if they are violated and is still noticeably faster than R devel: 19sec -> 17sec and 83sec -> 54sec, respectively. Attached gc_test.R is the script I used to get the previously mentioned and more gc timings. Do you think that this is a reasonable change? It does make the code more complex and I am not sure if there might be situations in which the assumptions are violated, even though ?SET_STRING_ELT() and ?installAttrib() do enforce them. Best regards, Andreas -------------- next part -------------- A non-text attachment was scrubbed... Name: atomic_CHARSXP.diff Type: text/x-patch Size: 1948 bytes Desc: not available URL: <https://stat.ethz.ch/pipermail/r-devel/attachments/20211007/f018a189/attachment.bin> -------------- next part -------------- A non-text attachment was scrubbed... Name: atomic_CHARSXP_safe_unlikely.diff Type: text/x-patch Size: 2427 bytes Desc: not available URL: <https://stat.ethz.ch/pipermail/r-devel/attachments/20211007/f018a189/attachment-0001.bin>
iuke-tier@ey m@iii@g oii uiow@@edu
2021-Oct-18 23:45 UTC
[Rd] [External] GC: improving the marking performance for STRSXPs
Thanks. I have committed a modified version, also incorporating the handling of R_StringHash from your other post, in r81073. I prefer to be more conservative in the GC. for example not assume without checking that STRSXP elements are CHARSXP. This does add some overhead, but the change is still beneficial. I don't think we would want to add the complexity of threading at this point, though it might be worth considering at a later time. There are a few other possible modifications that I'll explore that might provide comparable improvements to the ones seen with your patch without adding the complexity of threads. Best, luke On Thu, 7 Oct 2021, Andreas Kersting wrote:> Hi all, > > in GC (in src/main/memory.c), FORWARD_CHILDREN() (called by PROCESS_NODES()) treats STRSXPs just like VECSXPs, i.e. it calls FORWARD_NODE() for all its children. I claim that this is unnecessarily inefficient since the children of a STRSXP can legitimately only be (atomic) CHARSXPs and could hence be marked directly in the call of FORWARD_CHILDREN() on the STRSXP. > > Attached patch (?atomic_CHARSXP.diff) implements this and gives the following performance improvements on my system compared to R devel (revision 81008): > > Elapsed time for two full gc in a session after > > x <- as.character(runif(5e7))[] > > 19sec -> 15sec. > > This is the best-case scenario for the patch: very many unique/unmarked CHARSXP in the STRSXP. For already marked CHARSXP there is no performance gain since FORWARD_NODE() is a no-op for them. > > The relative performance gain is even bigger if iterating through the STRSXP produces many cache misses, as e.g. after > > x <- as.character(runif(5e7))[] > x <- sample(x, length(x)) > > Elapsed time for two full gc here: 83sec -> 52sec. This is because we have less cache misses per CHARSXP. > > This patch additionally also assumes that the ATTRIBs of a CHARSXP are not to be traced because they are just used for maintaining the CHARSXP hash chains. > > The second attached patch (?atomic_CHARSXP_safe_unlikely.diff) checks both assumptions and calls gc_error() if they are violated and is still noticeably faster than R devel: 19sec -> 17sec and 83sec -> 54sec, respectively. > > Attached gc_test.R is the script I used to get the previously mentioned and more gc timings. > > Do you think that this is a reasonable change? It does make the code more complex and I am not sure if there might be situations in which the assumptions are violated, even though ?SET_STRING_ELT() and ?installAttrib() do enforce them. > > Best regards, > Andreas-- Luke Tierney Ralph E. Wareham Professor of Mathematical Sciences University of Iowa Phone: 319-335-3386 Department of Statistics and Fax: 319-335-3017 Actuarial Science 241 Schaeffer Hall email: luke-tierney at uiowa.edu Iowa City, IA 52242 WWW: http://www.stat.uiowa.edu