Artur Pilipenko via llvm-dev
2020-Sep-18 23:51 UTC
[llvm-dev] GC-parseable element atomic memcpy/memmove
TLDR: a proposal to add a GC-parseable lowering for the element atomic memcpy/memmove intrinsics, controlled by a new "requires-statepoint" call attribute.

Currently llvm.{memcpy|memmove}.element.unordered.atomic calls are treated as GC leaf functions (like most other intrinsics). As a result, a GC cannot occur while a copy operation is in progress, which can hurt GC latencies when large amounts of data are copied. To avoid this, a large copy can be done in chunks with GC safepoints in between. We'd like to be able to represent such a copy using the existing intrinsics [1].

For that I'd like to propose a new attribute on llvm.{memcpy|memmove}.element.unordered.atomic calls, "requires-statepoint". This attribute on a call will result in a different lowering, which makes it possible to take a GC safepoint during the copy operation.

There are three parts to the new lowering:

1) Calls with the new attribute will be wrapped into a statepoint by RewriteStatepointsForGC (RS4GC). This way the stack at these calls will be GC-parseable.

2) Currently these intrinsics are lowered to GC leaf calls to the symbols __llvm_{memcpy|memmove}_element_unordered_atomic_<element_size>. Calls with the new attribute will instead be lowered to calls to different symbols, say __llvm_{memcpy|memmove}_element_unordered_atomic_safepoint_<element_size>. This way the runtime can provide copy implementations with safepoints.

3) Currently memcpy/memmove calls take derived pointers as arguments. If we copy with safepoints, we might need to relocate the underlying source/destination objects at a safepoint, and for that we need to know the base pointers as well. How do we make the base pointers available in the copy routine? I suggest we add them explicitly as arguments during lowering. For example:

  __llvm_memcpy_element_unordered_atomic_safepoint_1(
      dest_base, dest_derived, src_base, src_derived, length)

It will be up to RS4GC to do the new lowering and prepare the arguments. RS4GC knows how to compute the base pointer for a given derived pointer. It also already does lowering for the deoptimize intrinsic by replacing an intrinsic call with a symbol call, so there is a precedent here.

Other alternatives:
- Change the llvm.{memcpy|memmove}.element.unordered.atomic API to accept base pointers + offsets instead of derived pointers. This would require an autoupgrade of the old representation. Changing the API of a generic intrinsic to facilitate a GC-specific lowering doesn't look like the best idea, and it would not work if we want to do the same for the non-atomic intrinsics.
- Teach the GC infrastructure to record base pointers for all derived pointer arguments. This looks like overkill for a single use case.

Here is the proposed implementation in a single patch: https://reviews.llvm.org/D87954
If there are no objections I will split it into individual reviews and add LangRef changes.

Thoughts?

Artur

[1] An alternative approach would be to make the frontend generate a chunked copy loop with a safepoint inside. The downsides are:
- It's harder for the optimizer to see that this loop is just a copy of a range of bytes.
- It forces one particular lowering, with the chunked loop inlined in compiled code. We can't outline the copy loop into the copy routine. With the intrinsic representation of a chunked copy we can choose different lowering strategies if we want.
- In our system we have to outline the copy loop into the copy routine due to interactions with deoptimization.
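To make the runtime side of (2) and (3) concrete, here is a rough C++ sketch of what a safepoint-aware copy routine with the proposed signature might look like. Only the symbol name and the argument order come from the proposal; the chunk size, the poll hook and its behaviour are invented for illustration, and a plain memcpy stands in for the per-element unordered-atomic copy.

  #include <algorithm>
  #include <cstddef>
  #include <cstring>

  // Hypothetical runtime hook: polls for a pending GC and, if one occurs, may
  // relocate the source/destination objects, updating all four pointers in place.
  extern "C" void gc_poll_and_update(char **dest_base, char **dest_derived,
                                     char **src_base, char **src_derived);

  // Illustrative only: the real routine is provided by the managed runtime.
  extern "C" void __llvm_memcpy_element_unordered_atomic_safepoint_1(
      char *dest_base, char *dest_derived,
      char *src_base, char *src_derived,
      std::size_t length) {
    const std::size_t ChunkBytes = 4096; // illustrative chunk size
    std::size_t copied = 0;
    while (copied < length) {
      std::size_t n = std::min(ChunkBytes, length - copied);
      // Element size 1: byte-wise copy of the current chunk.
      std::memcpy(dest_derived + copied, src_derived + copied, n);
      copied += n;
      if (copied < length)
        gc_poll_and_update(&dest_base, &dest_derived, &src_base, &src_derived);
    }
  }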
Artur Pilipenko via llvm-dev
2020-Sep-25 02:28 UTC
[llvm-dev] GC-parseable element atomic memcpy/memmove
Ping?

Artur
Philip Reames via llvm-dev
2020-Sep-28 17:56 UTC
[llvm-dev] GC-parseable element atomic memcpy/memmove
In general, I am supportive of this direction. It seems like an entirely reasonable solution. I do have some comments below, but they're mostly of the "how do we generalize this?" variety.

First, let's touch on the attribute. My first concern is naming; I think the use of "statepoint" here is problematic because the attribute isn't about the particular lowering strategy (i.e. statepoints), but about the conceptual requirement (i.e. a safepoint). This could be resolved by simply renaming it to "require-safepoint". But that brings us to a broader point. We've chosen to build in the assumption that intrinsics don't require safepoints. If all we want is for some intrinsics *to* require safepoints, why isn't this simply a tweak to the existing code? callsGCLeafFunction already has a small list of intrinsics which can have safepoints. I think you can completely remove the need for this attribute by a) adding the atomic memcpy variants to the exclude list in callsGCLeafFunction, and b) using the existing "gc-leaf-function" attribute on most of the calls the frontend generates.

Second, let's discuss the signature for the runtime function. I think you should use a signature for the runtime call which takes base pointers and offsets, not base pointers and derived pointers. Why? Because passing derived pointers in registers as arguments presumes that the runtime knows how to map a record in the stackmap to wherever the callee might have shuffled the argument. Some runtimes may support this, others may not. Given that the offset scheme is just as simple to implement, being considerate and minimizing the runtime support required seems worthwhile. On x86, the cost of a subtract (to produce the offset in the worst case) and an LEA (to produce the derived pointer again inside the runtime routine) is pretty minimal, particularly since the former is likely to be optimized away and the latter folded into the addressing mode.

Finally, it's also worth noting that some (but not all) GCs can convert an interior derived pointer to the base of the containing object, and with the memcpy family we know that either the pointers are all interior derived or the length must be zero. But since not all GCs can do this, we don't want to rely on it.

Philip
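To spell out the base + offset variant (same caveats as the earlier sketch; everything except the symbol name is invented for illustration): the caller passes offsets rather than derived pointers, so the stack map only needs to describe the two base pointers, and the routine re-derives its working pointers with a single add (an LEA on x86).

  #include <algorithm>
  #include <cstddef>
  #include <cstring>

  // Hypothetical poll, as in the earlier sketch; only the bases need to be
  // visible to (and updated by) the GC.
  extern "C" void gc_poll_and_update(char **dest_base, char **src_base);

  extern "C" void __llvm_memcpy_element_unordered_atomic_safepoint_1(
      char *dest_base, std::size_t dest_offset,
      char *src_base, std::size_t src_offset,
      std::size_t length) {
    const std::size_t ChunkBytes = 4096;
    std::size_t copied = 0;
    while (copied < length) {
      std::size_t n = std::min(ChunkBytes, length - copied);
      // Re-derive from base + offset each iteration, so relocation of the
      // bases at a poll is handled automatically.
      std::memcpy(dest_base + dest_offset + copied,
                  src_base + src_offset + copied, n);
      copied += n;
      if (copied < length)
        gc_poll_and_update(&dest_base, &src_base);
    }
  }

  // Caller side (conceptually emitted by RS4GC): the offsets are plain
  // pointer differences, a subtraction the optimizer can often fold away.
  //   std::size_t dest_offset = std::size_t(dest_derived - dest_base);
  //   std::size_t src_offset  = std::size_t(src_derived  - src_base);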
Artur Pilipenko via llvm-dev
2020-Sep-30 04:11 UTC
[llvm-dev] GC-parseable element atomic memcpy/memmove
Thanks for the feedback. I think both of the suggestions are very reasonable. I'll incorporate them.

Given there were no objections for two weeks, I'm going to go ahead with posting individual patches for review.

One small question inline:

On Sep 28, 2020, at 10:56 AM, Philip Reames <listmail at philipreames.com> wrote:

> Second, let's discuss the signature for the runtime function. I think you
> should use a signature for the runtime call which takes base pointers and
> offsets, not base pointers and derived pointers. [...]

Do you think it makes sense to control this aspect of the lowering (derived pointers vs base+offset in the memcpy args) using GCStrategy?

Artur
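For what it's worth, a purely hypothetical sketch of what that could look like; llvm::GCStrategy has no such property today, the subclass and method names are made up, and the header has lived under llvm/CodeGen/ in some older releases.

  #include "llvm/IR/GCStrategy.h"

  // Hypothetical sketch only. UseStatepoints is an existing GCStrategy knob
  // consulted by RS4GC; the query below does not exist in LLVM and merely
  // illustrates letting the GC strategy, rather than an attribute or flag,
  // pick between (base, derived) and (base, offset) arguments for the
  // safepoint-aware copy routines.
  class ExampleGCStrategy : public llvm::GCStrategy {
  public:
    ExampleGCStrategy() { UseStatepoints = true; }
    bool prefersBasePlusOffsetCopyArgs() const { return true; }
  };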