Peter Jakubek via llvm-dev
2016-Apr-10 19:37 UTC
[llvm-dev] compler-rt, __aeabi_memcpy () possibly broken (ARM)
Hello, I recognized that compiler-rt's the implementation of __aeabi_memcpy simply branches to memcpy. The implementation of memcpy is not provided. So an externally provided memcpy () has to be used. (also applies to memmove, memset, memclr) On ARM I have seen implementations of memcpy () using floating-point registers (if compiled with NEON support). The is perfectly legal, as memcpy () only needs to comply with the Procedure Call Standard. According to this d0-d7, d16-d31 are scratch registers. The situation is slightly different for __aeabi_memcpy (). The ABI spec explicitly states: "In general, implementations of these functions are allowed to corrupt only the integer core registers permitted to be corrupted by the [AAPCS] (r0-r3, ip, lr, and CPSR)." newlib addresses this by explicitly providing a separate implementation for __aeabi_memcpy () and memcpy (). But things can get messed up if compiler-rt's __aeabi_memcpy () is actually used. (with any memcpy () that clobbers flating-point registers.) The implementation of __aeabi_memcpy () in ARM's compiler 5 tool-chain also uses floating-point registers - but preserves the contents. So this one is ABI compliant. My conclusion is that the current implementation of compiler-rt can potentially introduce difficult to track down problems. Unless of course LLVM always can handle corrupted floating-point registers for all calls to __aeabi_memcpy (). Either the aeabi variants of memcpy, memmove, memset, memclr should be fixed or the stubs should removed from compiler-rt. Or am I getting it all wrong? Peter
Renato Golin via llvm-dev
2016-Apr-11 09:43 UTC
[llvm-dev] compler-rt, __aeabi_memcpy () possibly broken (ARM)
On 10 April 2016 at 20:37, Peter Jakubek via llvm-dev <llvm-dev at lists.llvm.org> wrote:> I recognized that compiler-rt's the implementation of __aeabi_memcpy simply > branches to memcpy. > The implementation of memcpy is not provided. So an externally provided > memcpy () has to be used. > (also applies to memmove, memset, memclr)Hi Peter, In a nutshell, Compiler-RT may assume there is a C library underneath. This is the same for LibGCC (https://gcc.gnu.org/onlinedocs/gccint/Libgcc.html): "GCC will also generate calls to C library routines, such as memcpy and memset, in some cases." This also works on free-standing environments (ex. the Linux kernel) because those environments assume the compiler library will do so, and thus implement "memcpy", "memset", etc.> On ARM I have seen implementations of memcpy () using floating-point > registers (if compiled with NEON support). The is perfectly > legal, as memcpy () only needs to comply with the Procedure Call Standard. > According to this d0-d7, d16-d31 are scratch registers.RT *could* have some optimised versions of those functions, but as you can imagine, this is not a trivial pursuit. One would need to take into account all the variations, alignment, ISA support and micro-architecture differences, and that's a very large project, even if we just consider ARM targets. Given that most C libraries have had that consideration already, and have done more tests (conformance and performance) than we have, it's safe to assume that whatever the C libraries did is probably better (average case) than any "smart" thing we can conjure in a weekend.> The situation is slightly different for __aeabi_memcpy (). The ABI spec > explicitly states: "In general, implementations of these > functions are allowed to corrupt only the integer core registers permitted > to be corrupted by the [AAPCS] (r0-r3, ip, lr, and CPSR)."This is standard AAPCS, I trully hope neither glibc, newlib or musl's implementations will corrupt anything more.> newlib addresses this by explicitly providing a separate implementation for > __aeabi_memcpy () and memcpy (). > But things can get messed up if compiler-rt's __aeabi_memcpy () is actually > used.Indeed, this is the real problem (not the FP clobbering, the two libc's versions). This will be a linker nightmare, and could break the programmers assumption by using a specific C library. Furthermore, EABI's implementation returns void, while the C standard version returns the pointer, so they're not completely replaceable for every case. So, it's safe to use memcpy for all calls to __eabi_memcpy (modulo AAPCS bugs, performance), but the other way around is not.> The implementation of __aeabi_memcpy () in ARM's compiler 5 tool-chain also > uses floating-point registers - but preserves the contents. > So this one is ABI compliant.Any function that is AAPCS compliant *must* preserve all but the AAPCS registers. If one doesn't, it's not Compiler-RT's fault to assume so.> My conclusion is that the current implementation of compiler-rt can > potentially introduce difficult to track down problems. > Unless of course LLVM always can handle corrupted floating-point registers > for all calls to __aeabi_memcpy (). > > Either the aeabi variants of memcpy, memmove, memset, memclr should be fixed > or the stubs should removed from compiler-rt.I agree. But it's not that simple. Some environments (ex. Linux kernel, Android, FreeBSD?, OSX?) depend on this behaviour, and as much as I agree with you that this is a broken assumption, we can't just remove it from RT. To make matters more complicated, different environments will update RT at different times, and any change we do will have impact for years to come, and will be hard to fix in the best way for everyone, quickly enough. I believe one quick way to "fix" the multiple-versions is to mark RT's version as weak (or equivalent), so that the C library's (or free-standing's) version always gets picked first. But it may be an over simplification from my part... I may also be underestimating the register clobbering problem, so if you could give me a concrete example where it's ok for the library to corrupt non-AAPCS registers, we could further discuss it. The only case I remember in GCC, was that it was spilling to VFP registers (before stack) when they existed in the platform, but this was a bad idea and was reverted promptly, because it was breaking stack unwinding, context-switching, etc. cheers, --renato
Tim Northover via llvm-dev
2016-Apr-11 14:12 UTC
[llvm-dev] compler-rt, __aeabi_memcpy () possibly broken (ARM)
On 11 April 2016 at 02:43, Renato Golin via llvm-dev <llvm-dev at lists.llvm.org> wrote:> This is standard AAPCS, I trully hope neither glibc, newlib or musl's > implementations will corrupt anything more.Normal AAPCS functions are allowed to corrupt d0-d7 and d16-d31, but the RTABI seems pretty explicit that these functions aren't. I think we've got a real problem here. Cheers. Tim.
Peter Jakubek via llvm-dev
2016-Apr-11 19:01 UTC
[llvm-dev] compler-rt, __aeabi_memcpy () possibly broken (ARM)
Renato, thanks for your reply. However, I'm not sure whether I agree to your conclusions. Let me try to show why:>> The situation is slightly different for __aeabi_memcpy (). The ABI spec >> explicitly states: "In general, implementations of these >> functions are allowed to corrupt only the integer core registers permitted >> to be corrupted by the [AAPCS] (r0-r3, ip, lr, and CPSR)." > > This is standard AAPCS, I trully hope neither glibc, newlib or musl's > implementations will corrupt anything more.No, this is *not* standard AAPCS. "integer core registers" makes the difference here! It is rather the base standard AAPCS in ARM's nomenclature. A normal AAPCS compliant C-function (like, let's say memcpy) can also corrupt the floating-point registers D0-D7, D16-D31. (see: Procedure Call Standard for the ARM Architecture, section 5.1.2.1) And that's exactly what some implementations of memcpy do. These scratch floating-point registers just come in handy to transfer 32 or 64 bytes at once! __aeaby_memcpy () however does not have this freedom. It is bound by the "Run-time ABI for the ARM Architecture". And this restricts it to corrupt core integer registers only. It must preserve the contents of all floating-point registers. Therefor it can not simply branch to an externally provided memcpy, without knowing how that is implemented. That means it is safe to use __eabi_memcpy () for calls to memcpy (), but not the other way around. (contrary to what you concluded) That's why I think we have a problem here. Now I'm gonna open a can of worms: There are more instances of this restriction for other functions, like long long helper functions, the one-time construction API and such. In case of ldivmod and udivmod it is explicitly stated that these *have* full [AAPCS] privileges. Functions written in assembly should be safe, but c functions are a risk. compiler-rt seems to use __attribute__((pcs("aapcs"))) for functions that are implemented in C. That still does not guarantee that no floating-point registers are corrupted! Just compile this function and you will see what I mean: __attribute__((pcs("aapcs"))) double fn_aapcs (double a, double b) {return (a+b);} There are quite some candidates that might violate the Run-Time ABI rules. The only solution that I see is that LLVM must treat all eabi functions as normal functions. It should assume floating-point scratch registers could be corrupted even by functions that are defined not to do so. And LLVM must never try to be smart and use float registers for no-float code, and be it as handy scratch registers for inlined copy operations and such. Even if compiler-rt did not have a problem, I have seen too many libraries implement these functions in C. The one-time construction API is an example (__cxa_guard_acquire and friends). These functions are typically written in C and call external functions. (mutexes, sleeping) If any of these functions messes any float registers, then things are gonna be fucked up. Hopefully LLVM simply treats eabi functions as normal C-functions regardless of the Run-time ABI restrictions. And LLVM never touches floating-point registers - not even the scratch registers unless float operations are performed. Then things might work. Someone should mark the relevant code in LLVM, so that this is not changed in the future - ever. What really makes we sweat is that if there is a problem, then it will only bite under obscure circumstances and will be very hard to identify. Please, please tell me I'm wrong and overlooked something completely. Peter Am 11.04.2016 um 11:43 schrieb Renato Golin:> On 10 April 2016 at 20:37, Peter Jakubek via llvm-dev > <llvm-dev at lists.llvm.org> wrote: >> I recognized that compiler-rt's the implementation of __aeabi_memcpy simply >> branches to memcpy. >> The implementation of memcpy is not provided. So an externally provided >> memcpy () has to be used. >> (also applies to memmove, memset, memclr) > > Hi Peter, > > In a nutshell, Compiler-RT may assume there is a C library underneath. > This is the same for LibGCC > (https://gcc.gnu.org/onlinedocs/gccint/Libgcc.html): > > "GCC will also generate calls to C library routines, such as > memcpy and memset, in some cases." > > This also works on free-standing environments (ex. the Linux kernel) > because those environments assume the compiler library will do so, and > thus implement "memcpy", "memset", etc. > > >> On ARM I have seen implementations of memcpy () using floating-point >> registers (if compiled with NEON support). The is perfectly >> legal, as memcpy () only needs to comply with the Procedure Call Standard. >> According to this d0-d7, d16-d31 are scratch registers. > > RT *could* have some optimised versions of those functions, but as you > can imagine, this is not a trivial pursuit. One would need to take > into account all the variations, alignment, ISA support and > micro-architecture differences, and that's a very large project, even > if we just consider ARM targets. > > Given that most C libraries have had that consideration already, and > have done more tests (conformance and performance) than we have, it's > safe to assume that whatever the C libraries did is probably better > (average case) than any "smart" thing we can conjure in a weekend. > > >> The situation is slightly different for __aeabi_memcpy (). The ABI spec >> explicitly states: "In general, implementations of these >> functions are allowed to corrupt only the integer core registers permitted >> to be corrupted by the [AAPCS] (r0-r3, ip, lr, and CPSR)." > > This is standard AAPCS, I trully hope neither glibc, newlib or musl's > implementations will corrupt anything more. > > >> newlib addresses this by explicitly providing a separate implementation for >> __aeabi_memcpy () and memcpy (). >> But things can get messed up if compiler-rt's __aeabi_memcpy () is actually >> used. > > Indeed, this is the real problem (not the FP clobbering, the two > libc's versions). This will be a linker nightmare, and could break the > programmers assumption by using a specific C library. > > Furthermore, EABI's implementation returns void, while the C standard > version returns the pointer, so they're not completely replaceable for > every case. > > So, it's safe to use memcpy for all calls to __eabi_memcpy (modulo > AAPCS bugs, performance), but the other way around is not. > > >> The implementation of __aeabi_memcpy () in ARM's compiler 5 tool-chain also >> uses floating-point registers - but preserves the contents. >> So this one is ABI compliant. > > Any function that is AAPCS compliant *must* preserve all but the AAPCS > registers. If one doesn't, it's not Compiler-RT's fault to assume so. > > >> My conclusion is that the current implementation of compiler-rt can >> potentially introduce difficult to track down problems. >> Unless of course LLVM always can handle corrupted floating-point registers >> for all calls to __aeabi_memcpy (). >> >> Either the aeabi variants of memcpy, memmove, memset, memclr should be fixed >> or the stubs should removed from compiler-rt. > > I agree. But it's not that simple. > > Some environments (ex. Linux kernel, Android, FreeBSD?, OSX?) depend > on this behaviour, and as much as I agree with you that this is a > broken assumption, we can't just remove it from RT. > > To make matters more complicated, different environments will update > RT at different times, and any change we do will have impact for years > to come, and will be hard to fix in the best way for everyone, quickly > enough. > > I believe one quick way to "fix" the multiple-versions is to mark RT's > version as weak (or equivalent), so that the C library's (or > free-standing's) version always gets picked first. But it may be an > over simplification from my part... > > I may also be underestimating the register clobbering problem, so if > you could give me a concrete example where it's ok for the library to > corrupt non-AAPCS registers, we could further discuss it. > > The only case I remember in GCC, was that it was spilling to VFP > registers (before stack) when they existed in the platform, but this > was a bad idea and was reverted promptly, because it was breaking > stack unwinding, context-switching, etc. > > cheers, > --renato >
Maybe Matching Threads
- compler-rt, __aeabi_memcpy () possibly broken (ARM)
- [LLVMdev] Prevent clang from replacing code with library calls
- [LLVMdev] Prevent clang from replacing code with library calls
- Best Practices recommendation on x4200
- [PATCH v2 0/2] Implement VFP context switch for arm32