Adrian Moreno via llvm-dev
2021-Dec-22 07:33 UTC
[llvm-dev] broken C code only when optimized "-O2"
On 12/21/21 19:25, Dimitry Andric wrote:> On 21 Dec 2021, at 17:30, Adrian Moreno via llvm-dev <llvm-dev at lists.llvm.org> wrote: >> >> I need some help understanding what might be wrong with a piece of code from the openvswitch project. By ${subject} I'm not suggesting there's a problem in clang, gcc also shows the same behavior so it's likely our code is broken. I am kindly asking for help to understand/troubleshoot the problem. >> >> Summary: It seems that certain interaction between two main openvswitch data structures, when optimized ("-O2 -flto=auto") is broken. >> The two data structures are: >> >> hmap: https://github.com/openvswitch/ovs/blob/master/include/openvswitch/hmap.h >> list: https://github.com/openvswitch/ovs/blob/master/include/openvswitch/list.h >> >> I've reproduced the problem outside of openvswitch daemon using a short C program (attached) >> >> Code snippet: >> >> struct bond { >> struct hmap members; >> }; >> >> struct member { >> struct hmap_node hmap_node; >> int order; >> struct ovs_list elem; >> }; >> >> int main() { >> int ret = 0; >> struct member *member, *member1, *member2; >> struct bond *bond; >> struct ovs_list start = {0}; >> >> bond = malloc(sizeof *bond); >> memset(bond, 0, sizeof (struct bond)); >> hmap_init(&bond->members); >> >> member1 = malloc(sizeof *member1); >> member2 = malloc(sizeof *member2); >> memset(member1, 0, sizeof (struct member)); >> memset(member2, 0, sizeof (struct member)); >> >> member1->order = 3; >> member2->order = 2; >> >> hmap_insert(&bond->members, &member1->hmap_node, (uint32_t)(uintptr_t)member1); >> hmap_insert(&bond->members, &member2->hmap_node, (uint32_t)(uintptr_t)member2); >> >> ovs_list_init(&start); >> HMAP_FOR_EACH (member, hmap_node, &bond->members) { >> /* >> * Insert member in start (sorted) >> * */ >> struct member *pos; >> LIST_FOR_EACH (pos, elem, &start) { >> if (member->order > pos->order) { >> break; >> } >> } >> // TESTED: If I add this printf, the problem disappears >> //printf("Inserting member: %p\n", member); >> ovs_list_insert(&pos->elem, &member->elem); >> } >> >> /* I've inserted two members into the 'start' list. >> * first and last have to be either member1 or member2 >> * */ >> if ((first != member1 && first != member2) || (last != member1 && last != member2)) { >> printf("list is broken!\n"); >> } >> >> } >> >> >> What I know for now: >> * -fno-strict-aliasing does not fix it >> * Only happens with "-O2 -flto=auto" >> * If I define 'ovs_list *start' and change the code to use the pointer directly and not '&start' the problem disappears. It seems that the LIST_FOR_EACH macros prefer an lvalue rather than "&" but I don't get why. >> * I'm not able to reproduce without using hmap _and_ ovs_list. >> * If I add a compiler barrier (or a call to an external function) after the loop, the problem disappears (e.g printf), the problem disappears. >> * If I add -fsanitize=undefined the problem disappears! > > Not for me: > > % clang -g -fsanitize=undefined -I/Users/dim/tmp/vswitch/foo/include example.c -o example -L/Users/dim/tmp/vswitch/foo/lib -lopenvswitch >Thanks for running it!> % ./example > start: 0x16ee6f618 > example.c:78:5: runtime error: applying zero offset to null pointer > SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior example.c:78:5 in > example.c:78:5: runtime error: applying zero offset to null pointer > SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior example.c:78:5 in > first: 0x600003200270 > last: 0x6000032002a0 > > It looks like the HMAP_FOR_EACH() macro uses null pointer arithmetic. The problem appears to be in https://github.com/openvswitch/ovs/blob/master/include/openvswitch/util.h#L94 : > > /* Given OBJECT of type pointer-to-structure, expands to the offset of MEMBER > * within an instance of the structure. > * > * The GCC-specific version avoids the technicality of undefined behavior if > * OBJECT is null, invalid, or not yet initialized. This makes some static > * checkers (like Coverity) happier. But the non-GCC version does not actually > * dereference any pointer, so it would be surprising for it to cause any > * problems in practice. > */ > #ifdef __GNUC__ > #define OBJECT_OFFSETOF(OBJECT, MEMBER) offsetof(typeof(*(OBJECT)), MEMBER) > #else > #define OBJECT_OFFSETOF(OBJECT, MEMBER) \ > ((char *) &(OBJECT)->MEMBER - (char *) (OBJECT)) > #endif > > The comment is incorrect here, because dereferencing a null pointer, as done in *(OBJECT), is definitely undefined behavior. >Thanks Dimitry for spotting it. So you were running in a non GNUC system? I was testing on Fedora so we were using different implementations of this macro. I will raise it with the openvswitch community that the non GNUC is being flagged by UBSan as an error. For the _GNUC_ version, is using *(OBJECT) inside the "typeof" operand also undefined behavior?> -Dimitry >-- Adrián Moreno
Dimitry Andric via llvm-dev
2021-Dec-22 12:10 UTC
[llvm-dev] broken C code only when optimized "-O2"
On 22 Dec 2021, at 08:33, Adrian Moreno <amorenoz at redhat.com> wrote:> > On 12/21/21 19:25, Dimitry Andric wrote: >> On 21 Dec 2021, at 17:30, Adrian Moreno via llvm-dev <llvm-dev at lists.llvm.org> wrote: >>> >>> I need some help understanding what might be wrong with a piece of code from the openvswitch project. By ${subject} I'm not suggesting there's a problem in clang, gcc also shows the same behavior so it's likely our code is broken. I am kindly asking for help to understand/troubleshoot the problem....>> % ./example >> start: 0x16ee6f618 >> example.c:78:5: runtime error: applying zero offset to null pointer >> SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior example.c:78:5 in >> example.c:78:5: runtime error: applying zero offset to null pointer >> SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior example.c:78:5 in >> first: 0x600003200270 >> last: 0x6000032002a0 >> It looks like the HMAP_FOR_EACH() macro uses null pointer arithmetic. The problem appears to be in https://github.com/openvswitch/ovs/blob/master/include/openvswitch/util.h#L94 : >> /* Given OBJECT of type pointer-to-structure, expands to the offset of MEMBER >> * within an instance of the structure. >> * >> * The GCC-specific version avoids the technicality of undefined behavior if >> * OBJECT is null, invalid, or not yet initialized. This makes some static >> * checkers (like Coverity) happier. But the non-GCC version does not actually >> * dereference any pointer, so it would be surprising for it to cause any >> * problems in practice. >> */ >> #ifdef __GNUC__ >> #define OBJECT_OFFSETOF(OBJECT, MEMBER) offsetof(typeof(*(OBJECT)), MEMBER) >> #else >> #define OBJECT_OFFSETOF(OBJECT, MEMBER) \ >> ((char *) &(OBJECT)->MEMBER - (char *) (OBJECT)) >> #endif >> The comment is incorrect here, because dereferencing a null pointer, as done in *(OBJECT), is definitely undefined behavior. > > Thanks Dimitry for spotting it. > > So you were running in a non GNUC system? I was testing on Fedora so we were using different implementations of this macro. I will raise it with the openvswitch community that the non GNUC is being flagged by UBSan as an error.Yes, I have run this on macOS 12.1 with Apple clang-1300.0.29.30, and on FreeBSD 14-CURRENT, with clang 13.0.0. In both cases, turning optimization on or off does not make a difference for the UBSan reports. I only added -g to be able to debug. Note also that neither on macOS nor on FreeBSD do I ever get the program to print "list is broken!". It does not matter which optimization level I choose. So here you see that the system's memory layout will likely have an influence on the result. Since you told us that adding LTO flags makes the problem appear, maybe this is related? I know that on FreeBSD, none of the libraries you link in are stored in LLVM bitcode (yet, at least:), and I suspect that this is also the case on macOS. I attached another version of the example, with the external functions from openvswitch inlined, so it should only depend on system headers, and won't need any external libraries. Maybe this makes it easier to reason about the problem. Again, on macOS and FreeBSD this shows UBSan messages, but never "list is broken!".> For the _GNUC_ version, is using *(OBJECT) inside the "typeof" operand also undefined behavior?(Note that clang also defines __GNUC__ so this block will be used for clang too.) If OBJECT ever becomes a null pointer, as in the example, then it is undefined. And I don't think an argument "I'm only doing a typeof(*nullptr), not really accessing it" works... :) -Dimitry -------------- next part -------------- A non-text attachment was scrubbed... Name: example-inlined.c Type: application/octet-stream Size: 10877 bytes Desc: not available URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20211222/d824ba1c/attachment.obj> -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 223 bytes Desc: Message signed with OpenPGP URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20211222/d824ba1c/attachment.sig>