Christian Ehrhardt
2021-Mar-31  06:46 UTC
json_parse_number broken by compiler optimization
On Tue, Mar 30, 2021 at 9:21 PM Josef 'Jeff' Sipek <jeff.sipek at open-xchange.com> wrote:> > On Tue, Mar 30, 2021 at 13:34:54 -0400, Josef 'Jeff' Sipek wrote: > > On Tue, Mar 30, 2021 at 17:53:27 +0200, Christian Ehrhardt wrote: > > > Hi, > > > the recent Ubuntu (re)builds uncovered an issue with dovecot 1:2.3.13+dfsg1-1 > > > build log: https://launchpadlibrarian.net/529849650/buildlog_ubuntu-hirsute-amd64.dovecot_1%3A2.3.13+dfsg1-1build1_BUILDING.txt.gz > > > A coworker tried 2.3.14 but got the same result. > > > > > > What fails is the json_parser build time test like: > > > test-json-parser.c:161: Assert(#25) failed: > > > null_strcmp(json_output[pos].value, value) == 0 > > > > > > I was looking into that a bit more and what I found is that it is > > > dependent on the new toolchain > > > of gcc 10.2.0-1. > > > > FWIW, I managed to reproduce it on FreeBSD with gcc 11, so the good news for > > you is that it isn't Ubuntu specific :) > > > > I'll debug further. > > The culprit seems to be LTO. If you disable LTO, everything should work > just fine.I've had LTO disabled and it has still shown the same effect (with my gcc 10.2.0-1). I'll give it a non-LTO retry and double check if it really changed the compile options accordingly. I'll let you know about that later on.> So, I think that'll be the "official" workaround - and a much > better one than disabling optimization completely.Well, "completely" is a bit hard, as I only disabled it on a single function and not the full build :-) But yeah if it really turns out to be LTO then disabling that will be fine as an avoidance until we've found the underlying root cause.> Now, the big question is, is something in the test breaking or is the parser > itself somehow triggering this. > > Jeff. > > > > > Thanks again for the report, > > > > Jeff. > > > > > > > > Not all calls to json_parse_* fail, e.g. the first one looks all good and passes > > > I was iterating the tests using a report function defined like > > > > > > (gdb) define repcon > > > >c > > > >p pos > > > >p json_output[pos].type > > > >p type > > > >p json_output[pos].value > > > >p value > > > >call null_strcmp(json_output[pos].value, value) > > > >end > > > > > > The first one to be bad was: > > > Breakpoint 2, test_json_parser_success (full_size=<optimized out>) at > > > test-json-parser.c:161 > > > 161 test_assert_idx(null_strcmp(json_output[pos].value, value) == 0, pos); > > > $84 = 25 > > > $85 = JSON_TYPE_NUMBER > > > $86 = JSON_TYPE_NUMBER > > > $87 = 0x555555633b25 "-12.456" > > > $88 = 0x555555693110 "" > > > $89 = 45 > > > > > > Earlier and later parsing was happy, for example > > > > > > Breakpoint 2, test_json_parser_success (full_size=<optimized out>) at > > > test-json-parser.c:161 > > > 161 test_assert_idx(null_strcmp(json_output[pos].value, value) == 0, pos); > > > $90 = 27 > > > $91 = JSON_TYPE_NUMBER > > > $92 = JSON_TYPE_NUMBER > > > $93 = 0x555555633b32 "12.456e9" > > > $94 = 0x555555693110 "12.456e9" > > > $95 = 0 > > > (gdb) > > > > > > > > > We have two things we compare here. > > > 1. json_output[] which is a static define and for this value is > > > 67 ????{ JSON_TYPE_NUMBER, "-12.456" }, > > > 2. the return value that json_parse_next returns. > > > 25 ????" \"sub2\":-12.456,\n" > > > 148 ????????????????ret = json_parse_next(parser, &type, &value); > > > > > > I tried a non negative number and got success which is suspicious > > > > > > Breakpoint 5, test_json_parser_success (full_size=<optimized out>) at > > > test-json-parser.c:164 > > > 164 test_assert_idx(null_strcmp(json_output[pos].value, value) == 0, pos); > > > $122 = 25 > > > $123 = JSON_TYPE_NUMBER > > > $124 = JSON_TYPE_NUMBER > > > $125 = 0x555555633b2c "12.456" > > > $126 = 0x555555693110 "12.456" > > > $127 = 0 > > > (gdb) > > > > > > > > > Also the return value otherwise LGTM, it is recognized as a number: > > > 540 } else if ((ret = json_parse_number(parser, value_r)) >= 0) { > > > (gdb) n > > > 541 *type_r = JSON_TYPE_NUMBER; > > > > > > So parsing worked one might think, but the value was not set. > > > > > > (gdb) p *value_r > > > $833 = 0x555555693110 "" > > > > > > It turns out json_parse_number is too optimized. > > > Too optimized for debugging further into json_parse_number. > > > But also optimization is causing the issue. > > > With -O0 or -O1 it works fine, even putting the lower optimization > > > level just around json_parse_number it is enough to avoid the issue. > > > > > > > > > Not a fix but "avoidance" for now: > > > > > > --- a/src/lib/json-parser.c > > > +++ b/src/lib/json-parser.c > > > @@ -386,6 +386,8 @@ static int json_parse_int(struct json_pa > > > return 1; > > > } > > > > > > +#pragma GCC push_options > > > +#pragma GCC optimize ("-O0") > > > static int json_parse_number(struct json_parser *parser, const char **value_r) > > > { > > > int ret; > > > @@ -415,6 +417,7 @@ static int json_parse_number(struct json > > > *value_r = str_c(parser->value); > > > return 1; > > > } > > > +#pragma GCC pop_options > > > > > > static int json_parse_atom(struct json_parser *parser, const char *atom) > > > { > > > > > > > > > I was trying to see what happened and tracked each str_append_c > > > that happened inside json_parse_number. > > > A breakpoint on str_append_c and backtraces showed that we clearly do the > > > "right" parsing. From the backtraces I've seen it does (just from positions): > > > > > > For example > > > (gdb) bt > > > #0 str_append_c (chr=<optimized out>, str=0x5555556930d0) at str.h:44 > > > #1 json_parse_digits (parser=<optimized out>) at json-parser.c:366 > > > => that is a digit > > > > > > It does "- digit digit dot digit digit digit" > > > > > > This perfectly matches the input value, but still the result stays empty: > > > (gdb) p (char*)(str->data) > > > $738 = 0x555555693110 "" > > > > > > In comparison - when parsing the latter 12.456e9 which works I see two > > > odd things: > > > 1. the value isn't cleared despite > > > str_truncate(parser->value, 0); > > > at the beginning of json_parse_number > > > 2. the value (this time) is read exactly as I assumed > > > > > > Breakpoint 17, json_parse_number (value_r=0x7fffffffe1a0, > > > parser=0x555555692ee0) at json-parser.c:394 > > > 394 str_truncate(parser->value, 0); > > > Breakpoint 18, str_append_c (chr=<optimized out>, str=0x5555556930d0) > > > at str.h:44 > > > 44 buffer_append_c(str, chr); > > > (gdb) p (char*)(str->data) > > > $741 = 0x555555693110 "sub3" > > > > > > ^^ not cleared, still has "sub3" even later on > > > > > > Breakpoint 18, str_append_c (chr=<optimized out>, str=0x5555556930d0) > > > at str.h:44 > > > 44 buffer_append_c(str, chr); > > > (gdb) p (char*)(str->data) > > > $742 = 0x555555693110 "1ub3" > > > > > > ^^ reading left to right one char at a time "over the old value", > > > which didn't update the data. > > > > > > So it seems in json_parse_number the ordering of > > > a) str_truncate(parser->value, 0); > > > b) the parsing itself > > > is messed up by optimization. > > > > > > We already know from the above that reducing just this function to -O1 or less > > > avoids the issue. But what is it really - I don't know? > > > > > > Maybe you have a better idea what is going on, the issue should be > > > reproducible when building > > > dovecot with the toolchain present on the latest Ubuntu being 21.04 > > > (Hirsute) right now. > > > > > > -- > > > Christian Ehrhardt > > > Staff Engineer, Ubuntu Server > > > Canonical Ltd-- Christian Ehrhardt Staff Engineer, Ubuntu Server Canonical Ltd
Christian Ehrhardt
2021-Mar-31  07:07 UTC
json_parse_number broken by compiler optimization
On Wed, Mar 31, 2021 at 8:46 AM Christian Ehrhardt <christian.ehrhardt at canonical.com> wrote:> > On Tue, Mar 30, 2021 at 9:21 PM Josef 'Jeff' Sipek > <jeff.sipek at open-xchange.com> wrote: > > > > On Tue, Mar 30, 2021 at 13:34:54 -0400, Josef 'Jeff' Sipek wrote: > > > On Tue, Mar 30, 2021 at 17:53:27 +0200, Christian Ehrhardt wrote: > > > > Hi, > > > > the recent Ubuntu (re)builds uncovered an issue with dovecot 1:2.3.13+dfsg1-1 > > > > build log: https://launchpadlibrarian.net/529849650/buildlog_ubuntu-hirsute-amd64.dovecot_1%3A2.3.13+dfsg1-1build1_BUILDING.txt.gz > > > > A coworker tried 2.3.14 but got the same result. > > > > > > > > What fails is the json_parser build time test like: > > > > test-json-parser.c:161: Assert(#25) failed: > > > > null_strcmp(json_output[pos].value, value) == 0 > > > > > > > > I was looking into that a bit more and what I found is that it is > > > > dependent on the new toolchain > > > > of gcc 10.2.0-1. > > > > > > FWIW, I managed to reproduce it on FreeBSD with gcc 11, so the good news for > > > you is that it isn't Ubuntu specific :) > > > > > > I'll debug further. > > > > The culprit seems to be LTO. If you disable LTO, everything should work > > just fine. > > I've had LTO disabled and it has still shown the same effect (with my > gcc 10.2.0-1). > I'll give it a non-LTO retry and double check if it really changed the > compile options accordingly. > I'll let you know about that later on.Indeed, I wonder what I tried yesterday in regard to LTO then .. :-/ I can confirm that disabling LTO fixes the issue for me as well and for now that should be a good mitigation until the root cause is found and fixed. Since it might help debugging the underlying problem with LTO here is another data point. With LTO enabled (and skipping the json-parser issues with my optimization trick) there is another testcase later that fails (but works with LTO disabled): test-istream-attachment.c:354: Assert failed: memcmp(data + sizeof(BINARY_TEXT_LONG)-1, BINARY_TEXT_SHORT, strlen(BINARY_TEXT_SHORT)) == 0 istream attachment ................................................... : FAILED Panic: file test-istream-attachment.c: line 395 (test_istream_attachment_extractor_one): assertion failed: (size >prefix_len && memcmp(data, mail_broken_input_body_prefix, prefix_len) == 0) Error: Raw backtrace: ./test-istream-attachment(+0x4cd95) [0x55c0db91bd95] -> ./test-istream-attachment(backtrace_get+0x75) [0x55c0db91bf65] -> ./test-istream-attachment(+0x2a7fb) [0x55c0db8f97fb] -> ./test-istream-attachment(+0x2a837) [0x55c0db8f9837] -> ./test-istream-attachment(+0x13c5c) [0x55c0db8e2c5c] -> ./test-istream-attachment(+0x12d39) [0x55c0db8e1d39] -> ./test-istream-attachment(+0x1cca3) [0x55c0db8ebca3] -> ./test-istream-attachment(+0x2424d) [0x55c0db8f324d] -> ./test-istream-attachment(test_run+0x63) [0x55c0db8f32f3] -> /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xd5) [0x7f60d232d565] -> ./test-istream-attachment(_start+0x2e) [0x55c0db8e7c2e] /bin/bash: line 1: 1650909 Aborted (core dumped) ./$bin> > So, I think that'll be the "official" workaround - and a much > > better one than disabling optimization completely. > > Well, "completely" is a bit hard, as I only disabled it on a single > function and not the full build :-) > But yeah if it really turns out to be LTO then disabling that will be > fine as an avoidance until we've found the underlying root cause. > > > Now, the big question is, is something in the test breaking or is the parser > > itself somehow triggering this. > > > > Jeff. > > > > > > > > Thanks again for the report, > > > > > > Jeff. > > > > > > > > > > > Not all calls to json_parse_* fail, e.g. the first one looks all good and passes > > > > I was iterating the tests using a report function defined like > > > > > > > > (gdb) define repcon > > > > >c > > > > >p pos > > > > >p json_output[pos].type > > > > >p type > > > > >p json_output[pos].value > > > > >p value > > > > >call null_strcmp(json_output[pos].value, value) > > > > >end > > > > > > > > The first one to be bad was: > > > > Breakpoint 2, test_json_parser_success (full_size=<optimized out>) at > > > > test-json-parser.c:161 > > > > 161 test_assert_idx(null_strcmp(json_output[pos].value, value) == 0, pos); > > > > $84 = 25 > > > > $85 = JSON_TYPE_NUMBER > > > > $86 = JSON_TYPE_NUMBER > > > > $87 = 0x555555633b25 "-12.456" > > > > $88 = 0x555555693110 "" > > > > $89 = 45 > > > > > > > > Earlier and later parsing was happy, for example > > > > > > > > Breakpoint 2, test_json_parser_success (full_size=<optimized out>) at > > > > test-json-parser.c:161 > > > > 161 test_assert_idx(null_strcmp(json_output[pos].value, value) == 0, pos); > > > > $90 = 27 > > > > $91 = JSON_TYPE_NUMBER > > > > $92 = JSON_TYPE_NUMBER > > > > $93 = 0x555555633b32 "12.456e9" > > > > $94 = 0x555555693110 "12.456e9" > > > > $95 = 0 > > > > (gdb) > > > > > > > > > > > > We have two things we compare here. > > > > 1. json_output[] which is a static define and for this value is > > > > 67 ????{ JSON_TYPE_NUMBER, "-12.456" }, > > > > 2. the return value that json_parse_next returns. > > > > 25 ????" \"sub2\":-12.456,\n" > > > > 148 ????????????????ret = json_parse_next(parser, &type, &value); > > > > > > > > I tried a non negative number and got success which is suspicious > > > > > > > > Breakpoint 5, test_json_parser_success (full_size=<optimized out>) at > > > > test-json-parser.c:164 > > > > 164 test_assert_idx(null_strcmp(json_output[pos].value, value) == 0, pos); > > > > $122 = 25 > > > > $123 = JSON_TYPE_NUMBER > > > > $124 = JSON_TYPE_NUMBER > > > > $125 = 0x555555633b2c "12.456" > > > > $126 = 0x555555693110 "12.456" > > > > $127 = 0 > > > > (gdb) > > > > > > > > > > > > Also the return value otherwise LGTM, it is recognized as a number: > > > > 540 } else if ((ret = json_parse_number(parser, value_r)) >= 0) { > > > > (gdb) n > > > > 541 *type_r = JSON_TYPE_NUMBER; > > > > > > > > So parsing worked one might think, but the value was not set. > > > > > > > > (gdb) p *value_r > > > > $833 = 0x555555693110 "" > > > > > > > > It turns out json_parse_number is too optimized. > > > > Too optimized for debugging further into json_parse_number. > > > > But also optimization is causing the issue. > > > > With -O0 or -O1 it works fine, even putting the lower optimization > > > > level just around json_parse_number it is enough to avoid the issue. > > > > > > > > > > > > Not a fix but "avoidance" for now: > > > > > > > > --- a/src/lib/json-parser.c > > > > +++ b/src/lib/json-parser.c > > > > @@ -386,6 +386,8 @@ static int json_parse_int(struct json_pa > > > > return 1; > > > > } > > > > > > > > +#pragma GCC push_options > > > > +#pragma GCC optimize ("-O0") > > > > static int json_parse_number(struct json_parser *parser, const char **value_r) > > > > { > > > > int ret; > > > > @@ -415,6 +417,7 @@ static int json_parse_number(struct json > > > > *value_r = str_c(parser->value); > > > > return 1; > > > > } > > > > +#pragma GCC pop_options > > > > > > > > static int json_parse_atom(struct json_parser *parser, const char *atom) > > > > { > > > > > > > > > > > > I was trying to see what happened and tracked each str_append_c > > > > that happened inside json_parse_number. > > > > A breakpoint on str_append_c and backtraces showed that we clearly do the > > > > "right" parsing. From the backtraces I've seen it does (just from positions): > > > > > > > > For example > > > > (gdb) bt > > > > #0 str_append_c (chr=<optimized out>, str=0x5555556930d0) at str.h:44 > > > > #1 json_parse_digits (parser=<optimized out>) at json-parser.c:366 > > > > => that is a digit > > > > > > > > It does "- digit digit dot digit digit digit" > > > > > > > > This perfectly matches the input value, but still the result stays empty: > > > > (gdb) p (char*)(str->data) > > > > $738 = 0x555555693110 "" > > > > > > > > In comparison - when parsing the latter 12.456e9 which works I see two > > > > odd things: > > > > 1. the value isn't cleared despite > > > > str_truncate(parser->value, 0); > > > > at the beginning of json_parse_number > > > > 2. the value (this time) is read exactly as I assumed > > > > > > > > Breakpoint 17, json_parse_number (value_r=0x7fffffffe1a0, > > > > parser=0x555555692ee0) at json-parser.c:394 > > > > 394 str_truncate(parser->value, 0); > > > > Breakpoint 18, str_append_c (chr=<optimized out>, str=0x5555556930d0) > > > > at str.h:44 > > > > 44 buffer_append_c(str, chr); > > > > (gdb) p (char*)(str->data) > > > > $741 = 0x555555693110 "sub3" > > > > > > > > ^^ not cleared, still has "sub3" even later on > > > > > > > > Breakpoint 18, str_append_c (chr=<optimized out>, str=0x5555556930d0) > > > > at str.h:44 > > > > 44 buffer_append_c(str, chr); > > > > (gdb) p (char*)(str->data) > > > > $742 = 0x555555693110 "1ub3" > > > > > > > > ^^ reading left to right one char at a time "over the old value", > > > > which didn't update the data. > > > > > > > > So it seems in json_parse_number the ordering of > > > > a) str_truncate(parser->value, 0); > > > > b) the parsing itself > > > > is messed up by optimization. > > > > > > > > We already know from the above that reducing just this function to -O1 or less > > > > avoids the issue. But what is it really - I don't know? > > > > > > > > Maybe you have a better idea what is going on, the issue should be > > > > reproducible when building > > > > dovecot with the toolchain present on the latest Ubuntu being 21.04 > > > > (Hirsute) right now. > > > > > > > > -- > > > > Christian Ehrhardt > > > > Staff Engineer, Ubuntu Server > > > > Canonical Ltd > > > > -- > Christian Ehrhardt > Staff Engineer, Ubuntu Server > Canonical Ltd-- Christian Ehrhardt Staff Engineer, Ubuntu Server Canonical Ltd