Josef 'Jeff' Sipek
2021-Mar-30 19:21 UTC
json_parse_number broken by compiler optimization
On Tue, Mar 30, 2021 at 13:34:54 -0400, Josef 'Jeff' Sipek wrote:> On Tue, Mar 30, 2021 at 17:53:27 +0200, Christian Ehrhardt wrote: > > Hi, > > the recent Ubuntu (re)builds uncovered an issue with dovecot 1:2.3.13+dfsg1-1 > > build log: https://launchpadlibrarian.net/529849650/buildlog_ubuntu-hirsute-amd64.dovecot_1%3A2.3.13+dfsg1-1build1_BUILDING.txt.gz > > A coworker tried 2.3.14 but got the same result. > > > > What fails is the json_parser build time test like: > > test-json-parser.c:161: Assert(#25) failed: > > null_strcmp(json_output[pos].value, value) == 0 > > > > I was looking into that a bit more and what I found is that it is > > dependent on the new toolchain > > of gcc 10.2.0-1. > > FWIW, I managed to reproduce it on FreeBSD with gcc 11, so the good news for > you is that it isn't Ubuntu specific :) > > I'll debug further.The culprit seems to be LTO. If you disable LTO, everything should work just fine. So, I think that'll be the "official" workaround - and a much better one than disabling optimization completely. Now, the big question is, is something in the test breaking or is the parser itself somehow triggering this. Jeff.> > Thanks again for the report, > > Jeff. > > > > > Not all calls to json_parse_* fail, e.g. the first one looks all good and passes > > I was iterating the tests using a report function defined like > > > > (gdb) define repcon > > >c > > >p pos > > >p json_output[pos].type > > >p type > > >p json_output[pos].value > > >p value > > >call null_strcmp(json_output[pos].value, value) > > >end > > > > The first one to be bad was: > > Breakpoint 2, test_json_parser_success (full_size=<optimized out>) at > > test-json-parser.c:161 > > 161 test_assert_idx(null_strcmp(json_output[pos].value, value) == 0, pos); > > $84 = 25 > > $85 = JSON_TYPE_NUMBER > > $86 = JSON_TYPE_NUMBER > > $87 = 0x555555633b25 "-12.456" > > $88 = 0x555555693110 "" > > $89 = 45 > > > > Earlier and later parsing was happy, for example > > > > Breakpoint 2, test_json_parser_success (full_size=<optimized out>) at > > test-json-parser.c:161 > > 161 test_assert_idx(null_strcmp(json_output[pos].value, value) == 0, pos); > > $90 = 27 > > $91 = JSON_TYPE_NUMBER > > $92 = JSON_TYPE_NUMBER > > $93 = 0x555555633b32 "12.456e9" > > $94 = 0x555555693110 "12.456e9" > > $95 = 0 > > (gdb) > > > > > > We have two things we compare here. > > 1. json_output[] which is a static define and for this value is > > 67 ????{ JSON_TYPE_NUMBER, "-12.456" }, > > 2. the return value that json_parse_next returns. > > 25 ????" \"sub2\":-12.456,\n" > > 148 ????????????????ret = json_parse_next(parser, &type, &value); > > > > I tried a non negative number and got success which is suspicious > > > > Breakpoint 5, test_json_parser_success (full_size=<optimized out>) at > > test-json-parser.c:164 > > 164 test_assert_idx(null_strcmp(json_output[pos].value, value) == 0, pos); > > $122 = 25 > > $123 = JSON_TYPE_NUMBER > > $124 = JSON_TYPE_NUMBER > > $125 = 0x555555633b2c "12.456" > > $126 = 0x555555693110 "12.456" > > $127 = 0 > > (gdb) > > > > > > Also the return value otherwise LGTM, it is recognized as a number: > > 540 } else if ((ret = json_parse_number(parser, value_r)) >= 0) { > > (gdb) n > > 541 *type_r = JSON_TYPE_NUMBER; > > > > So parsing worked one might think, but the value was not set. > > > > (gdb) p *value_r > > $833 = 0x555555693110 "" > > > > It turns out json_parse_number is too optimized. > > Too optimized for debugging further into json_parse_number. > > But also optimization is causing the issue. > > With -O0 or -O1 it works fine, even putting the lower optimization > > level just around json_parse_number it is enough to avoid the issue. > > > > > > Not a fix but "avoidance" for now: > > > > --- a/src/lib/json-parser.c > > +++ b/src/lib/json-parser.c > > @@ -386,6 +386,8 @@ static int json_parse_int(struct json_pa > > return 1; > > } > > > > +#pragma GCC push_options > > +#pragma GCC optimize ("-O0") > > static int json_parse_number(struct json_parser *parser, const char **value_r) > > { > > int ret; > > @@ -415,6 +417,7 @@ static int json_parse_number(struct json > > *value_r = str_c(parser->value); > > return 1; > > } > > +#pragma GCC pop_options > > > > static int json_parse_atom(struct json_parser *parser, const char *atom) > > { > > > > > > I was trying to see what happened and tracked each str_append_c > > that happened inside json_parse_number. > > A breakpoint on str_append_c and backtraces showed that we clearly do the > > "right" parsing. From the backtraces I've seen it does (just from positions): > > > > For example > > (gdb) bt > > #0 str_append_c (chr=<optimized out>, str=0x5555556930d0) at str.h:44 > > #1 json_parse_digits (parser=<optimized out>) at json-parser.c:366 > > => that is a digit > > > > It does "- digit digit dot digit digit digit" > > > > This perfectly matches the input value, but still the result stays empty: > > (gdb) p (char*)(str->data) > > $738 = 0x555555693110 "" > > > > In comparison - when parsing the latter 12.456e9 which works I see two > > odd things: > > 1. the value isn't cleared despite > > str_truncate(parser->value, 0); > > at the beginning of json_parse_number > > 2. the value (this time) is read exactly as I assumed > > > > Breakpoint 17, json_parse_number (value_r=0x7fffffffe1a0, > > parser=0x555555692ee0) at json-parser.c:394 > > 394 str_truncate(parser->value, 0); > > Breakpoint 18, str_append_c (chr=<optimized out>, str=0x5555556930d0) > > at str.h:44 > > 44 buffer_append_c(str, chr); > > (gdb) p (char*)(str->data) > > $741 = 0x555555693110 "sub3" > > > > ^^ not cleared, still has "sub3" even later on > > > > Breakpoint 18, str_append_c (chr=<optimized out>, str=0x5555556930d0) > > at str.h:44 > > 44 buffer_append_c(str, chr); > > (gdb) p (char*)(str->data) > > $742 = 0x555555693110 "1ub3" > > > > ^^ reading left to right one char at a time "over the old value", > > which didn't update the data. > > > > So it seems in json_parse_number the ordering of > > a) str_truncate(parser->value, 0); > > b) the parsing itself > > is messed up by optimization. > > > > We already know from the above that reducing just this function to -O1 or less > > avoids the issue. But what is it really - I don't know? > > > > Maybe you have a better idea what is going on, the issue should be > > reproducible when building > > dovecot with the toolchain present on the latest Ubuntu being 21.04 > > (Hirsute) right now. > > > > -- > > Christian Ehrhardt > > Staff Engineer, Ubuntu Server > > Canonical Ltd
Christian Ehrhardt
2021-Mar-31 06:46 UTC
json_parse_number broken by compiler optimization
On Tue, Mar 30, 2021 at 9:21 PM Josef 'Jeff' Sipek <jeff.sipek at open-xchange.com> wrote:> > On Tue, Mar 30, 2021 at 13:34:54 -0400, Josef 'Jeff' Sipek wrote: > > On Tue, Mar 30, 2021 at 17:53:27 +0200, Christian Ehrhardt wrote: > > > Hi, > > > the recent Ubuntu (re)builds uncovered an issue with dovecot 1:2.3.13+dfsg1-1 > > > build log: https://launchpadlibrarian.net/529849650/buildlog_ubuntu-hirsute-amd64.dovecot_1%3A2.3.13+dfsg1-1build1_BUILDING.txt.gz > > > A coworker tried 2.3.14 but got the same result. > > > > > > What fails is the json_parser build time test like: > > > test-json-parser.c:161: Assert(#25) failed: > > > null_strcmp(json_output[pos].value, value) == 0 > > > > > > I was looking into that a bit more and what I found is that it is > > > dependent on the new toolchain > > > of gcc 10.2.0-1. > > > > FWIW, I managed to reproduce it on FreeBSD with gcc 11, so the good news for > > you is that it isn't Ubuntu specific :) > > > > I'll debug further. > > The culprit seems to be LTO. If you disable LTO, everything should work > just fine.I've had LTO disabled and it has still shown the same effect (with my gcc 10.2.0-1). I'll give it a non-LTO retry and double check if it really changed the compile options accordingly. I'll let you know about that later on.> So, I think that'll be the "official" workaround - and a much > better one than disabling optimization completely.Well, "completely" is a bit hard, as I only disabled it on a single function and not the full build :-) But yeah if it really turns out to be LTO then disabling that will be fine as an avoidance until we've found the underlying root cause.> Now, the big question is, is something in the test breaking or is the parser > itself somehow triggering this. > > Jeff. > > > > > Thanks again for the report, > > > > Jeff. > > > > > > > > Not all calls to json_parse_* fail, e.g. the first one looks all good and passes > > > I was iterating the tests using a report function defined like > > > > > > (gdb) define repcon > > > >c > > > >p pos > > > >p json_output[pos].type > > > >p type > > > >p json_output[pos].value > > > >p value > > > >call null_strcmp(json_output[pos].value, value) > > > >end > > > > > > The first one to be bad was: > > > Breakpoint 2, test_json_parser_success (full_size=<optimized out>) at > > > test-json-parser.c:161 > > > 161 test_assert_idx(null_strcmp(json_output[pos].value, value) == 0, pos); > > > $84 = 25 > > > $85 = JSON_TYPE_NUMBER > > > $86 = JSON_TYPE_NUMBER > > > $87 = 0x555555633b25 "-12.456" > > > $88 = 0x555555693110 "" > > > $89 = 45 > > > > > > Earlier and later parsing was happy, for example > > > > > > Breakpoint 2, test_json_parser_success (full_size=<optimized out>) at > > > test-json-parser.c:161 > > > 161 test_assert_idx(null_strcmp(json_output[pos].value, value) == 0, pos); > > > $90 = 27 > > > $91 = JSON_TYPE_NUMBER > > > $92 = JSON_TYPE_NUMBER > > > $93 = 0x555555633b32 "12.456e9" > > > $94 = 0x555555693110 "12.456e9" > > > $95 = 0 > > > (gdb) > > > > > > > > > We have two things we compare here. > > > 1. json_output[] which is a static define and for this value is > > > 67 ????{ JSON_TYPE_NUMBER, "-12.456" }, > > > 2. the return value that json_parse_next returns. > > > 25 ????" \"sub2\":-12.456,\n" > > > 148 ????????????????ret = json_parse_next(parser, &type, &value); > > > > > > I tried a non negative number and got success which is suspicious > > > > > > Breakpoint 5, test_json_parser_success (full_size=<optimized out>) at > > > test-json-parser.c:164 > > > 164 test_assert_idx(null_strcmp(json_output[pos].value, value) == 0, pos); > > > $122 = 25 > > > $123 = JSON_TYPE_NUMBER > > > $124 = JSON_TYPE_NUMBER > > > $125 = 0x555555633b2c "12.456" > > > $126 = 0x555555693110 "12.456" > > > $127 = 0 > > > (gdb) > > > > > > > > > Also the return value otherwise LGTM, it is recognized as a number: > > > 540 } else if ((ret = json_parse_number(parser, value_r)) >= 0) { > > > (gdb) n > > > 541 *type_r = JSON_TYPE_NUMBER; > > > > > > So parsing worked one might think, but the value was not set. > > > > > > (gdb) p *value_r > > > $833 = 0x555555693110 "" > > > > > > It turns out json_parse_number is too optimized. > > > Too optimized for debugging further into json_parse_number. > > > But also optimization is causing the issue. > > > With -O0 or -O1 it works fine, even putting the lower optimization > > > level just around json_parse_number it is enough to avoid the issue. > > > > > > > > > Not a fix but "avoidance" for now: > > > > > > --- a/src/lib/json-parser.c > > > +++ b/src/lib/json-parser.c > > > @@ -386,6 +386,8 @@ static int json_parse_int(struct json_pa > > > return 1; > > > } > > > > > > +#pragma GCC push_options > > > +#pragma GCC optimize ("-O0") > > > static int json_parse_number(struct json_parser *parser, const char **value_r) > > > { > > > int ret; > > > @@ -415,6 +417,7 @@ static int json_parse_number(struct json > > > *value_r = str_c(parser->value); > > > return 1; > > > } > > > +#pragma GCC pop_options > > > > > > static int json_parse_atom(struct json_parser *parser, const char *atom) > > > { > > > > > > > > > I was trying to see what happened and tracked each str_append_c > > > that happened inside json_parse_number. > > > A breakpoint on str_append_c and backtraces showed that we clearly do the > > > "right" parsing. From the backtraces I've seen it does (just from positions): > > > > > > For example > > > (gdb) bt > > > #0 str_append_c (chr=<optimized out>, str=0x5555556930d0) at str.h:44 > > > #1 json_parse_digits (parser=<optimized out>) at json-parser.c:366 > > > => that is a digit > > > > > > It does "- digit digit dot digit digit digit" > > > > > > This perfectly matches the input value, but still the result stays empty: > > > (gdb) p (char*)(str->data) > > > $738 = 0x555555693110 "" > > > > > > In comparison - when parsing the latter 12.456e9 which works I see two > > > odd things: > > > 1. the value isn't cleared despite > > > str_truncate(parser->value, 0); > > > at the beginning of json_parse_number > > > 2. the value (this time) is read exactly as I assumed > > > > > > Breakpoint 17, json_parse_number (value_r=0x7fffffffe1a0, > > > parser=0x555555692ee0) at json-parser.c:394 > > > 394 str_truncate(parser->value, 0); > > > Breakpoint 18, str_append_c (chr=<optimized out>, str=0x5555556930d0) > > > at str.h:44 > > > 44 buffer_append_c(str, chr); > > > (gdb) p (char*)(str->data) > > > $741 = 0x555555693110 "sub3" > > > > > > ^^ not cleared, still has "sub3" even later on > > > > > > Breakpoint 18, str_append_c (chr=<optimized out>, str=0x5555556930d0) > > > at str.h:44 > > > 44 buffer_append_c(str, chr); > > > (gdb) p (char*)(str->data) > > > $742 = 0x555555693110 "1ub3" > > > > > > ^^ reading left to right one char at a time "over the old value", > > > which didn't update the data. > > > > > > So it seems in json_parse_number the ordering of > > > a) str_truncate(parser->value, 0); > > > b) the parsing itself > > > is messed up by optimization. > > > > > > We already know from the above that reducing just this function to -O1 or less > > > avoids the issue. But what is it really - I don't know? > > > > > > Maybe you have a better idea what is going on, the issue should be > > > reproducible when building > > > dovecot with the toolchain present on the latest Ubuntu being 21.04 > > > (Hirsute) right now. > > > > > > -- > > > Christian Ehrhardt > > > Staff Engineer, Ubuntu Server > > > Canonical Ltd-- Christian Ehrhardt Staff Engineer, Ubuntu Server Canonical Ltd