Christian Ehrhardt
2021-Mar-30 15:53 UTC
json_parse_number broken by compiler optimization
Hi, the recent Ubuntu (re)builds uncovered an issue with dovecot 1:2.3.13+dfsg1-1 build log: https://launchpadlibrarian.net/529849650/buildlog_ubuntu-hirsute-amd64.dovecot_1%3A2.3.13+dfsg1-1build1_BUILDING.txt.gz A coworker tried 2.3.14 but got the same result. What fails is the json_parser build time test like: test-json-parser.c:161: Assert(#25) failed: null_strcmp(json_output[pos].value, value) == 0 I was looking into that a bit more and what I found is that it is dependent on the new toolchain of gcc 10.2.0-1. Not all calls to json_parse_* fail, e.g. the first one looks all good and passes I was iterating the tests using a report function defined like (gdb) define repcon>c >p pos >p json_output[pos].type >p type >p json_output[pos].value >p value >call null_strcmp(json_output[pos].value, value) >endThe first one to be bad was: Breakpoint 2, test_json_parser_success (full_size=<optimized out>) at test-json-parser.c:161 161 test_assert_idx(null_strcmp(json_output[pos].value, value) == 0, pos); $84 = 25 $85 = JSON_TYPE_NUMBER $86 = JSON_TYPE_NUMBER $87 = 0x555555633b25 "-12.456" $88 = 0x555555693110 "" $89 = 45 Earlier and later parsing was happy, for example Breakpoint 2, test_json_parser_success (full_size=<optimized out>) at test-json-parser.c:161 161 test_assert_idx(null_strcmp(json_output[pos].value, value) == 0, pos); $90 = 27 $91 = JSON_TYPE_NUMBER $92 = JSON_TYPE_NUMBER $93 = 0x555555633b32 "12.456e9" $94 = 0x555555693110 "12.456e9" $95 = 0 (gdb) We have two things we compare here. 1. json_output[] which is a static define and for this value is 67 ????{ JSON_TYPE_NUMBER, "-12.456" }, 2. the return value that json_parse_next returns. 25 ????" \"sub2\":-12.456,\n" 148 ????????????????ret = json_parse_next(parser, &type, &value); I tried a non negative number and got success which is suspicious Breakpoint 5, test_json_parser_success (full_size=<optimized out>) at test-json-parser.c:164 164 test_assert_idx(null_strcmp(json_output[pos].value, value) == 0, pos); $122 = 25 $123 = JSON_TYPE_NUMBER $124 = JSON_TYPE_NUMBER $125 = 0x555555633b2c "12.456" $126 = 0x555555693110 "12.456" $127 = 0 (gdb) Also the return value otherwise LGTM, it is recognized as a number: 540 } else if ((ret = json_parse_number(parser, value_r)) >= 0) { (gdb) n 541 *type_r = JSON_TYPE_NUMBER; So parsing worked one might think, but the value was not set. (gdb) p *value_r $833 = 0x555555693110 "" It turns out json_parse_number is too optimized. Too optimized for debugging further into json_parse_number. But also optimization is causing the issue. With -O0 or -O1 it works fine, even putting the lower optimization level just around json_parse_number it is enough to avoid the issue. Not a fix but "avoidance" for now: --- a/src/lib/json-parser.c +++ b/src/lib/json-parser.c @@ -386,6 +386,8 @@ static int json_parse_int(struct json_pa return 1; } +#pragma GCC push_options +#pragma GCC optimize ("-O0") static int json_parse_number(struct json_parser *parser, const char **value_r) { int ret; @@ -415,6 +417,7 @@ static int json_parse_number(struct json *value_r = str_c(parser->value); return 1; } +#pragma GCC pop_options static int json_parse_atom(struct json_parser *parser, const char *atom) { I was trying to see what happened and tracked each str_append_c that happened inside json_parse_number. A breakpoint on str_append_c and backtraces showed that we clearly do the "right" parsing. From the backtraces I've seen it does (just from positions): For example (gdb) bt #0 str_append_c (chr=<optimized out>, str=0x5555556930d0) at str.h:44 #1 json_parse_digits (parser=<optimized out>) at json-parser.c:366 => that is a digit It does "- digit digit dot digit digit digit" This perfectly matches the input value, but still the result stays empty: (gdb) p (char*)(str->data) $738 = 0x555555693110 "" In comparison - when parsing the latter 12.456e9 which works I see two odd things: 1. the value isn't cleared despite str_truncate(parser->value, 0); at the beginning of json_parse_number 2. the value (this time) is read exactly as I assumed Breakpoint 17, json_parse_number (value_r=0x7fffffffe1a0, parser=0x555555692ee0) at json-parser.c:394 394 str_truncate(parser->value, 0); Breakpoint 18, str_append_c (chr=<optimized out>, str=0x5555556930d0) at str.h:44 44 buffer_append_c(str, chr); (gdb) p (char*)(str->data) $741 = 0x555555693110 "sub3" ^^ not cleared, still has "sub3" even later on Breakpoint 18, str_append_c (chr=<optimized out>, str=0x5555556930d0) at str.h:44 44 buffer_append_c(str, chr); (gdb) p (char*)(str->data) $742 = 0x555555693110 "1ub3" ^^ reading left to right one char at a time "over the old value", which didn't update the data. So it seems in json_parse_number the ordering of a) str_truncate(parser->value, 0); b) the parsing itself is messed up by optimization. We already know from the above that reducing just this function to -O1 or less avoids the issue. But what is it really - I don't know? Maybe you have a better idea what is going on, the issue should be reproducible when building dovecot with the toolchain present on the latest Ubuntu being 21.04 (Hirsute) right now. -- Christian Ehrhardt Staff Engineer, Ubuntu Server Canonical Ltd
> On 30/03/2021 18:53 Christian Ehrhardt <christian.ehrhardt at canonical.com> wrote: > > > Hi, > the recent Ubuntu (re)builds uncovered an issue with dovecot 1:2.3.13+dfsg1-1 > build log: https://launchpadlibrarian.net/529849650/buildlog_ubuntu-hirsute-amd64.dovecot_1%3A2.3.13+dfsg1-1build1_BUILDING.txt.gz > A coworker tried 2.3.14 but got the same result. > > What fails is the json_parser build time test like: > test-json-parser.c:161: Assert(#25) failed: > null_strcmp(json_output[pos].value, value) == 0 > > I was looking into that a bit more and what I found is that it is > dependent on the new toolchain > of gcc 10.2.0-1. > > Not all calls to json_parse_* fail, e.g. the first one looks all good and passes > I was iterating the tests using a report function defined like > > (gdb) define repcon > >c > >p pos > >p json_output[pos].type > >p type > >p json_output[pos].value > >p value > >call null_strcmp(json_output[pos].value, value) > >end > > The first one to be bad was: > Breakpoint 2, test_json_parser_success (full_size=<optimized out>) at > test-json-parser.c:161 > 161 test_assert_idx(null_strcmp(json_output[pos].value, value) == 0, pos); > $84 = 25 > $85 = JSON_TYPE_NUMBER > $86 = JSON_TYPE_NUMBER > $87 = 0x555555633b25 "-12.456" > $88 = 0x555555693110 "" > $89 = 45 > > Earlier and later parsing was happy, for example > > Breakpoint 2, test_json_parser_success (full_size=<optimized out>) at > test-json-parser.c:161 > 161 test_assert_idx(null_strcmp(json_output[pos].value, value) == 0, pos); > $90 = 27 > $91 = JSON_TYPE_NUMBER > $92 = JSON_TYPE_NUMBER > $93 = 0x555555633b32 "12.456e9" > $94 = 0x555555693110 "12.456e9" > $95 = 0 > (gdb) > > > We have two things we compare here. > 1. json_output[] which is a static define and for this value is > 67 ????{ JSON_TYPE_NUMBER, "-12.456" }, > 2. the return value that json_parse_next returns. > 25 ????" \"sub2\":-12.456,\n" > 148 ????????????????ret = json_parse_next(parser, &type, &value); > > I tried a non negative number and got success which is suspicious > > Breakpoint 5, test_json_parser_success (full_size=<optimized out>) at > test-json-parser.c:164 > 164 test_assert_idx(null_strcmp(json_output[pos].value, value) == 0, pos); > $122 = 25 > $123 = JSON_TYPE_NUMBER > $124 = JSON_TYPE_NUMBER > $125 = 0x555555633b2c "12.456" > $126 = 0x555555693110 "12.456" > $127 = 0 > (gdb) > > > Also the return value otherwise LGTM, it is recognized as a number: > 540 } else if ((ret = json_parse_number(parser, value_r)) >= 0) { > (gdb) n > 541 *type_r = JSON_TYPE_NUMBER; > > So parsing worked one might think, but the value was not set. > > (gdb) p *value_r > $833 = 0x555555693110 "" > > It turns out json_parse_number is too optimized. > Too optimized for debugging further into json_parse_number. > But also optimization is causing the issue. > With -O0 or -O1 it works fine, even putting the lower optimization > level just around json_parse_number it is enough to avoid the issue. > > > Not a fix but "avoidance" for now: > > --- a/src/lib/json-parser.c > +++ b/src/lib/json-parser.c > @@ -386,6 +386,8 @@ static int json_parse_int(struct json_pa > return 1; > } > > +#pragma GCC push_options > +#pragma GCC optimize ("-O0") > static int json_parse_number(struct json_parser *parser, const char **value_r) > { > int ret; > @@ -415,6 +417,7 @@ static int json_parse_number(struct json > *value_r = str_c(parser->value); > return 1; > } > +#pragma GCC pop_options > > static int json_parse_atom(struct json_parser *parser, const char *atom) > { > > > I was trying to see what happened and tracked each str_append_c > that happened inside json_parse_number. > A breakpoint on str_append_c and backtraces showed that we clearly do the > "right" parsing. From the backtraces I've seen it does (just from positions): > > For example > (gdb) bt > #0 str_append_c (chr=<optimized out>, str=0x5555556930d0) at str.h:44 > #1 json_parse_digits (parser=<optimized out>) at json-parser.c:366 > => that is a digit > > It does "- digit digit dot digit digit digit" > > This perfectly matches the input value, but still the result stays empty: > (gdb) p (char*)(str->data) > $738 = 0x555555693110 "" > > In comparison - when parsing the latter 12.456e9 which works I see two > odd things: > 1. the value isn't cleared despite > str_truncate(parser->value, 0); > at the beginning of json_parse_number > 2. the value (this time) is read exactly as I assumed > > Breakpoint 17, json_parse_number (value_r=0x7fffffffe1a0, > parser=0x555555692ee0) at json-parser.c:394 > 394 str_truncate(parser->value, 0); > Breakpoint 18, str_append_c (chr=<optimized out>, str=0x5555556930d0) > at str.h:44 > 44 buffer_append_c(str, chr); > (gdb) p (char*)(str->data) > $741 = 0x555555693110 "sub3" > > ^^ not cleared, still has "sub3" even later on > > Breakpoint 18, str_append_c (chr=<optimized out>, str=0x5555556930d0) > at str.h:44 > 44 buffer_append_c(str, chr); > (gdb) p (char*)(str->data) > $742 = 0x555555693110 "1ub3" > > ^^ reading left to right one char at a time "over the old value", > which didn't update the data. > > So it seems in json_parse_number the ordering of > a) str_truncate(parser->value, 0); > b) the parsing itself > is messed up by optimization. > > We already know from the above that reducing just this function to -O1 or less > avoids the issue. But what is it really - I don't know? > > Maybe you have a better idea what is going on, the issue should be > reproducible when building > dovecot with the toolchain present on the latest Ubuntu being 21.04 > (Hirsute) right now. > > -- > Christian Ehrhardt > Staff Engineer, Ubuntu Server > Canonical LtdThanks! We'll look into this. Aki
Josef 'Jeff' Sipek
2021-Mar-30 17:34 UTC
json_parse_number broken by compiler optimization
On Tue, Mar 30, 2021 at 17:53:27 +0200, Christian Ehrhardt wrote:> Hi, > the recent Ubuntu (re)builds uncovered an issue with dovecot 1:2.3.13+dfsg1-1 > build log: https://launchpadlibrarian.net/529849650/buildlog_ubuntu-hirsute-amd64.dovecot_1%3A2.3.13+dfsg1-1build1_BUILDING.txt.gz > A coworker tried 2.3.14 but got the same result. > > What fails is the json_parser build time test like: > test-json-parser.c:161: Assert(#25) failed: > null_strcmp(json_output[pos].value, value) == 0 > > I was looking into that a bit more and what I found is that it is > dependent on the new toolchain > of gcc 10.2.0-1.FWIW, I managed to reproduce it on FreeBSD with gcc 11, so the good news for you is that it isn't Ubuntu specific :) I'll debug further. Thanks again for the report, Jeff.> > Not all calls to json_parse_* fail, e.g. the first one looks all good and passes > I was iterating the tests using a report function defined like > > (gdb) define repcon > >c > >p pos > >p json_output[pos].type > >p type > >p json_output[pos].value > >p value > >call null_strcmp(json_output[pos].value, value) > >end > > The first one to be bad was: > Breakpoint 2, test_json_parser_success (full_size=<optimized out>) at > test-json-parser.c:161 > 161 test_assert_idx(null_strcmp(json_output[pos].value, value) == 0, pos); > $84 = 25 > $85 = JSON_TYPE_NUMBER > $86 = JSON_TYPE_NUMBER > $87 = 0x555555633b25 "-12.456" > $88 = 0x555555693110 "" > $89 = 45 > > Earlier and later parsing was happy, for example > > Breakpoint 2, test_json_parser_success (full_size=<optimized out>) at > test-json-parser.c:161 > 161 test_assert_idx(null_strcmp(json_output[pos].value, value) == 0, pos); > $90 = 27 > $91 = JSON_TYPE_NUMBER > $92 = JSON_TYPE_NUMBER > $93 = 0x555555633b32 "12.456e9" > $94 = 0x555555693110 "12.456e9" > $95 = 0 > (gdb) > > > We have two things we compare here. > 1. json_output[] which is a static define and for this value is > 67 ????{ JSON_TYPE_NUMBER, "-12.456" }, > 2. the return value that json_parse_next returns. > 25 ????" \"sub2\":-12.456,\n" > 148 ????????????????ret = json_parse_next(parser, &type, &value); > > I tried a non negative number and got success which is suspicious > > Breakpoint 5, test_json_parser_success (full_size=<optimized out>) at > test-json-parser.c:164 > 164 test_assert_idx(null_strcmp(json_output[pos].value, value) == 0, pos); > $122 = 25 > $123 = JSON_TYPE_NUMBER > $124 = JSON_TYPE_NUMBER > $125 = 0x555555633b2c "12.456" > $126 = 0x555555693110 "12.456" > $127 = 0 > (gdb) > > > Also the return value otherwise LGTM, it is recognized as a number: > 540 } else if ((ret = json_parse_number(parser, value_r)) >= 0) { > (gdb) n > 541 *type_r = JSON_TYPE_NUMBER; > > So parsing worked one might think, but the value was not set. > > (gdb) p *value_r > $833 = 0x555555693110 "" > > It turns out json_parse_number is too optimized. > Too optimized for debugging further into json_parse_number. > But also optimization is causing the issue. > With -O0 or -O1 it works fine, even putting the lower optimization > level just around json_parse_number it is enough to avoid the issue. > > > Not a fix but "avoidance" for now: > > --- a/src/lib/json-parser.c > +++ b/src/lib/json-parser.c > @@ -386,6 +386,8 @@ static int json_parse_int(struct json_pa > return 1; > } > > +#pragma GCC push_options > +#pragma GCC optimize ("-O0") > static int json_parse_number(struct json_parser *parser, const char **value_r) > { > int ret; > @@ -415,6 +417,7 @@ static int json_parse_number(struct json > *value_r = str_c(parser->value); > return 1; > } > +#pragma GCC pop_options > > static int json_parse_atom(struct json_parser *parser, const char *atom) > { > > > I was trying to see what happened and tracked each str_append_c > that happened inside json_parse_number. > A breakpoint on str_append_c and backtraces showed that we clearly do the > "right" parsing. From the backtraces I've seen it does (just from positions): > > For example > (gdb) bt > #0 str_append_c (chr=<optimized out>, str=0x5555556930d0) at str.h:44 > #1 json_parse_digits (parser=<optimized out>) at json-parser.c:366 > => that is a digit > > It does "- digit digit dot digit digit digit" > > This perfectly matches the input value, but still the result stays empty: > (gdb) p (char*)(str->data) > $738 = 0x555555693110 "" > > In comparison - when parsing the latter 12.456e9 which works I see two > odd things: > 1. the value isn't cleared despite > str_truncate(parser->value, 0); > at the beginning of json_parse_number > 2. the value (this time) is read exactly as I assumed > > Breakpoint 17, json_parse_number (value_r=0x7fffffffe1a0, > parser=0x555555692ee0) at json-parser.c:394 > 394 str_truncate(parser->value, 0); > Breakpoint 18, str_append_c (chr=<optimized out>, str=0x5555556930d0) > at str.h:44 > 44 buffer_append_c(str, chr); > (gdb) p (char*)(str->data) > $741 = 0x555555693110 "sub3" > > ^^ not cleared, still has "sub3" even later on > > Breakpoint 18, str_append_c (chr=<optimized out>, str=0x5555556930d0) > at str.h:44 > 44 buffer_append_c(str, chr); > (gdb) p (char*)(str->data) > $742 = 0x555555693110 "1ub3" > > ^^ reading left to right one char at a time "over the old value", > which didn't update the data. > > So it seems in json_parse_number the ordering of > a) str_truncate(parser->value, 0); > b) the parsing itself > is messed up by optimization. > > We already know from the above that reducing just this function to -O1 or less > avoids the issue. But what is it really - I don't know? > > Maybe you have a better idea what is going on, the issue should be > reproducible when building > dovecot with the toolchain present on the latest Ubuntu being 21.04 > (Hirsute) right now. > > -- > Christian Ehrhardt > Staff Engineer, Ubuntu Server > Canonical Ltd