[Just adding readelf -S info since it seems to show more.]

On 2021-May-4, at 10:01, Mark Millard <marklmi at yahoo.com> wrote:

> On 2021-May-4, at 08:51, Mark Millard <marklmi at yahoo.com> wrote:
>
>> On 2021-May-4, at 06:01, Ed Maste <emaste at freebsd.org> wrote:
>>
>>> On Mon, 3 May 2021 at 22:26, Mark Millard <marklmi at yahoo.com> wrote:
>>>>
>>>> But I'll note that I've built and installed py37-diffoscope
>>>> (new to me). A basic quick test showed that it reports:
>>>>
>>>> W: diffoscope.main: Fuzzy-matching is currently disabled as the "tlsh" module is unavailable.
>>>
>>> I just looked up tlsh - it's "A Locality Sensitive Hash"; I presume
>>> diffoscope uses it to infer file renames. I believe the warning
>>> emitted here should have no impact on the output we're looking for.
>>
>> Okay.
>>
>>> As far as the utf-8 issues go, diffoscope requires a utf-8 locale and
>>> I suspect that is the issue. If you don't have LANG set already, try
>>> setting LANG=C.UTF-8 in your environment.
>>
>> That is not the issue for the UnicodeDecodeError:
>>
>> # echo $LANG
>> C.UTF-8
>>
>> # diffoscope /.zfs/snapshot/2021-04-*-01:40:48-0/bin/sh
>> $<3/>2021-05-04 08:49:21 W: diffoscope.main: Fuzzy-matching is currently disabled as the "tlsh" module is unavailable.
>> $<3/>Traceback (most recent call last):
>>   File "/usr/local/lib/python3.7/site-packages/diffoscope/main.py", line 745, in main
>>     sys.exit(run_diffoscope(parsed_args))
>>   File "/usr/local/lib/python3.7/site-packages/diffoscope/main.py", line 677, in run_diffoscope
>>     difference = load_diff_from_path(path1)
>>   File "/usr/local/lib/python3.7/site-packages/diffoscope/readers/__init__.py", line 31, in load_diff_from_path
>>     return load_diff(codecs.getreader("utf-8")(fp), path)
>>   File "/usr/local/lib/python3.7/site-packages/diffoscope/readers/__init__.py", line 35, in load_diff
>>     return JSONReaderV1().load(fp, path)
>>   File "/usr/local/lib/python3.7/site-packages/diffoscope/readers/json.py", line 33, in load
>>     raw = json.load(fp)
>>   File "/usr/local/lib/python3.7/json/__init__.py", line 293, in load
>>     return loads(fp.read(),
>>   File "/usr/local/lib/python3.7/codecs.py", line 504, in read
>>     newchars, decodedbytes = self.decode(data, self.errors)
>> UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb7 in position 18: invalid start byte
>>
>
> Well, the list of differing files is huge. But this seems to
> be .gnu_debuglink content for the area it is in.

Specifically: the last 4 bytes of the .gnu_debuglink section.

> I'll note
> that I did installworld but not the likes of distrib-dirs
> or distribution this time.
>
> This test did buildworld to two distinct directories:
>
> zroot/BUILDs/13_0R-CA72-nodbg-clang      5.13G  118G  5.13G  /usr/obj/BUILDs/13_0R-CA72-nodbg-clang
> zroot/BUILDs/13_0R-CA72-nodbg-clang-alt  4.28G  118G  4.28G  /usr/obj/BUILDs/13_0R-CA72-nodbg-clang-alt
>
> and installworld to 2 distinct directories:
>
> zroot/DESTDIRs/13_0R-CA72-instwrld-alt   1.44G  118G  1.44G  /usr/obj/DESTDIRs/13_0R-CA72-instwrld-alt
> zroot/DESTDIRs/13_0R-CA72-instwrld-norm  1.44G  118G  1.44G  /usr/obj/DESTDIRs/13_0R-CA72-instwrld-norm
>
> Previously (armv7 target) I had built, installed, rebuilt
> to same directory (after clean-out) and installed to an
> alternate directory.
> That had gotten only a few files
> different but I do not know (yet) if it was the procedural
> difference that made the difference.
>
> Prefix of the list of different files this time:
>
> # diff -rq /usr/obj/DESTDIRs/13_0R-CA72-instwrld-norm/ /usr/obj/DESTDIRs/13_0R-CA72-instwrld-alt/ | more
> Files /usr/obj/DESTDIRs/13_0R-CA72-instwrld-norm/bin/[ and /usr/obj/DESTDIRs/13_0R-CA72-instwrld-alt/bin/[ differ
> Files /usr/obj/DESTDIRs/13_0R-CA72-instwrld-norm/bin/cat and /usr/obj/DESTDIRs/13_0R-CA72-instwrld-alt/bin/cat differ
> Files /usr/obj/DESTDIRs/13_0R-CA72-instwrld-norm/bin/chflags and /usr/obj/DESTDIRs/13_0R-CA72-instwrld-alt/bin/chflags differ
> Files /usr/obj/DESTDIRs/13_0R-CA72-instwrld-norm/bin/chio and /usr/obj/DESTDIRs/13_0R-CA72-instwrld-alt/bin/chio differ
> . . .
>
> Looking, aarch64 seems to typically get a back-to-back
> sequence of 4 bytes different in native programs in my
> builds:
>
> # cmp -x /usr/obj/DESTDIRs/13_0R-CA72-instwrld-norm/bin/cat /usr/obj/DESTDIRs/13_0R-CA72-instwrld-alt/bin/cat
> 00003bd4 1d 65
> 00003bd5 eb a3
> 00003bd6 bb ca
> 00003bd7 8e 1a
>
> # ls -Tld /usr/obj/DESTDIRs/13_0R-CA72-instwrld-norm/bin/cat /usr/obj/DESTDIRs/13_0R-CA72-instwrld-alt/bin/cat
> -r-xr-xr-x 1 root wheel 18448 May 4 08:55:01 2021 /usr/obj/DESTDIRs/13_0R-CA72-instwrld-alt/bin/cat
> -r-xr-xr-x 1 root wheel 18448 May 3 23:16:36 2021 /usr/obj/DESTDIRs/13_0R-CA72-instwrld-norm/bin/cat
>
> Sections:
> Idx Name           Size     VMA              LMA              File off Algn
> . . .
>  25 .gnu_debuglink 00000010 0000000000000000 0000000000000000 00003bc8 2**0
>                    CONTENTS, READONLY

Section Headers:
  [Nr] Name           Type     Address          Offset   Size             EntSize          Flags Link Info Align
  . . .
  [25] .comment       PROGBITS 0000000000000000 00002c70 00000000000000b2 0000000000000001 MS    0    0    1
  [26] .gnu_debuglink PROGBITS 0000000000000000 00003bc8 0000000000000010 0000000000000000       0    0    1
  [27] .shstrtab      STRTAB   0000000000000000 00003bd8 0000000000000100 0000000000000000       0    0    1
  [28] .symtab        SYMTAB   0000000000000000 00002d28 0000000000000ea0 0000000000000018       29   96   8
  [29] .strtab        STRTAB   0000000000000000 00003cd8 00000000000003b3 0000000000000000       0    0    1

> 00003bd4-00003bc8 == 0xC

Note: 0xC+0x4 == 0x10 (the size), so the last 4 bytes of .gnu_debuglink

> # cmp -x /usr/obj/DESTDIRs/13_0R-CA72-instwrld-norm/bin/chflags /usr/obj/DESTDIRs/13_0R-CA72-instwrld-alt/bin/chflags
> 00002208 88 a1
> 00002209 e6 40
> 0000220a 60 94
> 0000220b bf ce
>
> # ls -Tld /usr/obj/DESTDIRs/13_0R-CA72-instwrld-norm/bin/chflags /usr/obj/DESTDIRs/13_0R-CA72-instwrld-alt/bin/chflags
> -r-xr-xr-x 1 root wheel 11440 May 4 08:55:01 2021 /usr/obj/DESTDIRs/13_0R-CA72-instwrld-alt/bin/chflags
> -r-xr-xr-x 1 root wheel 11440 May 3 23:16:36 2021 /usr/obj/DESTDIRs/13_0R-CA72-instwrld-norm/bin/chflags
>
> Sections:
> Idx Name           Size     VMA              LMA              File off Algn
> . . .
>  25 .gnu_debuglink 00000014 0000000000000000 0000000000000000 000021f8 2**0
>                    CONTENTS, READONLY

Section Headers:
  [Nr] Name           Type     Address          Offset   Size             EntSize          Flags Link Info Align
  . . .
  [25] .comment       PROGBITS 0000000000000000 000016d8 00000000000000b2 0000000000000001 MS    0    0    1
  [26] .gnu_debuglink PROGBITS 0000000000000000 000021f8 0000000000000014 0000000000000000       0    0    1
  [27] .shstrtab      STRTAB   0000000000000000 0000220c 0000000000000100 0000000000000000       0    0    1
  [28] .symtab        SYMTAB   0000000000000000 00001790 0000000000000a68 0000000000000018       29   83   8
  [29] .strtab        STRTAB   0000000000000000 0000230c 000000000000021f 0000000000000000       0    0    1

> 00002208-000021f8 == 0x10

Note: 0x10+0x4 == 0x14 (the size), so the last 4 bytes of .gnu_debuglink

> # cmp -x /usr/obj/DESTDIRs/13_0R-CA72-instwrld-norm/bin/chio /usr/obj/DESTDIRs/13_0R-CA72-instwrld-alt/bin/chio
> 000050c4 6b 3e
> 000050c5 08 ca
> 000050c6 7a 2f
> 000050c7 5d 64
>
> # ls -Tld /usr/obj/DESTDIRs/13_0R-CA72-instwrld-norm/bin/chio /usr/obj/DESTDIRs/13_0R-CA72-instwrld-alt/bin/chio
> -r-xr-xr-x 1 root wheel 23728 May 4 08:55:01 2021 /usr/obj/DESTDIRs/13_0R-CA72-instwrld-alt/bin/chio
> -r-xr-xr-x 1 root wheel 23728 May 3 23:16:37 2021 /usr/obj/DESTDIRs/13_0R-CA72-instwrld-norm/bin/chio
>
> Sections:
> Idx Name           Size     VMA              LMA              File off Algn
> . . .
>  25 .gnu_debuglink 00000010 0000000000000000 0000000000000000 000050b8 2**0
>                    CONTENTS, READONLY

Section Headers:
  [Nr] Name           Type     Address          Offset   Size             EntSize          Flags Link Info Align
  . . .
  [25] .comment       PROGBITS 0000000000000000 00004298 00000000000000b2 0000000000000001 MS    0    0    1
  [26] .gnu_debuglink PROGBITS 0000000000000000 000050b8 0000000000000010 0000000000000000       0    0    1
  [27] .shstrtab      STRTAB   0000000000000000 000050c8 0000000000000100 0000000000000000       0    0    1
  [28] .symtab        SYMTAB   0000000000000000 00004350 0000000000000d68 0000000000000018       29   100  8
  [29] .strtab        STRTAB   0000000000000000 000051c8 0000000000000363 0000000000000000       0    0    1

> 000050c4-000050b8 == 0xC

Note: 0xC+0x4 == 0x10 (the size), so the last 4 bytes of .gnu_debuglink

> For all I know, some individual byte(s) in the 4 might accidentally
> match sometimes.
> The additional offset after .gnu_debuglink's file
> offset does vary (0xC and 0x10 above).

Specifically: the last 4 bytes of the .gnu_debuglink section.

> The content of those differences does not look like
> file path components, for example the 0x08 byte.
>
> I do build with:
>
> # Avoid stripping but do not control host -g status as well:
> DEBUG_FLAGS+=
> #
> WITH_REPRODUCIBLE_BUILD=
> WITH_DEBUG_FILES=
>
> But that was true for the earlier armv7 target example
> that I reported that only got a few files with
> differences.

==
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in early 2018-Mar)
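
[Editorial sketch, not part of the original message.] The offset arithmetic above matches the .gnu_debuglink layout described in the GDB documentation: a NUL-terminated debug file name, zero padding to the next 4-byte boundary, then a 4-byte CRC32 of the separate debug file. The Python below is a minimal sketch of that layout. The file names cat.debug, chflags.debug and chio.debug are assumptions (they do not appear in the output above), the payload is reconstructed rather than read from a real ELF file, and little-endian byte order is assumed for aarch64.

```python
import struct


def crc_offset(name: bytes) -> int:
    """Offset of the CRC inside .gnu_debuglink: name + NUL, rounded up to 4 bytes."""
    return (len(name) + 1 + 3) & ~3


def parse_debuglink(section: bytes, little_endian: bool = True):
    """Split a raw .gnu_debuglink payload into (file name, CRC offset, CRC value)."""
    name, _, _ = section.partition(b"\0")
    off = crc_offset(name)
    (crc,) = struct.unpack_from("<I" if little_endian else ">I", section, off)
    return name.decode("ascii"), off, crc


# Hypothetical payload for the "norm" bin/cat: assumed name "cat.debug",
# padding, then the four "norm" bytes that cmp -x reported at 0x3bd4..0x3bd7.
cat_norm = b"cat.debug\0" + b"\0\0" + bytes([0x1D, 0xEB, 0xBB, 0x8E])

name, off, crc = parse_debuglink(cat_norm)
print(name, hex(off), hex(len(cat_norm)), hex(crc))

# The varying 0xC / 0x10 offsets follow from the (assumed) name lengths.
for n in (b"cat.debug", b"chflags.debug", b"chio.debug"):
    print(n.decode(), hex(crc_offset(n)))
```

If that reading of the layout applies here, the 4 differing bytes would be the checksum of the separate debug file, which would suggest that the .debug files differ between the two builds rather than the rest of the stripped binaries.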
Mark Millard
2021-May-04 17:31 UTC
FYI: WITH_REPRODUCIBLE_BUILD= problem for some files? [Ignore recent test: -dirty vs. checked-in usage difference]
I probably know why the huge count of differences this time
unlike the original report . . .

Previously I built based on a checked-in branch as part of
my experimenting. This time it was in a -dirty form (not
checked in), again as part of my experimental exploration.

WITH_REPRODUCIBLE_BUILD= makes a distinction between these
if I remember right: (partially?) disabling itself for
-dirty style.

To reproduce the original style of test I need to create
a branch with my few patches checked in and do the
buildworlds from that branch. This will, of course,
take a while.

Sorry for the noise.

==
Mark Millard
marklmi at yahoo.com
( dsl-only.net went away in early 2018-Mar)