[Just adding readelf -S info since it seems to show more.]
On 2021-May-4, at 10:01, Mark Millard <marklmi at yahoo.com> wrote:
> On 2021-May-4, at 08:51, Mark Millard <marklmi at yahoo.com> wrote:
>
>> On 2021-May-4, at 06:01, Ed Maste <emaste at freebsd.org> wrote:
>>
>>> On Mon, 3 May 2021 at 22:26, Mark Millard <marklmi at yahoo.com> wrote:
>>>>
>>>> But I'll note that I've built and installed py37-diffoscope
>>>> (new to me). A basic quick test showed that it reports:
>>>>
>>>> W: diffoscope.main: Fuzzy-matching is currently disabled as the "tlsh" module is unavailable.
>>>
>>> I just looked up tlsh - it's "A Locality Sensitive Hash"; I presume
>>> diffoscope uses it to infer file renames. I believe the warning
>>> emitted here should have no impact on the output we're looking for.
>>
>> Okay.
>>
>>> As far as the utf-8 issues go, diffoscope requires a utf-8 locale and
>>> I suspect that is the issue. If you don't have LANG set already, try
>>> setting LANG=C.UTF-8 in your environment.
>>
>> That is not the issue for the UnicodeDecodeError:
>>
>> # echo $LANG
>> C.UTF-8
>>
>> # diffoscope /.zfs/snapshot/2021-04-*-01:40:48-0/bin/sh
>> $<3/>2021-05-04 08:49:21 W: diffoscope.main: Fuzzy-matching is currently disabled as the "tlsh" module is unavailable.
>> $<3/>Traceback (most recent call last):
>>   File "/usr/local/lib/python3.7/site-packages/diffoscope/main.py", line 745, in main
>>     sys.exit(run_diffoscope(parsed_args))
>>   File "/usr/local/lib/python3.7/site-packages/diffoscope/main.py", line 677, in run_diffoscope
>>     difference = load_diff_from_path(path1)
>>   File "/usr/local/lib/python3.7/site-packages/diffoscope/readers/__init__.py", line 31, in load_diff_from_path
>>     return load_diff(codecs.getreader("utf-8")(fp), path)
>>   File "/usr/local/lib/python3.7/site-packages/diffoscope/readers/__init__.py", line 35, in load_diff
>>     return JSONReaderV1().load(fp, path)
>>   File "/usr/local/lib/python3.7/site-packages/diffoscope/readers/json.py", line 33, in load
>>     raw = json.load(fp)
>>   File "/usr/local/lib/python3.7/json/__init__.py", line 293, in load
>>     return loads(fp.read(),
>>   File "/usr/local/lib/python3.7/codecs.py", line 504, in read
>>     newchars, decodedbytes = self.decode(data, self.errors)
>> UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb7 in position 18: invalid start byte
>>
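A side note on the traceback above, as a sketch rather than a confirmed
diagnosis: the call chain goes through load_diff_from_path, which suggests
diffoscope was handed a single path and tried to read it as a saved JSON
diff. In a little-endian ELF64 header, e_machine sits at byte offset 18,
and EM_AARCH64 is 183 (0xb7), which would match the "byte 0xb7 in
position 18" message. The header bytes below are hypothetical stand-ins
(the e_type value and the zeroed padding are assumptions), not bytes read
from /bin/sh:

```python
import struct

# Hypothetical first 20 bytes of a little-endian ELF64 aarch64 file.
# e_ident: magic, ELFCLASS64 (2), ELFDATA2LSB (1), EV_CURRENT (1),
# ELFOSABI_FREEBSD (9), then 8 bytes of zero padding.
E_IDENT = b"\x7fELF" + bytes([2, 1, 1, 9]) + bytes(8)
ET_EXEC = 2        # assumed e_type; any small value behaves the same here
EM_AARCH64 = 183   # 0xb7

# e_type occupies offsets 16-17, e_machine offsets 18-19.
header = E_IDENT + struct.pack("<HH", ET_EXEC, EM_AARCH64)


def first_utf8_error(data: bytes):
    """Return (offset, byte) of the first UTF-8 decode failure, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.start, data[exc.start]
    return None


offset, bad_byte = first_utf8_error(header)
print(f"offset={offset} byte={bad_byte:#x}")
```

Every byte before offset 18 is below 0x80, so the decoder only trips on
the 0xb7 byte. If this reading is right, the locale is not involved.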
>
> Well, the list of differing files is huge. But this seems to
> be .gnu_debuglink content for the area it is in.
Specifically: the last 4 bytes of the .gnu_debuglink section.
> I'll note
> that I did installworld but not the likes of distrib-dirs
> or distribution this time.
>
> This test did buildworld to two distinct directories:
>
> zroot/BUILDs/13_0R-CA72-nodbg-clang       5.13G   118G  5.13G  /usr/obj/BUILDs/13_0R-CA72-nodbg-clang
> zroot/BUILDs/13_0R-CA72-nodbg-clang-alt   4.28G   118G  4.28G  /usr/obj/BUILDs/13_0R-CA72-nodbg-clang-alt
>
> and installworld to 2 distinct directories:
>
> zroot/DESTDIRs/13_0R-CA72-instwrld-alt    1.44G   118G  1.44G  /usr/obj/DESTDIRs/13_0R-CA72-instwrld-alt
> zroot/DESTDIRs/13_0R-CA72-instwrld-norm   1.44G   118G  1.44G  /usr/obj/DESTDIRs/13_0R-CA72-instwrld-norm
>
> Previously (armv7 target) I had built, installed, rebuilt
> to same directory (after clean-out) and installed to an
> alternate directory. That had gotten only a few files
> different but I do not know (yet) if it was the procedural
> difference that made the difference.
>
> Prefix of the list of different files this time:
>
> # diff -rq /usr/obj/DESTDIRs/13_0R-CA72-instwrld-norm/ /usr/obj/DESTDIRs/13_0R-CA72-instwrld-alt/ | more
> Files /usr/obj/DESTDIRs/13_0R-CA72-instwrld-norm/bin/[ and /usr/obj/DESTDIRs/13_0R-CA72-instwrld-alt/bin/[ differ
> Files /usr/obj/DESTDIRs/13_0R-CA72-instwrld-norm/bin/cat and /usr/obj/DESTDIRs/13_0R-CA72-instwrld-alt/bin/cat differ
> Files /usr/obj/DESTDIRs/13_0R-CA72-instwrld-norm/bin/chflags and /usr/obj/DESTDIRs/13_0R-CA72-instwrld-alt/bin/chflags differ
> Files /usr/obj/DESTDIRs/13_0R-CA72-instwrld-norm/bin/chio and /usr/obj/DESTDIRs/13_0R-CA72-instwrld-alt/bin/chio differ
> . . .
>
> Looking, aarch64 seems to typically get a back-to-back
> sequence of 4 bytes different in native programs in my
> builds:
>
> # cmp -x /usr/obj/DESTDIRs/13_0R-CA72-instwrld-norm/bin/cat /usr/obj/DESTDIRs/13_0R-CA72-instwrld-alt/bin/cat
> 00003bd4 1d 65
> 00003bd5 eb a3
> 00003bd6 bb ca
> 00003bd7 8e 1a
>
> # ls -Tld /usr/obj/DESTDIRs/13_0R-CA72-instwrld-norm/bin/cat /usr/obj/DESTDIRs/13_0R-CA72-instwrld-alt/bin/cat
> -r-xr-xr-x 1 root wheel 18448 May 4 08:55:01 2021 /usr/obj/DESTDIRs/13_0R-CA72-instwrld-alt/bin/cat
> -r-xr-xr-x 1 root wheel 18448 May 3 23:16:36 2021 /usr/obj/DESTDIRs/13_0R-CA72-instwrld-norm/bin/cat
>
> Sections:
> Idx Name            Size      VMA               LMA               File off  Algn
> . . .
>  25 .gnu_debuglink  00000010  0000000000000000  0000000000000000  00003bc8  2**0
>                     CONTENTS, READONLY
Section Headers:
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
. . .
[25] .comment PROGBITS 0000000000000000 00002c70
00000000000000b2 0000000000000001 MS 0 0 1
[26] .gnu_debuglink PROGBITS 0000000000000000 00003bc8
0000000000000010 0000000000000000 0 0 1
[27] .shstrtab STRTAB 0000000000000000 00003bd8
0000000000000100 0000000000000000 0 0 1
[28] .symtab SYMTAB 0000000000000000 00002d28
0000000000000ea0 0000000000000018 29 96 8
[29] .strtab STRTAB 0000000000000000 00003cd8
00000000000003b3 0000000000000000 0 0 1
> 00003bd4-00003bc8 == 0xC
Note: 0xC+0x4 == 0x10 (the size), so the differing bytes are
the last 4 bytes of .gnu_debuglink.
> # cmp -x /usr/obj/DESTDIRs/13_0R-CA72-instwrld-norm/bin/chflags /usr/obj/DESTDIRs/13_0R-CA72-instwrld-alt/bin/chflags
> 00002208 88 a1
> 00002209 e6 40
> 0000220a 60 94
> 0000220b bf ce
>
> # ls -Tld /usr/obj/DESTDIRs/13_0R-CA72-instwrld-norm/bin/chflags /usr/obj/DESTDIRs/13_0R-CA72-instwrld-alt/bin/chflags
> -r-xr-xr-x 1 root wheel 11440 May 4 08:55:01 2021 /usr/obj/DESTDIRs/13_0R-CA72-instwrld-alt/bin/chflags
> -r-xr-xr-x 1 root wheel 11440 May 3 23:16:36 2021 /usr/obj/DESTDIRs/13_0R-CA72-instwrld-norm/bin/chflags
>
> Sections:
> Idx Name            Size      VMA               LMA               File off  Algn
> . . .
>  25 .gnu_debuglink  00000014  0000000000000000  0000000000000000  000021f8  2**0
>                     CONTENTS, READONLY
Section Headers:
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
. . .
[25] .comment PROGBITS 0000000000000000 000016d8
00000000000000b2 0000000000000001 MS 0 0 1
[26] .gnu_debuglink PROGBITS 0000000000000000 000021f8
0000000000000014 0000000000000000 0 0 1
[27] .shstrtab STRTAB 0000000000000000 0000220c
0000000000000100 0000000000000000 0 0 1
[28] .symtab SYMTAB 0000000000000000 00001790
0000000000000a68 0000000000000018 29 83 8
[29] .strtab STRTAB 0000000000000000 0000230c
000000000000021f 0000000000000000 0 0 1
> 00002208-000021f8 == 0x10
Note: 0x10+0x4 == 0x14 (the size), so the differing bytes are
the last 4 bytes of .gnu_debuglink.
> # cmp -x /usr/obj/DESTDIRs/13_0R-CA72-instwrld-norm/bin/chio /usr/obj/DESTDIRs/13_0R-CA72-instwrld-alt/bin/chio
> 000050c4 6b 3e
> 000050c5 08 ca
> 000050c6 7a 2f
> 000050c7 5d 64
>
> # ls -Tld /usr/obj/DESTDIRs/13_0R-CA72-instwrld-norm/bin/chio /usr/obj/DESTDIRs/13_0R-CA72-instwrld-alt/bin/chio
> -r-xr-xr-x 1 root wheel 23728 May 4 08:55:01 2021 /usr/obj/DESTDIRs/13_0R-CA72-instwrld-alt/bin/chio
> -r-xr-xr-x 1 root wheel 23728 May 3 23:16:37 2021 /usr/obj/DESTDIRs/13_0R-CA72-instwrld-norm/bin/chio
>
> Sections:
> Idx Name            Size      VMA               LMA               File off  Algn
> . . .
>  25 .gnu_debuglink  00000010  0000000000000000  0000000000000000  000050b8  2**0
>                     CONTENTS, READONLY
Section Headers:
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
. . .
[25] .comment PROGBITS 0000000000000000 00004298
00000000000000b2 0000000000000001 MS 0 0 1
[26] .gnu_debuglink PROGBITS 0000000000000000 000050b8
0000000000000010 0000000000000000 0 0 1
[27] .shstrtab STRTAB 0000000000000000 000050c8
0000000000000100 0000000000000000 0 0 1
[28] .symtab SYMTAB 0000000000000000 00004350
0000000000000d68 0000000000000018 29 100 8
[29] .strtab STRTAB 0000000000000000 000051c8
0000000000000363 0000000000000000 0 0 1
> 000050c4-000050b8 == 0xC
Note: 0xC+0x4 == 0x10 (the size), so the differing bytes are
the last 4 bytes of .gnu_debuglink.
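
The same offset arithmetic for all three programs can be checked in one
pass. This is only a sketch: the numbers are copied from the section
listings and cmp -x output quoted above, and the CASES table and helper
name are made up for illustration.

```python
# (section file offset, section size, first differing offset, last differing offset)
# for .gnu_debuglink, taken from the listings and cmp -x output in this message.
CASES = {
    "cat":     (0x3BC8, 0x10, 0x3BD4, 0x3BD7),
    "chflags": (0x21F8, 0x14, 0x2208, 0x220B),
    "chio":    (0x50B8, 0x10, 0x50C4, 0x50C7),
}


def diff_is_section_tail(sec_off, sec_size, first_diff, last_diff, tail=4):
    """True when the differing range is exactly the last `tail` bytes of the section."""
    sec_end = sec_off + sec_size  # one past the last byte of the section
    return first_diff == sec_end - tail and last_diff == sec_end - 1


for name, (sec_off, sec_size, first_diff, last_diff) in CASES.items():
    rel = first_diff - sec_off  # offset of the first differing byte inside the section
    ok = diff_is_section_tail(sec_off, sec_size, first_diff, last_diff)
    print(f"{name}: differences start at section offset {rel:#x}, tail match: {ok}")
```

In each case the section end also equals the .shstrtab file offset shown
by readelf -S, so the 4 bytes sit right before the next section.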
> For all I know, some individual byte(s) in the 4 might accidentally
> match sometimes. The additional offset after .gnu_debuglink's file
> offset does vary (0xC and 0x10 above).
Specifically: the last 4 bytes of the .gnu_debuglink section.
> The content of those differences does not look like
> file path components, for example the 0x08 byte.
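
For reference, a sketch of the documented .gnu_debuglink layout (as
described in GDB's "Separate Debug Files" documentation): a
NUL-terminated debug file name, zero padding up to a 4-byte boundary,
then a 4-byte CRC32 of the debug file in the object's byte order. If
that applies here, the differing bytes would be the CRC, which would
point at the separate debug files differing between the two builds
rather than at the installed programs' own code or data. The name
"cat.debug" below is an assumption (the section contents were not dumped
in this thread); the trailing bytes are the -norm column of the cmp -x
output for bin/cat.

```python
def parse_gnu_debuglink(payload: bytes, byteorder: str = "little"):
    """Split a .gnu_debuglink payload into (name, crc_offset, crc32)."""
    nul = payload.index(b"\0")               # end of the debug file name
    name = payload[:nul].decode("ascii")
    crc_offset = (nul + 1 + 3) & ~3          # name + NUL, rounded up to 4 bytes
    crc = int.from_bytes(payload[crc_offset:crc_offset + 4], byteorder)
    return name, crc_offset, crc


# Assumed name "cat.debug": 9 characters + NUL = 10 bytes, padded to 12 (0xC).
payload = b"cat.debug\0\0\0" + bytes([0x1D, 0xEB, 0xBB, 0x8E])
name, crc_offset, crc = parse_gnu_debuglink(payload)
print(name, hex(crc_offset), hex(crc), hex(len(payload)))
```

With that assumed name the CRC lands at 0xC and the total size is 0x10,
matching cat and chio ("chio.debug" pads to the same 0xC). A name of
"chflags.debug" would give 0x10 and 0x14, matching chflags.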
>
> I do build with:
>
> # Avoid stripping but do not control host -g status as well:
> DEBUG_FLAGS+=
> #
> WITH_REPRODUCIBLE_BUILD=
> WITH_DEBUG_FILES=
>
> But that was true for the earlier armv7 target example
> that I reported that only got a few files with
> differences.
===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)