Mark Millard
2017-Mar-19 00:53 UTC
arm64 fork/swap data corruptions: A ~110 line C program demonstrating an example (Pine64+ 2GB context) [Corrected subject: arm64!]
A new, significant discovery follows. . .

While checking out use of procstat -v I ran
into the following common property for the 3
programs that I looked at:

A) My small test program that fails for
   a dynamically allocated space.

B) sh reporting Failed assertion: "tsd_booted".

C) su reporting Failed assertion: "tsd_booted".

Here are example addresses from the area of
incorrectly zeroed memory (A then B then C):

(lldb) print dyn_region
(region *volatile) $0 = 0x0000000040616000

(lldb) print &__je_tsd_booted
(bool *) $0 = 0x0000000040618520

(lldb) print &__je_tsd_booted
(bool *) $0 = 0x0000000040618520

The first is from dynamic allocation ending up
in the area. The other two are from libc.so.7
globals/statics ending up in the general area.

It looks like something is trashing a specific
memory area for some reason, rather independently
of what the program specifics are.


Other notes:

At least for my small program showing failure:

Being explicit about the combined conditions for failure
for my test program. . .

Both tcache enabled and allocations fitting in SMALL_MAXCLASS
are required in order to make the program fail.

Note:

(lldb) print __je_tcache_maxclass
(size_t) $0 = 32768

which is larger than SMALL_MAXCLASS. I've not observed
failures for sizes above SMALL_MAXCLASS but not exceeding
__je_tcache_maxclass.

Thus tcache use by itself does not seem sufficient for
my program to get corruption of its dynamically allocated
memory: the small allocation size also matters.


Be warned that I can not eliminate the possibility that
the trashing changed what region of memory it trashed
for larger allocations or when tcache is disabled.

==
Mark Millard
markmi at dsl-only.net
Mark Millard
2017-Mar-19 04:10 UTC
arm64 fork/swap data corruptions: A ~110 line C program demonstrating an example (Pine64+ 2GB context) [Corrected subject: arm64!]
On 2017-Mar-18, at 5:53 PM, Mark Millard <markmi at dsl-only.net> wrote:

> A new, significant discovery follows. . .
>
> While checking out use of procstat -v I ran
> into the following common property for the 3
> programs that I looked at:
>
> A) My small test program that fails for
>    a dynamically allocated space.
>
> B) sh reporting Failed assertion: "tsd_booted".
>
> C) su reporting Failed assertion: "tsd_booted".
>
> Here are example addresses from the area of
> incorrectly zeroed memory (A then B then C):
>
> (lldb) print dyn_region
> (region *volatile) $0 = 0x0000000040616000
>
> (lldb) print &__je_tsd_booted
> (bool *) $0 = 0x0000000040618520
>
> (lldb) print &__je_tsd_booted
> (bool *) $0 = 0x0000000040618520

That last above was a copy/paste error. Correction:

(lldb) print &__je_tsd_booted
(bool *) $0 = 0x000000004061d520

> The first is from dynamic allocation ending up
> in the area. The other two are from libc.so.7
> globals/statics ending up in the general area.
>
> It looks like something is trashing a specific
> memory area for some reason, rather independently
> of what the program specifics are.
>
>
> Other notes:
>
> At least for my small program showing failure:
>
> Being explicit about the combined conditions for failure
> for my test program. . .
>
> Both tcache enabled and allocations fitting in SMALL_MAXCLASS
> are required in order to make the program fail.
>
> Note:
>
> (lldb) print __je_tcache_maxclass
> (size_t) $0 = 32768
>
> which is larger than SMALL_MAXCLASS. I've not observed
> failures for sizes above SMALL_MAXCLASS but not exceeding
> __je_tcache_maxclass.
>
> Thus tcache use by itself does not seem sufficient for
> my program to get corruption of its dynamically allocated
> memory: the small allocation size also matters.
>
>
> Be warned that I can not eliminate the possibility that
> the trashing changed what region of memory it trashed
> for larger allocations or when tcache is disabled.

The Pine64+ 2GB eventually got into a state where:

/etc/malloc.conf -> tcache:false

made no difference and the failure kept occurring
with that symbolic link in place.

But after a reboot of the Pine64+ 2GB

/etc/malloc.conf -> tcache:false

was again effective for my test program. (It was
still present from before the reboot.)

I checked the .core files and the allocated address
assigned to dyn_region was the same in the tries
before and after the reboot. (I had put in an
additional raise(SIGABRT) so I'd always have
a core file to look at.)

Apparently /etc/malloc.conf -> tcache:false was
being ignored before the reboot for some reason?

==
Mark Millard
markmi at dsl-only.net