Mark Millard
2017-Mar-15 04:33 UTC
arm64 fork/swap data corruptions: A ~110 line C program demonstrating an example (Pine64+ 2GB context) [Corrected subject: arm64!]
A single Byte access to a 4K Byte aligned region between the fork and
wait/sleep/swap-out prevents that specific 4K Byte region from having
the (bad) zeros. Sounds like a page sized unit of behavior to me.

Details follow.

On 2017-Mar-14, at 3:28 PM, Mark Millard <markmi at dsl-only.net> wrote:

> [test_check() between the fork and the wait/sleep prevents the
> failure from occurring. Even a small access to the memory at
> that stage prevents the failure. Details follow.]
>
> On 2017-Mar-14, at 11:07 AM, Mark Millard <markmi at dsl-only.net> wrote:
>
>> [This is just a correction to the subject-line text to say arm64
>> instead of amd64.]
>>
>> On 2017-Mar-14, at 12:58 AM, Mark Millard <markmi at dsl-only.net> wrote:
>>
>> [Another correction I'm afraid --about alternative program variations
>> this time.]
>>
>> On 2017-Mar-13, at 11:52 PM, Mark Millard <markmi at dsl-only.net> wrote:
>>
>>> I'm still at a loss about how to figure out what stages are messed
>>> up. (Memory coherency? Some memory not swapped out? Bad data swapped
>>> out? Wrong data swapped in?)
>>>
>>> But at least I've found a much smaller/simpler example that
>>> demonstrates a problem in my Pine64+ 2GB context.
>>>
>>> The Pine64+ 2GB is the only amd64 context that I have access to.
>>
>> Someday I'll learn to type arm64 the first time instead of amd64.
>>
>>> The following program fails its check for data
>>> having its expected byte pattern in dynamically
>>> allocated memory after a fork/swap-out/swap-in
>>> sequence.
>>>
>>> I'll note that the program sleeps for 60s after
>>> forking to give time to do something else to
>>> cause the parent and child processes to swap
>>> out (RES=0 as seen in top).
>>
>> The following about the extra test_check() was
>> wrong.
>>
>>> Note the source code line:
>>>
>>> // test_check(); // Adding this line prevents failure.
>>>
>>> It seems that accessing the region contents before forking
>>> and swapping avoids the problem. But there is a problem
>>> if the region was only written-to before the fork/swap.
>
> There is a place that if a test_check call is put then the
> problem does not happen at any stage: I tried putting a
> call between the fork and the later wait/sleep code:

I changed the byte sequence patterns to avoid zero values since the bad
values are zeros:

static value_type value(size_t v)
    { return (value_type)((v&0xFEu)|0x1u); }
    // value now avoids the zero value since the failures
    // are zeros.

With that I can then test accurately which bytes have bad values and
which do not. I also changed to:

void partial_test_check(void) {
    if (value(0u)!=gbl_region.array[0])    raise(SIGABRT);
    if (value(0u)!=(*dyn_region).array[0]) raise(SIGABRT);
}

since previously [0] had a zero value and so I'd used [1].

On this basis I'm now using the below. See the comments tied to the
partial_test_check() calls. (A sketch of the supporting definitions that
these fragments assume appears at the end of this message.)

extern void test_setup(void);         // Sets up the memory byte patterns.
extern void test_check(void);         // Tests the memory byte patterns.
extern void partial_test_check(void); // Tests just [0] of each region
                                      // (gbl_region and dyn_region).

int main(void) {
    test_setup();
    test_check(); // Before fork() [passes]

    pid_t pid = fork();
    int wait_status = 0;

    // After fork; before wait/sleep/swap-out.

    if (0==pid) partial_test_check(); // Even the above is sufficient by
                                      // itself to prevent failure for
                                      // region_size 1u through
                                      // 4u*1024u!
                                      // But 4u*1024u+1u and above fail
                                      // with this access to memory.
                                      // The failing test is of
                                      // (*dyn_region).array[4096u].
                                      // This test never fails here.

    if (0<pid) partial_test_check();  // This never prevents
                                      // later failures (and
                                      // never fails here).

    if (0<pid) { wait(&wait_status); }

    if (-1!=wait_status && 0<=pid) {
        if (0==pid) {
            sleep(60);

            // During this manually force this process to
            // swap out. I use something like:
            // stress -m 1 --vm-bytes 1800M
            // in another shell and ^C'ing it after top
            // shows the swapped status desired. 1800M
            // just happened to work on the Pine64+ 2GB
            // that I was using. I watch with top -PCwaopid .
        }

        test_check(); // After wait/sleep [fails for small-enough region_sizes]
    }
}

> This suggests to me that the small access is forcing one or more things to
> be initialized for memory access that fork is not establishing of itself.
> It appears that if established correctly then the swap-out/swap-in
> sequence would work okay without needing the manual access to the memory.
>
>
> So far via this test I've not seen any evidence of problems with the global
> region but only the dynamically allocated region.
>
> However, the symptoms that started this investigation in a much more
> complicated context had an area of global memory from a .so that ended
> up being zero.
>
> I think that things should be fixed for this simpler context first and
> that further investigation of the sh/su related failures should wait to
> see what things are like after this test case works.

===
Mark Millard
markmi at dsl-only.net
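For reference, the supporting definitions that the fragments above assume
look roughly like the following minimal sketch. The actual ~110 line
program may differ in details (how region_size is chosen, how dyn_region
is allocated), so treat this as illustrative rather than exact:

/* Sketch of the declarations referenced by value(), partial_test_check()
 * and main() above; combined with that main() it should compile as-is. */

#include <signal.h>   // raise, SIGABRT
#include <stdlib.h>   // malloc
#include <stddef.h>   // size_t
#include <unistd.h>   // fork, sleep (used by main() above)
#include <sys/wait.h> // wait       (used by main() above)

typedef unsigned char value_type;

// 4u*1024u+1u and larger is the range that still fails when the child
// does the partial_test_check() shown above; 1u..4u*1024u does not.
#define region_size (4u*1024u+1u)

typedef struct { value_type array[region_size]; } region;

static region gbl_region;            // global/static region
static region *volatile dyn_region;  // dynamically allocated region

static value_type value(size_t v)    // same definition as above
    { return (value_type)((v&0xFEu)|0x1u); }

void test_setup(void) {              // write the byte patterns
    dyn_region = malloc(sizeof(region));
    if (!dyn_region) raise(SIGABRT);
    for (size_t i = 0u; i < region_size; i++) {
        gbl_region.array[i]    = value(i);
        (*dyn_region).array[i] = value(i);
    }
}

void test_check(void) {              // verify every byte of both regions
    for (size_t i = 0u; i < region_size; i++) {
        if (value(i) != gbl_region.array[i])    raise(SIGABRT);
        if (value(i) != (*dyn_region).array[i]) raise(SIGABRT);
    }
}

void partial_test_check(void) {      // same definition as above
    if (value(0u) != gbl_region.array[0])    raise(SIGABRT);
    if (value(0u) != (*dyn_region).array[0]) raise(SIGABRT);
}

region_size is the only thing that needs to vary between the failing and
non-failing cases described in the comments above.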
Mark Millard
2017-Mar-19 00:53 UTC
arm64 fork/swap data corruptions: A ~110 line C program demonstrating an example (Pine64+ 2GB context) [Corrected subject: arm64!]
A new, significant discovery follows. . .

While checking out use of procstat -v I ran into the following common
property for the 3 programs that I looked at:

A) My small test program that fails for a dynamically allocated space.

B) sh reporting Failed assertion: "tsd_booted".

C) su reporting Failed assertion: "tsd_booted".

Here are example addresses from the area of incorrectly zeroed memory
(A then B then C):

(lldb) print dyn_region
(region *volatile) $0 = 0x0000000040616000

(lldb) print &__je_tsd_booted
(bool *) $0 = 0x0000000040618520

(lldb) print &__je_tsd_booted
(bool *) $0 = 0x0000000040618520

The first is from dynamic allocation ending up in the area. The other
two are from libc.so.7 globals/statics ending up in the general area.

It looks like something is trashing a specific memory area for some
reason, rather independently of what the program specifics are.

Other notes:

At least for my small program showing failure, being explicit about the
combined conditions for failure for my test program. . .

Both tcache enabled and allocations fitting in SMALL_MAXCLASS are
required in order to make the program fail. Note:

(lldb) print __je_tcache_maxclass
(size_t) $0 = 32768

which is larger than SMALL_MAXCLASS. I've not observed failures for
sizes above SMALL_MAXCLASS but not exceeding __je_tcache_maxclass. Thus
tcache use by itself does not seem sufficient for my program to get
corruption of its dynamically allocated memory: the small allocation
size also matters. (A sketch of how these two conditions can be checked
at run time appears at the end of this message.)

Be warned that I cannot eliminate the possibility that the trashing
changed what region of memory it trashed for larger allocations or when
tcache is disabled.

===
Mark Millard
markmi at dsl-only.net
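For checking the two conditions (tcache enabled, allocation small enough)
at run time rather than via lldb, a minimal sketch is below. It assumes
FreeBSD's jemalloc mallctl() interface from <malloc_np.h> and the
"opt.tcache", "arenas.nbins", and "arenas.bin.<i>.size" control names;
it is a separate illustration, not part of the failing test program:

#include <malloc_np.h> // FreeBSD: mallctl() for the jemalloc in libc
#include <stdbool.h>
#include <stdio.h>

int main(void) {
    // Is the thread cache (tcache) enabled for this run?
    bool tcache_enabled = false;
    size_t len = sizeof(tcache_enabled);
    if (0 == mallctl("opt.tcache", &tcache_enabled, &len, NULL, 0))
        printf("opt.tcache: %s\n", tcache_enabled ? "true" : "false");

    // Size of the largest "small" bin, for comparison with the
    // region_size (allocation size) used by the test program.
    unsigned nbins = 0;
    len = sizeof(nbins);
    if (0 == mallctl("arenas.nbins", &nbins, &len, NULL, 0) && 0 < nbins) {
        char name[64];
        size_t bin_size = 0;
        snprintf(name, sizeof(name), "arenas.bin.%u.size", nbins - 1u);
        len = sizeof(bin_size);
        if (0 == mallctl(name, &bin_size, &len, NULL, 0))
            printf("largest small size class: %zu bytes\n", bin_size);
    }
    return 0;
}

The last small bin's size reported this way should correspond to the
SMALL_MAXCLASS figure, and __je_tcache_maxclass (32768 above) is
expected to be larger.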