Mark Millard
2017-Mar-15 18:51 UTC
arm64 fork/swap data corruptions: A ~110 line C program demonstrating an example (Pine64+ 2GB context) [Corrected subject: arm64!]
[Something strange happened to the automatic CC: fill-in for my original
reply. Also I should have mentioned that for my test program if a
variant is made that does not fork the swapping works fine.]

On 2017-Mar-15, at 9:37 AM, Mark Millard <markmi at dsl-only.net> wrote:

> On 2017-Mar-15, at 6:15 AM, Scott Bennett <bennett at sdf.org> wrote:
>
>> On Tue, 14 Mar 2017 18:18:56 -0700 Mark Millard
>> <markmi at dsl-only.net> wrote:
>>> On 2017-Mar-14, at 4:44 PM, Bernd Walter <ticso at cicely7.cicely.de> wrote:
>>>
>>>> On Tue, Mar 14, 2017 at 03:28:53PM -0700, Mark Millard wrote:
>>>>> [test_check() between the fork and the wait/sleep prevents the
>>>>> failure from occurring. Even a small access to the memory at
>>>>> that stage prevents the failure. Details follow.]
>>>>
>>>> Maybe a stupid question, since you might have written it somewhere.
>>>> What medium do you swap to?
>>>> I've seen broken firmware on microSD cards doing silent data
>>>> corruption for some access patterns.
>>>
>>> The root filesystem is on a USB SSD on a powered hub.
>>>
>>> Only the kernel is from the microSD card.
>>>
>>> I have several examples of the USB SSD model and have
>>> never observed such problems in any other context.
>>>
>>> [remainder of irrelevant material deleted --SB]
>>
>> You gave a very long-winded non-answer to Bernd's question, so I'll
>> repeat it here. What medium do you swap to?
>
> My wording of:
>
> The root filesystem is on a USB SSD on a powered hub.
>
> was definitely poor. It should have explicitly mentioned the
> swap partition too:
>
> The root filesystem and swap partition are both on the same
> USB SSD on a powered hub.
>
> More detail from dmesg -a for usb:
>
> usbus0: 12Mbps Full Speed USB v1.0
> usbus1: 480Mbps High Speed USB v2.0
> usbus2: 12Mbps Full Speed USB v1.0
> usbus3: 480Mbps High Speed USB v2.0
> ugen0.1: <Generic OHCI root HUB> at usbus0
> uhub0: <Generic OHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus0
> ugen1.1: <Allwinner EHCI root HUB> at usbus1
> uhub1: <Allwinner EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus1
> ugen2.1: <Generic OHCI root HUB> at usbus2
> uhub2: <Generic OHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus2
> ugen3.1: <Allwinner EHCI root HUB> at usbus3
> uhub3: <Allwinner EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus3
> . . .
> uhub0: 1 port with 1 removable, self powered
> uhub2: 1 port with 1 removable, self powered
> uhub1: 1 port with 1 removable, self powered
> uhub3: 1 port with 1 removable, self powered
> ugen3.2: <GenesysLogic USB2.0 Hub> at usbus3
> uhub4 on uhub3
> uhub4: <GenesysLogic USB2.0 Hub, class 9/0, rev 2.00/90.20, addr 2> on usbus3
> uhub4: MTT enabled
> uhub4: 4 ports with 4 removable, self powered
> ugen3.3: <OWC Envoy Pro mini> at usbus3
> umass0 on uhub4
> umass0: <OWC Envoy Pro mini, class 0/0, rev 2.10/1.00, addr 3> on usbus3
> umass0: SCSI over Bulk-Only; quirks = 0x0100
> umass0:0:0: Attached to scbus0
> . . .
> da0 at umass-sim0 bus 0 scbus0 target 0 lun 0
> da0: <OWC Envoy Pro mini 0> Fixed Direct Access SPC-4 SCSI device
> da0: Serial Number <REPLACED>
> da0: 40.000MB/s transfers
>
> (Edited a bit because there is other material interlaced, even
> internal to some lines. Also: I removed the serial number of the
> specific example device.)
>
>> I will further note that any kind of USB device cannot automatically
>> be trusted to behave properly. USB devices are notorious, for example,
>> for momentarily dropping off-line and then immediately reconnecting. (ZFS
>> reacts very poorly to such events, BTW.)
>> This misbehavior can be caused
>> by either processor involved, i.e., the one controlling either the
>> upstream or the downstream device. Hubs are really bad about this, but
>> any USB device can be guilty. You may have a defective storage device,
>> its controller may be defective, or any controller in the chain all the
>> way back to the motherboard may be defective or, not defective, but
>> corrupted by having been connected to another device with corrupted
>> (infected) firmware that tries to flash itself into the firmware flash
>> chips in its potential victim.
>> Flash memory chips, spinning disks, or {S,}{D,}RAM chips can be
>> defective. Without parity bits, the devices may return bad data and lie
>> about the presence of corrupted data. That, for example, is where ZFS
>> is better than any kind of classical RAID because ZFS keeps checksums on
>> everything, so it has a reasonable chance of detecting corruption even
>> without parity support and, if there is any redundancy in the pool or the
>> data set, fixing the bad data machine. Even having parity generally
>> allows only the detection of single-bit errors, but not of repairing them.
>> You should identify where you page/swap to and then try substituting
>> a different device for that function as a test to eliminate the possibility
>> of a bad storage device/controller. If the problem still occurs, that
>> means there still remains the possibility that another controller or its
>> firmware is defective instead. It could be a kernel bug, it is true, but
>> making sure there is no hardware or firmware error occurring is important,
>> and as I say, USB devices should always be considered suspect unless and
>> until proven innocent.
>
> [FYI: This is a ufs context, not a zfs one.]
>
> I'm aware of such things. There is no evidence that has resulted in
> suggesting the USB devices that I can replace are a problem. Otherwise
> I'd not be going down this path.
> I only have access to the one arm64
> device (a Pine64+ 2GB) so I've no ability to substitution-test what
> is on that board.
>
> It would be neat if some folks used my code to test other arm64
> contexts and reported the results. I'd be very interested.
> (This is easier to do on devices that do not have massive
> amounts of RAM, which may limit the range of devices or
> device configurations that are reasonable to test.)
>
> There is also the fact that other people using other devices have reported
> the behavior that started this investigation. I can produce the
> behavior that they reported, although I've not seen anyone else
> listing specific steps that lead to the problem or ways to tell
> if the symptom is going to happen before it actually does. Nor
> have I seen any other core dump analysis. (I have bugzilla
> submittals 217138 and 217239 tied to symptoms others have
> reported as well as this test program material.)
>
> Also, considering that for my test program I can control which pages
> get the zeroed-problem by read-accessing even one byte of any 4K
> Byte page that I want to make work normally, doing so in the child
> process of the fork, between the fork and the sleep/swap-out, it does
> not suggest USB-device-specific behavior. The read-access is changing
> the status of the page in some way as far as I can tell.
>
> (Such read-accesses in the parent process make no difference to the
> behavior.)

I should have noted another comparison/contrast between
having memory corruption and not in my context:

I've tried variants of my test program that do not fork but
just sleep for 60s to allow me to force the swap-out. I
did this before adding fork and before using
partial_test_check, for example. I gradually added things
apparently involved in the reports others had made
until I found a combination that produced a memory
corruption test failure.

These tests, without fork involved, found no problems with
the memory content after the swap-in.
For my test program it appears that fork-before-swap-out
or the like is essential to having the problem occur.

==
Mark Millard
markmi at dsl-only.net
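For readers who want to try this on other arm64 boards, as requested above, here is a minimal C sketch of the kind of test being described. It fills memory with a nonzero pattern, optionally forks, and optionally has the child read one byte of each 4 KiB page. It then sleeps long enough for the pages to be forced out and checks the pattern. It is not the ~110-line program from the report. The names fill_pattern, count_mismatches, touch_each_page and run_test, the 4 KiB page size, and the choice to do the check in the child are all assumptions for illustration.

```c
/*
 * Hypothetical sketch only: NOT the ~110-line program from the report.
 * All function names are illustrative. Assumes 4 KiB pages and a
 * POSIX system (fork, waitpid, sleep).
 */
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define PAGE_BYTES 4096u /* assumption: 4 KiB pages */

/* Nonzero, position-dependent pattern: a page that comes back
 * zero-filled after page-in shows up as mismatches. */
static unsigned char pattern_at(size_t i)
{
    return (unsigned char)(1u + (i % 255u));
}

static void fill_pattern(unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        buf[i] = pattern_at(i);
}

/* Number of bytes that no longer hold the expected pattern. */
static size_t count_mismatches(const unsigned char *buf, size_t len)
{
    size_t bad = 0;
    for (size_t i = 0; i < len; i++)
        if (buf[i] != pattern_at(i))
            bad++;
    return bad;
}

/* Read (never write) one byte in each page; volatile keeps the
 * compiler from optimizing the reads away. */
static unsigned touch_each_page(const unsigned char *buf, size_t len)
{
    const volatile unsigned char *p = buf;
    unsigned sum = 0;
    for (size_t off = 0; off < len; off += PAGE_BYTES)
        sum += p[off];
    return sum;
}

/*
 * Fill the buffer, optionally fork, sleep so the pages can be forced
 * out by memory pressure from elsewhere, then check the pattern.
 * Returns 0 if the checking process saw intact data, 1 if it saw
 * mismatches, -1 on a system-call failure.
 */
static int run_test(unsigned char *buf, size_t len, int do_fork,
                    int touch_in_child, unsigned sleep_secs)
{
    assert(buf != NULL && len > 0);
    fill_pattern(buf, len);
    if (!do_fork) {
        sleep(sleep_secs);
        return count_mismatches(buf, len) == 0 ? 0 : 1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        return -1;
    }
    if (pid == 0) {
        /* Child: the read-access step reported to avoid the problem. */
        if (touch_in_child)
            (void)touch_each_page(buf, len);
        sleep(sleep_secs);
        _exit(count_mismatches(buf, len) == 0 ? 0 : 1);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0) {
        perror("waitpid");
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A run resembling the failing case would be something like run_test(buf, len, 1, 0, 60) while another process creates enough memory pressure to push the buffer out. Setting touch_in_child to 1 corresponds to the read-access step reported to avoid the zeroed pages, and setting do_fork to 0 corresponds to the non-fork variants reported to work. The sketch itself makes no claim about which outcome a given system will produce.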
Scott Bennett
2017-Mar-16 06:07 UTC
arm64 fork/swap data corruptions: A ~110 line C program demonstrating an example (Pine64+ 2GB context) [Corrected subject: arm64!]
Mark Millard <markmi at dsl-only.net> wrote:

> [Something strange happened to the automatic CC: fill-in for my original
> reply. Also I should have mentioned that for my test program if a
> variant is made that does not fork the swapping works fine.]
>
> On 2017-Mar-15, at 9:37 AM, Mark Millard <markmi at dsl-only.net> wrote:
>
> > On 2017-Mar-15, at 6:15 AM, Scott Bennett <bennett at sdf.org> wrote:
> >
> >> On Tue, 14 Mar 2017 18:18:56 -0700 Mark Millard
> >> <markmi at dsl-only.net> wrote:
> >>> On 2017-Mar-14, at 4:44 PM, Bernd Walter <ticso at cicely7.cicely.de> wrote:
> >>>
> >>>> On Tue, Mar 14, 2017 at 03:28:53PM -0700, Mark Millard wrote:
> >>>>> [test_check() between the fork and the wait/sleep prevents the
> >>>>> failure from occurring. Even a small access to the memory at
> >>>>> that stage prevents the failure. Details follow.]
> >>>>
> >>>> Maybe a stupid question, since you might have written it somewhere.
> >>>> What medium do you swap to?
> >>>> I've seen broken firmware on microSD cards doing silent data
> >>>> corruption for some access patterns.
> >>>
> >>> The root filesystem is on a USB SSD on a powered hub.
> >>>
> >>> Only the kernel is from the microSD card.
> >>>
> >>> I have several examples of the USB SSD model and have
> >>> never observed such problems in any other context.
> >>>
> >>> [remainder of irrelevant material deleted --SB]
> >>
> >> You gave a very long-winded non-answer to Bernd's question, so I'll
> >> repeat it here. What medium do you swap to?
> >
> > My wording of:
> >
> > The root filesystem is on a USB SSD on a powered hub.
> >
> > was definitely poor. It should have explicitly mentioned the
> > swap partition too:
> >
> > The root filesystem and swap partition are both on the same
> > USB SSD on a powered hub.
> >
> > More detail from dmesg -a for usb:
> >
> > usbus0: 12Mbps Full Speed USB v1.0
> > usbus1: 480Mbps High Speed USB v2.0
> > usbus2: 12Mbps Full Speed USB v1.0
> > usbus3: 480Mbps High Speed USB v2.0
> > ugen0.1: <Generic OHCI root HUB> at usbus0
> > uhub0: <Generic OHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus0
> > ugen1.1: <Allwinner EHCI root HUB> at usbus1
> > uhub1: <Allwinner EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus1
> > ugen2.1: <Generic OHCI root HUB> at usbus2
> > uhub2: <Generic OHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus2
> > ugen3.1: <Allwinner EHCI root HUB> at usbus3
> > uhub3: <Allwinner EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus3
> > . . .
> > uhub0: 1 port with 1 removable, self powered
> > uhub2: 1 port with 1 removable, self powered
> > uhub1: 1 port with 1 removable, self powered
> > uhub3: 1 port with 1 removable, self powered
> > ugen3.2: <GenesysLogic USB2.0 Hub> at usbus3
> > uhub4 on uhub3
> > uhub4: <GenesysLogic USB2.0 Hub, class 9/0, rev 2.00/90.20, addr 2> on usbus3
> > uhub4: MTT enabled
> > uhub4: 4 ports with 4 removable, self powered
> > ugen3.3: <OWC Envoy Pro mini> at usbus3
> > umass0 on uhub4
> > umass0: <OWC Envoy Pro mini, class 0/0, rev 2.10/1.00, addr 3> on usbus3
> > umass0: SCSI over Bulk-Only; quirks = 0x0100
> > umass0:0:0: Attached to scbus0
> > . . .
> > da0 at umass-sim0 bus 0 scbus0 target 0 lun 0
> > da0: <OWC Envoy Pro mini 0> Fixed Direct Access SPC-4 SCSI device
> > da0: Serial Number <REPLACED>
> > da0: 40.000MB/s transfers
> >
> > (Edited a bit because there is other material interlaced, even
> > internal to some lines. Also: I removed the serial number of the
> > specific example device.)

Thank you. That presents a much clearer picture.

> >
> >> I will further note that any kind of USB device cannot automatically
> >> be trusted to behave properly.
> >> USB devices are notorious, for example,
> >>
> >> [reasons why deleted --SB]
> >>
> >> You should identify where you page/swap to and then try substituting
> >> a different device for that function as a test to eliminate the possibility
> >> of a bad storage device/controller. If the problem still occurs, that
> >> means there still remains the possibility that another controller or its
> >> firmware is defective instead. It could be a kernel bug, it is true, but
> >> making sure there is no hardware or firmware error occurring is important,
> >> and as I say, USB devices should always be considered suspect unless and
> >> until proven innocent.
> >
> > [FYI: This is a ufs context, not a zfs one.]

Right. It's only a Pi, after all. :-)

> >
> > I'm aware of such things. There is no evidence that has resulted in
> > suggesting the USB devices that I can replace are a problem. Otherwise
> > I'd not be going down this path. I only have access to the one arm64
> > device (a Pine64+ 2GB) so I've no ability to substitution-test what
> > is on that board.

There isn't even one open port on that hub that you could plug a flash
drive into temporarily to be the paging device? You could then try your
tests before returning to the normal configuration. If there isn't an
open port, then how about plugging a second hub into one of the first
hub's ports and moving the displaced device to the second hub? A flash
drive could then be plugged in. That kind of configuration is obviously
a bad idea for the long run, but just to try your tests it ought to work
well enough.
(BTW, if a USB storage device containing a paging area drops off-line
even momentarily and the system needs to use it, that is the beginning
of the end, even though it may take up to a few minutes for everything
to lock up. You probably won't be able to do an orderly shutdown, but
will instead have to crash it with the power switch. In the case of
something like a Pi, this is an unpleasant fact of life, to be sure.)
I think I buy your arguments, given the evidence you've collected thus
far, including what you've added below. I just like to eliminate
possibilities that are much simpler to deal with before facing
nastinesses like bugs in the VM subsystem. :-)

> >
> > It would be neat if some folks used my code to test other arm64
> > contexts and reported the results. I'd be very interested.
> > (This is easier to do on devices that do not have massive
> > amounts of RAM, which may limit the range of devices or
> > device configurations that are reasonable to test.)
> >
> > There is also the fact that other people using other devices have reported
> > the behavior that started this investigation. I can produce the
> > behavior that they reported, although I've not seen anyone else
> > listing specific steps that lead to the problem or ways to tell
> > if the symptom is going to happen before it actually does. Nor
> > have I seen any other core dump analysis. (I have bugzilla
> > submittals 217138 and 217239 tied to symptoms others have
> > reported as well as this test program material.)
> >
> > Also, considering that for my test program I can control which pages
> > get the zeroed-problem by read-accessing even one byte of any 4K
> > Byte page that I want to make work normally, doing so in the child
> > process of the fork, between the fork and the sleep/swap-out, it does
> > not suggest USB-device-specific behavior. The read-access is changing
> > the status of the page in some way as far as I can tell.
> >
> > (Such read-accesses in the parent process make no difference to the
> > behavior.)
>
> I should have noted another comparison/contrast between
> having memory corruption and not in my context:
>
> I've tried variants of my test program that do not fork but
> just sleep for 60s to allow me to force the swap-out. I
> did this before adding fork and before using
> partial_test_check, for example.
> I gradually added things
> apparently involved in the reports others had made
> until I found a combination that produced a memory
> corruption test failure.
>
> These tests, without fork involved, found no problems with
> the memory content after the swap-in.
>
> For my test program it appears that fork-before-swap-out
> or the like is essential to having the problem occur.
>

A comment about terminology seems in order here. It bothers me
considerably to see you writing "swap out" or "swapping" where it
seems like you mean to write "page out" or "paging". A BSD system whose
swapping mechanism gets activated has already waded very deeply into the
quicksand and frequently cannot be gotten out in a reasonable amount of
time even with manual assistance. It is often quicker to crash it, reboot,
and wait for the fsck(8) cleanups to complete. Orderly shutdowns, even
of the kind that results from a quick poke to the power button, typically
get mired in the same mess that already has the system in knots.
Also, BSD systems since 3.0BSD, unlike older AT&T (pre-SysVR2.3) systems,
do not swap in, just out. A swapped-out process, once the system
determines that it has adequate resources again to attempt to run the
process, will have the interrupted text page paged in and the rest will
be paged in by the normal mechanism of page faults and page-in operations.
I assume you must already know all this, which is a large part of why it
grates on me that you appear to be using the wrong terms.

Scott Bennett, Comm. ASMELG, CFIAG
**********************************************************************
* Internet: bennett at sdf.org *xor* bennett at freeshell.org *
*--------------------------------------------------------------------*
* "A well regulated and disciplined militia, is at all times a good *
* objection to the introduction of that bane of all free governments *
* -- a standing army." *
* -- Gov. John Hancock, New York Journal, 28 January 1790 *
**********************************************************************
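Scott's distinction between paging and swapping suggests a further check. Rather than assuming that a test buffer's pages were paged out before the check runs, a test can verify it. Below is a minimal sketch, assuming FreeBSD or Linux, both of which provide mincore(2). The hypothetical helper resident_pages counts how many pages of a page-aligned range are currently resident.

```c
/*
 * Hypothetical helper: resident_pages is an illustrative name.
 * Assumes FreeBSD or Linux, both of which provide mincore(2).
 */
#define _DEFAULT_SOURCE /* glibc: expose mincore() and MAP_ANON */
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Count resident pages in [addr, addr + len). addr must be page
 * aligned (mmap(2) results are). Returns -1 on error. */
static long resident_pages(void *addr, size_t len)
{
    long pg = sysconf(_SC_PAGESIZE);
    assert(pg > 0);
    size_t npages = (len + (size_t)pg - 1) / (size_t)pg;

    /* void * converts implicitly to the vector type of either
     * prototype (char * on FreeBSD, unsigned char * on Linux). */
    void *vec = malloc(npages);
    if (vec == NULL)
        return -1;
    if (mincore(addr, len, vec) != 0) {
        perror("mincore");
        free(vec);
        return -1;
    }

    long count = 0;
    for (size_t i = 0; i < npages; i++)
        if (((unsigned char *)vec)[i] & 1) /* low bit: page resident */
            count++;
    free(vec);
    return count;
}
```

Calling it on the test buffer just before and just after the sleep would show whether the pages left memory in the meantime. It reports only per-page residency. On its own it does not tell a whole-process swap-out apart from ordinary page-outs.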