Karl Denninger
2019-Apr-10 14:38 UTC
Concern: ZFS Mirror issues (12.STABLE and firmware 19 .v. 20)
On 4/10/2019 08:45, Andriy Gapon wrote:
> On 10/04/2019 04:09, Karl Denninger wrote:
>> Specifically, I *explicitly* OFFLINE the disk in question, which is a
>> controlled operation and *should* result in a cache flush out of the ZFS
>> code into the drive before it is OFFLINE'd.
>>
>> This should result in the "last written" TXG that the remaining online
>> members have, and the one in the offline member, being consistent.
>>
>> Then I "camcontrol standby" the involved drive, which forces a writeback
>> cache flush and a spindown; in other words, re-ordered or not, the
>> on-platter data *should* be consistent with what the system thinks
>> happened before I yank the physical device.
> This may not be enough for a specific [RAID] controller and a specific
> configuration. It should be enough for a dumb HBA. But, for example, mrsas(9)
> can simply ignore the synchronize cache command (meaning neither the on-board
> cache is flushed nor the command is propagated to a disk). So, if you use some
> advanced controller it would make sense to use its own management tool to
> offline a disk before pulling it.
>
> I do not preclude a possibility of an issue in ZFS. But it's not the only
> possibility either.

In this specific case the adapter in question is...

mps0: <Avago Technologies (LSI) SAS2116> port 0xc000-0xc0ff mem 0xfbb3c000-0xfbb3ffff,0xfbb40000-0xfbb7ffff irq 30 at device 0.0 on pci3
mps0: Firmware: 20.00.07.00, Driver: 21.02.00.00-fbsd
mps0: IOCCapabilities: 1285c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,EventReplay,HostDisc>

Which is indeed a "dumb" HBA (in IT mode), and Zaphod says he connects his drives via dumb on-MoBo direct SATA connections.

What I don't know (yet) is whether the update to firmware 20.00.07.00 on the HBA has fixed it.
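The controlled removal described above (OFFLINE the mirror member in ZFS, then `camcontrol standby` the drive before pulling it) can be sketched as a short command sequence. This is a minimal sketch, not Karl's actual script: the helper names `offline_commands` and `run` are hypothetical, and the pool, vdev and device names in the example are placeholders. It defaults to a dry run that only prints the commands.

```python
import subprocess
from typing import List


def offline_commands(pool: str, vdev: str, disk: str) -> List[List[str]]:
    """Commands, in order, for pulling one mirror member in a controlled way.

    pool, vdev and disk are placeholders: the pool name, the vdev as shown by
    `zpool status` (e.g. a gpt/ label), and the CAM device behind it (e.g. da3).
    """
    return [
        # Controlled removal: ZFS stops issuing I/O to this member.
        ["zpool", "offline", pool, vdev],
        # Spin the drive down; per the thread this also forces a write-cache
        # flush, assuming the HBA actually passes the command to the disk.
        ["camcontrol", "standby", disk],
    ]


def run(commands: List[List[str]], dry_run: bool = True) -> List[str]:
    """Print (dry run) or execute each command; stop at the first failure."""
    shown = []
    for cmd in commands:
        line = " ".join(cmd)
        shown.append(line)
        if dry_run:
            print(line)
        else:
            # check=True raises CalledProcessError, aborting the sequence.
            subprocess.run(cmd, check=True)
    return shown
```

For example, `run(offline_commands("backup", "gpt/backup-a", "da3"))` only prints the two commands. As Andriy points out, the standby step is only meaningful on a dumb HBA; a RAID controller that ignores SYNCHRONIZE CACHE needs its own management tool instead.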
Through some mechanism, the 11.2 and 12.0 releases of FreeBSD changed timing quite materially in the mps driver. Prior to 11.2 I ran a Lenovo SAS expander connected to SATA disks without any problems at all, even across actual disk failures over the years. Under 11.2 and 12.0, however, that configuration produced spurious retries out of the CAM layer, allegedly caused by timeouts on individual units (they looked very much like a command sent to the disk had been lost). These appeared only on mirrored volume sets; the drives themselves reported no errors, and neither of my RaidZ2 pools (one spinning rust, one SSD) experienced problems of any sort.

Flashing the HBA forward to 20.00.07.00 with the expander in place resulted in the *driver* (mps) taking disconnects and resets instead of the targets, which in turn caused random drive fault events across all of the pools. For obvious reasons that got backed out *fast*.

Without the expander, 19.00.00.00 has been stable over the last few months *except* in this circumstance: an intentionally OFFLINE'd disk in a mirror that is brought back online after a reasonably long period (days to a week) resilvers successfully, but a subsequent scrub then finds and corrects a small number of checksum errors on that drive. They are always on the drive that was OFFLINE'd, never on the one(s) left online.

I am now on 20.00.07.00 and so far, no problems. But I have yet to do the backup disk swap on 20.00.07.00 (scheduled for later this week or Monday), so I do not know whether the roll-forward addresses the scrub issue. I have no reason to believe the firmware is involved, but given the previous "iffy" behavior of 11.2 and 12.0 on 19.00.00.00 with the expander, it very well might be, owing to what appear to be timing changes in the driver architecture.
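Since the open question is 19.00.00.00 versus 20.00.07.00, it helps to confirm which firmware each mps controller is actually running. The sketch below pulls that from the boot messages, assuming the `mpsN: Firmware: X, Driver: Y` line format quoted in this thread. The function names are hypothetical, and nothing here implies that 20.00.07.00 fixes the problem; the thread leaves that open.

```python
import re
from typing import Dict, Tuple

# Matches boot-message lines of the form quoted in the thread, e.g.
#   mps0: Firmware: 20.00.07.00, Driver: 21.02.00.00-fbsd
FW_RE = re.compile(r"^(mps\d+): Firmware: ([\d.]+), Driver: (\S+)")


def mps_firmware(dmesg_text: str) -> Dict[str, str]:
    """Map each mps controller name to the firmware version it reports."""
    found = {}
    for line in dmesg_text.splitlines():
        m = FW_RE.match(line.strip())
        if m:
            found[m.group(1)] = m.group(2)
    return found


def version_tuple(v: str) -> Tuple[int, ...]:
    """Turn "20.00.07.00" into (20, 0, 7, 0) so versions compare numerically."""
    return tuple(int(part) for part in v.split("."))


def at_least(v: str, minimum: str) -> bool:
    """True if firmware version v is the same as or newer than minimum."""
    return version_tuple(v) >= version_tuple(minimum)
```

On FreeBSD the current boot's messages are kept in `/var/run/dmesg.boot`, so `mps_firmware(open("/var/run/dmesg.boot").read())` would list every mps controller and its firmware.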
-- 
Karl Denninger
karl at denninger.net
/The Market Ticker/
/[S/MIME encrypted email preferred]/
Zaphod Beeblebrox
2019-Apr-11 18:52 UTC
Concern: ZFS Mirror issues (12.STABLE and firmware 19 .v. 20)
On Wed, Apr 10, 2019 at 10:41 AM Karl Denninger <karl at denninger.net> wrote:
> In this specific case the adapter in question is...
>
> mps0: <Avago Technologies (LSI) SAS2116> port 0xc000-0xc0ff mem
> 0xfbb3c000-0xfbb3ffff,0xfbb40000-0xfbb7ffff irq 30 at device 0.0 on pci3
> mps0: Firmware: 20.00.07.00, Driver: 21.02.00.00-fbsd
> mps0: IOCCapabilities:
> 1285c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,EventReplay,HostDisc>
>
> Which is indeed a "dumb" HBA (in IT mode), and Zaphod says he connects
> his drives via dumb on-MoBo direct SATA connections.
>

Maybe I'm in good company. My current setup has 8 of the disks connected to:

mps0: <Avago Technologies (LSI) SAS2308> port 0xb000-0xb0ff mem 0xfe240000-0xfe24ffff,0xfe200000-0xfe23ffff irq 32 at device 0.0 on pci6
mps0: Firmware: 19.00.00.00, Driver: 21.02.00.00-fbsd
mps0: IOCCapabilities: 5a85c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,EventReplay,MSIXIndex,HostDisc>

... just with a cable that breaks out each of the 2 connectors into 4 SATA-style connectors, and the other 8 disks (plus boot disks and SSD cache/log) connected to ports on...

- ahci0: <ASMedia ASM1062 AHCI SATA controller> port 0xd050-0xd057,0xd040-0xd043,0xd030-0xd037,0xd020-0xd023,0xd000-0xd01f mem 0xfe900000-0xfe9001ff irq 44 at device 0.0 on pci2
- ahci2: <Marvell 88SE9230 AHCI SATA controller> port 0xa050-0xa057,0xa040-0xa043,0xa030-0xa037,0xa020-0xa023,0xa000-0xa01f mem 0xfe610000-0xfe6107ff irq 40 at device 0.0 on pci7
- ahci3: <AMD SB7x0/SB8x0/SB9x0 AHCI SATA controller> port 0xf040-0xf047,0xf030-0xf033,0xf020-0xf027,0xf010-0xf013,0xf000-0xf00f mem 0xfea07000-0xfea073ff irq 19 at device 17.0 on pci0

... each drive connected to a single port.

I can actually reproduce this at will. Because I have 16 drives, when one fails, I need to find it. I pull the SATA cable for a drive and determine whether it's the drive in question; if not, I reconnect it, "ONLINE" it, and wait for the resilver to finish... usually only a minute or two.
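The "reconnect, ONLINE, wait for the resilver" step above can be automated by polling `zpool status` until it no longer reports a resilver. This is a sketch under one assumption: that the `scan:` line contains the phrase "resilver in progress" while a resilver is running, as ZFS prints it. `wait_for_resilver` and its injectable `get_status`/`sleep` parameters are hypothetical names, added so the loop can be exercised without a real pool.

```python
import subprocess
import time
from typing import Callable


def resilver_in_progress(status_text: str) -> bool:
    """True if the `scan:` line says a resilver is running (assumed wording)."""
    for line in status_text.splitlines():
        line = line.strip()
        if line.startswith("scan:") and "resilver in progress" in line:
            return True
    return False


def zpool_status(pool: str) -> str:
    """Return the text of `zpool status <pool>`."""
    result = subprocess.run(["zpool", "status", pool],
                            capture_output=True, text=True, check=True)
    return result.stdout


def wait_for_resilver(pool: str, poll_seconds: float = 10.0,
                      get_status: Callable[[str], str] = zpool_status,
                      sleep: Callable[[float], None] = time.sleep) -> int:
    """Block until no resilver is reported; return how many polls it took."""
    polls = 0
    while True:
        polls += 1
        if not resilver_in_progress(get_status(pool)):
            return polls
        sleep(poll_seconds)
```

Waiting explicitly matters here because the checksum errors in this thread only surface on a later scrub, and `zpool scrub` refuses to start while a resilver is still running. Newer OpenZFS releases also offer `zpool wait -t resilver` for the same purpose.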
If I do this four to six times or so to find a drive (I can tell, in general, whether a drive is on the SAS controller or the SATA controllers... so I'm only ever looking among 8), then I "REPLACE" the problem drive. More often than not, a subsequent scrub will find a few problems. In fact, the most recent scrub appears to be an example:

[1:7:306]dgilbert at vr:~> zpool status
  pool: vr1
 state: ONLINE
  scan: scrub repaired 32K in 47h16m with 0 errors on Mon Apr 1 23:12:03 2019
config:

        NAME            STATE     READ WRITE CKSUM
        vr1             ONLINE       0     0     0
          raidz2-0      ONLINE       0     0     0
            gpt/v1-d0   ONLINE       0     0     0
            gpt/v1-d1   ONLINE       0     0     0
            gpt/v1-d2   ONLINE       0     0     0
            gpt/v1-d3   ONLINE       0     0     0
            gpt/v1-d4   ONLINE       0     0     0
            gpt/v1-d5   ONLINE       0     0     0
            gpt/v1-d6   ONLINE       0     0     0
            gpt/v1-d7   ONLINE       0     0     0
          raidz2-2      ONLINE       0     0     0
            gpt/v1-e0c  ONLINE       0     0     0
            gpt/v1-e1b  ONLINE       0     0     0
            gpt/v1-e2b  ONLINE       0     0     0
            gpt/v1-e3b  ONLINE       0     0     0
            gpt/v1-e4b  ONLINE       0     0     0
            gpt/v1-e5a  ONLINE       0     0     0
            gpt/v1-e6a  ONLINE       0     0     0
            gpt/v1-e7c  ONLINE       0     0     0
        logs
          gpt/vr1log    ONLINE       0     0     0
        cache
          gpt/vr1cache  ONLINE       0     0     0

errors: No known data errors

... it doesn't say so now, but there were 5 CKSUM errors on one of the drives that I had trial-removed (and not on the one replaced).
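Spotting which member picked up the CKSUM errors (here, a trial-removed drive rather than the replaced one) is easier if the config table of `zpool status` is scanned for nonzero counters. The sketch below assumes the column layout shown above (NAME, STATE, READ, WRITE, CKSUM) and plain integer counters; abbreviated large counts such as "1.2K" are skipped rather than parsed. `nonzero_error_counts` is a hypothetical helper name.

```python
from typing import List, Tuple


def nonzero_error_counts(status_text: str) -> List[Tuple[str, int, int, int]]:
    """Return (name, read, write, cksum) for config rows with any nonzero counter."""
    rows = []
    in_config = False
    for line in status_text.splitlines():
        fields = line.split()
        # The column header marks the start of the device table.
        if fields[:5] == ["NAME", "STATE", "READ", "WRITE", "CKSUM"]:
            in_config = True
            continue
        if not in_config:
            continue
        # The "errors:" summary line ends the device table.
        if line.startswith("errors:"):
            break
        # Device rows carry three numeric counters; section labels such as
        # "logs" and "cache" have a single field and are skipped.
        if len(fields) >= 5 and all(f.isdigit() for f in fields[2:5]):
            r, w, c = (int(f) for f in fields[2:5])
            if r or w or c:
                rows.append((fields[0], r, w, c))
    return rows
```

Running this after each resilver, and again after the scrub, would record which drive the errors landed on before `zpool clear` or a later scrub hides them, as happened with the output above.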