Patrick M. Hausen
2019-Apr-12 19:22 UTC
NVME aborting outstanding i/o and controller resets
Hi Warner,

thanks for taking the time again …

> OK. This means that whatever I/O workload we've done has caused the NVME
> card to stop responding for 30s, so we reset it.

I figured as much ;-)

> So it's an intel card.

Yes - I already added this info several times. 6 of them, 2.5" NVMe "disk drives".

> OK. That suggests Intel has a problem with their firmware.

I came across this one:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=211713

Is it more probable that Intel has got buggy firmware here than that
"we" are missing interrupts?

The mainboard is the Supermicro H11SSW-NT. Two NVMe drive bays share
a connector on the mainboard:

	NVMe Ports (NVMe 0~7, 10, 11, 14, 15)

	The H11SSW-iN/NT has twelve (12) NVMe ports (2 ports per 1 Slim SAS
	connector) on the motherboard. These ports provide high-speed,
	low-latency PCI-E 3.0 x4 connections directly from the CPU to NVMe
	Solid State (SSD) drives. This greatly increases SSD data-throughput
	performance and significantly reduces PCI-E latency by simplifying
	driver/software requirements resulting from direct PCI-E interface
	from the CPU to the NVMe SSD drives.

Is this purely mechanical or do two drives share PCI-E resources? That would
explain why the problems always come in pairs (nvme6 and nvme7, for example).

This afternoon I set up a system with 4 drives and I was not able to
reproduce the problem. (We just got 3 more machines which happened to have
4 drives each and no M.2 directly on the mainboard.)
I will change the config to 6 drives like with the two FreeNAS systems in
our data center.

> [… nda(4) …]
> I doubt that would have any effect. They both throw as much I/O onto the
> card as possible in the default config.

I found out - yes, just the same.

> There's been some minor improvements in -current here. Any chance you
> could experimentally try that with this test? You won't get as many I/O
> abort errors (since we don't print those), and we have a few more
> workarounds for the reset path (though honestly, it's still kinda stinky).

HEAD or RELENG_12, too?

Kind regards,
Patrick
--
punkt.de GmbH                   Internet - Dienstleistungen - Beratung
Kaiserallee 13a                 Tel.: 0721 9109-0 Fax: -100
76133 Karlsruhe                 info at punkt.de    http://punkt.de
AG Mannheim 108285              Gf: Juergen Egeling
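Whether two bays merely share a connector or actually share PCI-E resources can be checked on the running system. A minimal sketch, assuming the affected controllers attach as nvme6 and nvme7; the unit numbers and grep patterns are illustrative only:

	# Map each controller to its PCI address (domain:bus:slot:function);
	# two drives behind the same bridge or switch port sit on the same bus.
	pciconf -l | grep nvme

	# Show each controller's capabilities, including the negotiated
	# PCI-Express link, e.g. "link x4(x4) speed 8.0(8.0)".
	pciconf -lc | grep -A4 nvme

	# Walk the device tree to see which pcib (bridge) each nvme
	# instance sits under.
	devinfo -v | grep -E 'pcib|nvme'

If nvme6 and nvme7 report the same parent bridge and bus number, they do share PCI-E resources; if each has its own x4 root port, the pairing is only mechanical.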
On Fri, Apr 12, 2019, 1:22 PM Patrick M. Hausen <hausen at punkt.de> wrote:

> Hi Warner,
>
> thanks for taking the time again …
>
> > OK. This means that whatever I/O workload we've done has caused the NVME
> > card to stop responding for 30s, so we reset it.
>
> I figured as much ;-)
>
> > So it's an intel card.
>
> Yes - I already added this info several times. 6 of them, 2.5" NVMe "disk
> drives".

Yea, it was more of a knowing sigh...

> > OK. That suggests Intel has a problem with their firmware.
>
> I came across this one:
> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=211713
>
> Is it more probable that Intel has got buggy firmware here than that
> "we" are missing interrupts?

More probable bad firmware. One of the things I think is in HEAD is a
mitigation for this that looks for completed I/O on timeout before doing a
reset.

> The mainboard is the Supermicro H11SSW-NT. Two NVMe drive bays share
> a connector on the mainboard:
>
>	NVMe Ports (NVMe 0~7, 10, 11, 14, 15)
>
>	The H11SSW-iN/NT has twelve (12) NVMe ports (2 ports per 1 Slim SAS
>	connector) on the motherboard. These ports provide high-speed,
>	low-latency PCI-E 3.0 x4 connections directly from the CPU to NVMe
>	Solid State (SSD) drives. This greatly increases SSD data-throughput
>	performance and significantly reduces PCI-E latency by simplifying
>	driver/software requirements resulting from direct PCI-E interface
>	from the CPU to the NVMe SSD drives.
>
> Is this purely mechanical or do two drives share PCI-E resources? That
> would explain why the problems always come in pairs (nvme6 and nvme7,
> for example).

I'm unfamiliar with this setup, but coming in pairs strengthens the missed
interrupt theory in my mind. Firmware issues usually don't come in pairs.

> This afternoon I set up a system with 4 drives and I was not able to
> reproduce the problem. (We just got 3 more machines which happened to
> have 4 drives each and no M.2 directly on the mainboard.)
> I will change the config to 6 drives like with the two FreeNAS systems
> in our data center.
>
> > [… nda(4) …]
> > I doubt that would have any effect. They both throw as much I/O onto
> > the card as possible in the default config.
>
> I found out - yes, just the same.

NDA drives with an iosched kernel will be able to rate limit, which may be
useful as a diagnostic tool...

> > There's been some minor improvements in -current here. Any chance you
> > could experimentally try that with this test? You won't get as many I/O
> > abort errors (since we don't print those), and we have a few more
> > workarounds for the reset path (though honestly, it's still kinda
> > stinky).
>
> HEAD or RELENG_12, too?

HEAD is preferred, but any recent snapshot will do.

Warner

> Kind regards,
> Patrick
> --
> punkt.de GmbH                   Internet - Dienstleistungen - Beratung
> Kaiserallee 13a                 Tel.: 0721 9109-0 Fax: -100
> 76133 Karlsruhe                 info at punkt.de    http://punkt.de
> AG Mannheim 108285              Gf: Juergen Egeling
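For the rate-limiting experiment mentioned above, a minimal sketch of the knobs involved, assuming FreeBSD 12 or a recent -CURRENT; the dynamic scheduler's sysctl names differ between versions, so the last step only lists what the installed kernel actually exposes:

	# /boot/loader.conf: attach the NVMe namespaces through CAM as
	# nda(4) instead of nvd(4).
	hw.nvme.use_nvd="0"

	# Kernel configuration: build with the dynamic I/O scheduler so
	# nda(4) gains rate-limiting controls.
	options CAM_IOSCHED_DYNAMIC

	# /boot/loader.conf (optional, if the tunable exists on the
	# installed version): lengthen the per-command timeout whose
	# expiry triggers the controller reset; the default is 30 seconds.
	hw.nvme.timeout_period="60"

	# After booting the new kernel, list what the scheduler exposes
	# for one of the drives and pick the limiter knobs from there.
	sysctl kern.cam.nda.0

Throttling the load this way would show whether the aborts and resets are load dependent, which is the point of using it as a diagnostic tool.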