On Fri, Apr 12, 2019 at 6:00 AM Patrick M. Hausen <hausen at punkt.de> wrote:> Hi all, > > my problems seem not to be TRIM related after all ? and I can now > quickly reproduce it. > > ====> root at freenas01[~]# sysctl vfs.zfs.trim.enabled > vfs.zfs.trim.enabled: 0 > ====> root at freenas01[~]# cd /mnt/zfs > root at freenas01[/mnt/zfs]# dd if=/dev/urandom of=hurz bs=10m > ^C ? system freezes temporarily >This does one I/O at a time to the filesystem, which then repackages the I/Os such that multiple I/Os are going on with the NVMe card.> ====> Apr 12 13:42:16 freenas01 nvme6: resetting controller >OK. This means that whatever I/O workload we've done has caused the NVME card to stop responding for 30s, so we reset it.> Apr 12 13:42:16 freenas01 nvme6: aborting outstanding i/o > Apr 12 13:42:16 freenas01 nvme6: WRITE sqid:1 cid:117 nsid:1 lba:981825104 > len:176 > Apr 12 13:42:16 freenas01 nvme6: ABORTED - BY REQUEST (00/07) sqid:1 > cid:117 cdw0:0 >But only one request was in flight... And we keep doing it over and over again, but to different LBAs suggesting that we're stuttering: a few go through and then things wedge again. This happens every 30ish seconds.> Apr 12 13:42:49 freenas01 nvme6: resetting controller > Apr 12 13:42:50 freenas01 nvme6: aborting outstanding i/o > Apr 12 13:42:50 freenas01 nvme6: WRITE sqid:1 cid:127 nsid:1 lba:984107936 > len:96 > Apr 12 13:42:50 freenas01 nvme6: ABORTED - BY REQUEST (00/07) sqid:1 > cid:127 cdw0:0 > Apr 12 13:43:35 freenas01 nvme6: resetting controller > Apr 12 13:43:35 freenas01 nvme6: aborting outstanding i/o > Apr 12 13:43:35 freenas01 nvme6: WRITE sqid:1 cid:112 nsid:1 lba:976172032 > len:176 > Apr 12 13:43:35 freenas01 nvme6: ABORTED - BY REQUEST (00/07) sqid:1 > cid:112 cdw0:0 > Apr 12 13:44:06 freenas01 nvme7: resetting controller >And then this one goes wonkies.> Apr 12 13:44:06 freenas01 nvme7: aborting outstanding i/o > Apr 12 13:44:06 freenas01 nvme7: WRITE sqid:1 cid:111 nsid:1 lba:976199176 > len:248 > Apr 12 13:44:06 freenas01 nvme7: ABORTED - BY REQUEST (00/07) sqid:1 > cid:111 cdw0:0 > Apr 12 13:44:06 freenas01 nvme7: aborting outstanding i/o > Apr 12 13:44:06 freenas01 nvme7: WRITE sqid:1 cid:102 nsid:1 lba:976199432 > len:248 > Apr 12 13:44:06 freenas01 nvme7: ABORTED - BY REQUEST (00/07) sqid:1 > cid:102 cdw0:0 > Apr 12 13:44:06 freenas01 nvme7: aborting outstanding i/o > Apr 12 13:44:06 freenas01 nvme7: WRITE sqid:1 cid:112 nsid:1 lba:976199680 > len:8 > Apr 12 13:44:06 freenas01 nvme7: ABORTED - BY REQUEST (00/07) sqid:1 > cid:112 cdw0:0 > Apr 12 13:44:06 freenas01 nvme7: aborting outstanding i/o > Apr 12 13:44:06 freenas01 nvme7: WRITE sqid:1 cid:105 nsid:1 lba:976199752 > len:64 > Apr 12 13:44:06 freenas01 nvme7: ABORTED - BY REQUEST (00/07) sqid:1 > cid:105 cdw0:0 > Apr 12 13:44:06 freenas01 nvme7: aborting outstanding i/o > Apr 12 13:44:06 freenas01 nvme7: WRITE sqid:1 cid:122 nsid:1 lba:976199816 > len:64 > Apr 12 13:44:06 freenas01 nvme7: ABORTED - BY REQUEST (00/07) sqid:1 > cid:122 cdw0:0 > Apr 12 13:44:06 freenas01 nvme7: aborting outstanding i/o > Apr 12 13:44:06 freenas01 nvme7: WRITE sqid:1 cid:103 nsid:1 lba:976199688 > len:64 > Apr 12 13:44:06 freenas01 nvme7: ABORTED - BY REQUEST (00/07) sqid:1 > cid:103 cdw0:0 > Apr 12 13:44:06 freenas01 nvme7: aborting outstanding i/o > Apr 12 13:44:06 freenas01 nvme7: WRITE sqid:1 cid:126 nsid:1 lba:976200136 > len:56 > Apr 12 13:44:06 freenas01 nvme7: ABORTED - BY REQUEST (00/07) sqid:1 > cid:126 cdw0:0 > Apr 12 13:44:06 freenas01 nvme7: aborting outstanding i/o > Apr 12 13:44:06 freenas01 nvme7: WRITE sqid:1 cid:106 nsid:1 lba:976200192 > len:8 > Apr 12 13:44:06 freenas01 nvme7: ABORTED - BY REQUEST (00/07) sqid:1 > cid:106 cdw0:0 > Apr 12 13:44:06 freenas01 nvme7: aborting outstanding i/o > Apr 12 13:44:06 freenas01 nvme7: WRITE sqid:1 cid:107 nsid:1 lba:976200200 > len:64 > Apr 12 13:44:06 freenas01 nvme7: ABORTED - BY REQUEST (00/07) sqid:1 > cid:107 cdw0:0 > Apr 12 13:44:06 freenas01 nvme7: aborting outstanding i/o > Apr 12 13:44:06 freenas01 nvme7: WRITE sqid:1 cid:127 nsid:1 lba:976200264 > len:64 > Apr 12 13:44:06 freenas01 nvme7: ABORTED - BY REQUEST (00/07) sqid:1 > cid:127 cdw0:0 > Apr 12 13:44:06 freenas01 nvme7: aborting outstanding i/o > Apr 12 13:44:06 freenas01 nvme7: WRITE sqid:1 cid:113 nsid:1 lba:976200328 > len:120 > Apr 12 13:44:06 freenas01 nvme7: ABORTED - BY REQUEST (00/07) sqid:1 > cid:113 cdw0:0 > Apr 12 13:44:06 freenas01 nvme7: aborting outstanding i/o > Apr 12 13:44:06 freenas01 nvme7: WRITE sqid:1 cid:108 nsid:1 lba:976200448 > len:72 > Apr 12 13:44:06 freenas01 nvme7: ABORTED - BY REQUEST (00/07) sqid:1 > cid:108 cdw0:0 > Apr 12 13:44:06 freenas01 nvme7: aborting outstanding i/o > Apr 12 13:44:06 freenas01 nvme7: WRITE sqid:1 cid:116 nsid:1 lba:976200520 > len:64 > Apr 12 13:44:06 freenas01 nvme7: ABORTED - BY REQUEST (00/07) sqid:1 > cid:116 cdw0:0 > ====> root at freenas01[~]# nvmecontrol identify nvme6 > Controller Capabilities/Features > ===============================> Vendor ID: 8086 > Subsystem Vendor ID: 8086 > Serial Number: BTLJ90230EC61P0FGN > Model Number: INTEL SSDPE2KX010T8 >So it's an intel card.> Firmware Version: VDV10131 > Recommended Arb Burst: 0 > IEEE OUI Identifier: e4 d2 5c > Multi-Interface Cap: 00 > Max Data Transfer Size: 131072 > Controller ID: 0x00 > > Admin Command Set Attributes > ===========================> Security Send/Receive: Not Supported > Format NVM: Supported > Firmware Activate/Download: Supported > Namespace Managment: Supported > Abort Command Limit: 4 > Async Event Request Limit: 4 > Number of Firmware Slots: 1 > Firmware Slot 1 Read-Only: No > Per-Namespace SMART Log: No > Error Log Page Entries: 64 > Number of Power States: 1 > > NVM Command Set Attributes > =========================> Submission Queue Entry Size > Max: 64 > Min: 64 > Completion Queue Entry Size > Max: 16 > Min: 16 > Number of Namespaces: 1 > Compare Command: Not Supported > Write Uncorrectable Command: Supported > Dataset Management Command: Supported > Volatile Write Cache: Not Present > > Namespace Drive Attributes > =========================> NVM total cap: 1000204886016 > NVM unallocated cap: 0 > ====> root at freenas01[~]# zpool status > pool: freenas-boot > state: ONLINE > scan: scrub repaired 0 in 0 days 00:00:03 with 0 errors on Sun Apr 7 > 03:45:03 2019 > config: > > NAME STATE READ WRITE CKSUM > freenas-boot ONLINE 0 0 0 > mirror-0 ONLINE 0 0 0 > nvd0p2 ONLINE 0 0 0 > nvd1p2 ONLINE 0 0 0 > > errors: No known data errors > > pool: zfs > state: ONLINE > scan: scrub repaired 0 in 0 days 00:01:53 with 0 errors on Fri Mar 22 > 19:53:37 2019 > config: > > NAME STATE READ > WRITE CKSUM > zfs ONLINE 0 > 0 0 > raidz2-0 ONLINE 0 > 0 0 > gptid/97d0a7ce-44e5-11e9-982e-0025905f99ac ONLINE 0 > 0 0 > gptid/98053880-44e5-11e9-982e-0025905f99ac ONLINE 0 > 0 0 > gptid/983a9468-44e5-11e9-982e-0025905f99ac ONLINE 0 > 0 0 > gptid/987100f2-44e5-11e9-982e-0025905f99ac ONLINE 0 > 0 0 > gptid/98aa6e88-44e5-11e9-982e-0025905f99ac ONLINE 0 > 0 0 > gptid/98f20b8c-44e5-11e9-982e-0025905f99ac ONLINE 0 > 0 0 > > errors: No known data errors > ====> > The problem only appears on the data pool built from 6 INTEL > SSDPE2KX010T8. The system > pool built from two KXG50ZNV256G TOSHIBA does not show any problem with > write load. >OK. That suggests Intel has a problem with their firmware.> All the Intel drives have the latest firmware according to the Intel > support website. > > Could it possibly help to tweak dev.nvme.7.ioq0.num_entries and similar > entries? >It might. Maybe. I'm unsure what the intel firmware bug is.> What about switching to the nda device instead of nvd? >I doubt that would have any effect. They both throw as much I/O onto the card as possible in the default config. There's been some minor improvements in -current here. Any chance you could experimentally try that with this test? You won't get as many I/O abort errors (since we don't print those), and we have a few more workarounds for the reset path (though honestly, it's still kinda stinky). Warner
Patrick M. Hausen
2019-Apr-12 19:22 UTC
NVME aborting outstanding i/o and controller resets
Hi Warner, thanks for taking the time again ?> OK. This means that whatever I/O workload we've done has caused the NVME card to stop responding for 30s, so we reset it.I figured as much ;-)> So it's an intel card.Yes - I already added this info several times. 6 of them, 2.5? NVME ?disk drives?.> OK. That suggests Intel has a problem with their firmware.I came across this one: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=211713 Is it more probable that Intel has got buggy firmware here than that ?we? are missing interrupts? The mainboard is the Supermicro H11SSW-NT. Two NVME drive bays share a connector on the mainboard: NVMe Ports ( NVMe 0~7, 10, 11, 14, 15) The H11SSW-iN/NT has tweleve (12) NVMe ports (2 ports per 1 Slim SAS connector) on the motherboard. These ports provide high-speed, low-latency PCI-E 3.0 x4 connections directly from the CPU to NVMe Solid State (SSD) drives. This greatly increases SSD data- throughput performance and significantly reduces PCI-E latency by simplifying driver/software requirements resulting from direct PCI-E interface from the CPU to the NVMe SSD drives. Is this purely mechanical or do two drives share PCI-E resources? Which would explain why the problems always come in pairs (nvme6 and nvme7, for example). This afternoon I set up a system with 4 drives and I was not able to reproduce the problem. (We just got 3 more machines which happened to have 4 drives each and no M.2 directly on the mainboard). I will change the config to 6 drives like with the two FreeNAS systems in our data center.> [? nda(4) ...] > I doubt that would have any effect. They both throw as much I/O onto the card as possible in the default config.I found out - yes, just the same.> There's been some minor improvements in -current here. Any chance you could experimentally try that with this test? You won't get as many I/O abort errors (since we don't print those), and we have a few more workarounds for the reset path (though honestly, it's still kinda stinky).HEAD or RELENG_12, too? Kind regards, Patrick -- punkt.de GmbH Internet - Dienstleistungen - Beratung Kaiserallee 13a Tel.: 0721 9109-0 Fax: -100 76133 Karlsruhe info at punkt.de http://punkt.de AG Mannheim 108285 Gf: Juergen Egeling