On Fri, Apr 5, 2019 at 1:33 AM Patrick M. Hausen <hausen at punkt.de> wrote:

> Hi all,
>
> > On 04.04.2019 at 17:11, Warner Losh <imp at bsdimp.com> wrote:
> > There's a request that was sent down to the drive. It took longer than
> > 30s to respond. One of them, at least, was a trim request.
> > [...]
>
> Thanks for the explanation.
>
> This further explains why I was seeing a lot more of those and the system
> occasionally froze for a couple of seconds after I increased these:
>
> vfs.zfs.vdev.async_write_max_active: 10
> vfs.zfs.vdev.async_read_max_active: 3
> vfs.zfs.vdev.sync_write_max_active: 10
> vfs.zfs.vdev.sync_read_max_active: 10
>
> as recommended by Allan Jude, on the reasoning that NVMe devices can work
> on up to 64 requests in parallel. I have since reverted that change and I
> am running with the defaults.
>
> If I understand correctly, this:
>
> > hw.nvme.per_cpu_io_queues=0
>
> essentially limits the rate at which the system throws commands at the
> devices. Correct?

Yes. It de facto limits the number of commands the system can throw at an
nvme drive. Some drives have trouble with multiple CPUs submitting commands
at once; others just have trouble with the sheer volume of commands. This
limits both.

> So it's not a real fix and there's nothing fundamentally wrong with the
> per-CPU queue or interrupt implementation. I will look into new firmware
> for my Intel devices and try tweaking vfs.zfs.vdev.trim_max_active and
> related parameters.

Correct. It's a hack-around.

> Out of curiosity: what happens if I disable TRIM? My knowledge is rather
> superficial and I just filed that under "TRIM is absolutely essential lest
> performance suffer severely and your devices die - plus bad karma, of
> course ..." ;-)

TRIMs help the drive optimize its garbage collection by giving it a larger
pool of free blocks to work with. This has the effect of reducing write
amplification. Write amp is a measure of how much extra work the drive has
to do for every user write it processes. Ideally you want this number to be
1.0. You'll never get to 1.0, but numbers less than 1.5 are common, and most
of the models drive makers use to rate the lifetime of their NAND assume a
write amp of about 2.

So, if you eliminate the TRIMs, you eliminate this optimization and write
amp will increase. This has two bad effects: first, extra wear and tear on
the NAND; second, it takes resources away from the user.

In practice, however, the bad effects are quite limited if you don't have a
write-intensive workload. Your drive is rated for so many drive writes per
day (or, equivalently, total data written over the life of the drive); this
will be on the spec sheet somewhere. If you don't have a write-intensive
workload (which I'd say is any sustained write load greater than about
1/10th of the datasheet write limit) and you think TRIMs are causing issues,
you should disable them. The effects of not trimming are likely to be in the
noise on such systems, and the benefits of having things TRIMmed will be
smaller.

At work, for a large video streaming company, we enable TRIMs even though
we're on the edge of that rule of thumb, since we're unsure how long the
machines really need to be in the field and don't want to risk it. Except
for the Samsung nvme drives (PM963, no longer made) we got a while ago:
those we turn TRIM off on, because UFS' machine-gunning down of TRIMs and
nvd's blind pass-through of them took down the drive.
UFS now combines TRIMs, and we've moved to using nda since it also combines
TRIMs, so it wouldn't be so bad if we tried again today.

Drive makers optimize for different things. Enterprise drives handle TRIMs a
lot better than consumer drives; consumer drives are cheaper (in oh so many
ways), so some care is needed. Intel makes a wide range of drives, from the
super duper awesome (with prices to match) to the somewhat disappointing
(but incredibly cheap and good enough for a lot of applications). Not sure
where on that spectrum your drives fall.

tl;dr: Unless you are writing the snot out of those Intel drives, disabling
TRIM entirely will likely help avoid pushing so many commands that they time
out.

Warner
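To put that "write intensive" rule of thumb into rough numbers, here is a
purely illustrative calculation; the capacity and endurance figures below
are made up and are not the datasheet ratings of the drives discussed in
this thread:

    capacity:           1 TB
    endurance rating:   1 drive write per day (DWPD) over the warranty
                        period, i.e. roughly 1 TB of host writes per day
    1/10th of that:     roughly 100 GB of host writes per day

By the rule above, a pool that averages well under ~100 GB of new writes per
day per drive is not write intensive, and disabling TRIM there should cost
little in extra wear; sustained write loads above that are the case where
keeping TRIM enabled is more clearly worth it.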
Patrick M. Hausen
2019-Apr-12 11:58 UTC
NVME aborting outstanding i/o and controller resets
Hi all,

my problems seem not to be TRIM related after all - and I can now quickly
reproduce it.

====
root at freenas01[~]# sysctl vfs.zfs.trim.enabled
vfs.zfs.trim.enabled: 0
====

====
root at freenas01[~]# cd /mnt/zfs
root at freenas01[/mnt/zfs]# dd if=/dev/urandom of=hurz bs=10m
^C
====

... system freezes temporarily ...

====
Apr 12 13:42:16 freenas01 nvme6: resetting controller
Apr 12 13:42:16 freenas01 nvme6: aborting outstanding i/o
Apr 12 13:42:16 freenas01 nvme6: WRITE sqid:1 cid:117 nsid:1 lba:981825104 len:176
Apr 12 13:42:16 freenas01 nvme6: ABORTED - BY REQUEST (00/07) sqid:1 cid:117 cdw0:0
Apr 12 13:42:49 freenas01 nvme6: resetting controller
Apr 12 13:42:50 freenas01 nvme6: aborting outstanding i/o
Apr 12 13:42:50 freenas01 nvme6: WRITE sqid:1 cid:127 nsid:1 lba:984107936 len:96
Apr 12 13:42:50 freenas01 nvme6: ABORTED - BY REQUEST (00/07) sqid:1 cid:127 cdw0:0
Apr 12 13:43:35 freenas01 nvme6: resetting controller
Apr 12 13:43:35 freenas01 nvme6: aborting outstanding i/o
Apr 12 13:43:35 freenas01 nvme6: WRITE sqid:1 cid:112 nsid:1 lba:976172032 len:176
Apr 12 13:43:35 freenas01 nvme6: ABORTED - BY REQUEST (00/07) sqid:1 cid:112 cdw0:0
Apr 12 13:44:06 freenas01 nvme7: resetting controller
Apr 12 13:44:06 freenas01 nvme7: aborting outstanding i/o
Apr 12 13:44:06 freenas01 nvme7: WRITE sqid:1 cid:111 nsid:1 lba:976199176 len:248
Apr 12 13:44:06 freenas01 nvme7: ABORTED - BY REQUEST (00/07) sqid:1 cid:111 cdw0:0
Apr 12 13:44:06 freenas01 nvme7: aborting outstanding i/o
Apr 12 13:44:06 freenas01 nvme7: WRITE sqid:1 cid:102 nsid:1 lba:976199432 len:248
Apr 12 13:44:06 freenas01 nvme7: ABORTED - BY REQUEST (00/07) sqid:1 cid:102 cdw0:0
Apr 12 13:44:06 freenas01 nvme7: aborting outstanding i/o
Apr 12 13:44:06 freenas01 nvme7: WRITE sqid:1 cid:112 nsid:1 lba:976199680 len:8
Apr 12 13:44:06 freenas01 nvme7: ABORTED - BY REQUEST (00/07) sqid:1 cid:112 cdw0:0
Apr 12 13:44:06 freenas01 nvme7: aborting outstanding i/o
Apr 12 13:44:06 freenas01 nvme7: WRITE sqid:1 cid:105 nsid:1 lba:976199752 len:64
Apr 12 13:44:06 freenas01 nvme7: ABORTED - BY REQUEST (00/07) sqid:1 cid:105 cdw0:0
Apr 12 13:44:06 freenas01 nvme7: aborting outstanding i/o
Apr 12 13:44:06 freenas01 nvme7: WRITE sqid:1 cid:122 nsid:1 lba:976199816 len:64
Apr 12 13:44:06 freenas01 nvme7: ABORTED - BY REQUEST (00/07) sqid:1 cid:122 cdw0:0
Apr 12 13:44:06 freenas01 nvme7: aborting outstanding i/o
Apr 12 13:44:06 freenas01 nvme7: WRITE sqid:1 cid:103 nsid:1 lba:976199688 len:64
Apr 12 13:44:06 freenas01 nvme7: ABORTED - BY REQUEST (00/07) sqid:1 cid:103 cdw0:0
Apr 12 13:44:06 freenas01 nvme7: aborting outstanding i/o
Apr 12 13:44:06 freenas01 nvme7: WRITE sqid:1 cid:126 nsid:1 lba:976200136 len:56
Apr 12 13:44:06 freenas01 nvme7: ABORTED - BY REQUEST (00/07) sqid:1 cid:126 cdw0:0
Apr 12 13:44:06 freenas01 nvme7: aborting outstanding i/o
Apr 12 13:44:06 freenas01 nvme7: WRITE sqid:1 cid:106 nsid:1 lba:976200192 len:8
Apr 12 13:44:06 freenas01 nvme7: ABORTED - BY REQUEST (00/07) sqid:1 cid:106 cdw0:0
Apr 12 13:44:06 freenas01 nvme7: aborting outstanding i/o
Apr 12 13:44:06 freenas01 nvme7: WRITE sqid:1 cid:107 nsid:1 lba:976200200 len:64
Apr 12 13:44:06 freenas01 nvme7: ABORTED - BY REQUEST (00/07) sqid:1 cid:107 cdw0:0
Apr 12 13:44:06 freenas01 nvme7: aborting outstanding i/o
Apr 12 13:44:06 freenas01 nvme7: WRITE sqid:1 cid:127 nsid:1 lba:976200264 len:64
Apr 12 13:44:06 freenas01 nvme7: ABORTED - BY REQUEST (00/07) sqid:1 cid:127 cdw0:0
Apr 12 13:44:06 freenas01 nvme7: aborting outstanding i/o
Apr 12 13:44:06 freenas01 nvme7: WRITE sqid:1 cid:113 nsid:1 lba:976200328 len:120
Apr 12 13:44:06 freenas01 nvme7: ABORTED - BY REQUEST (00/07) sqid:1 cid:113 cdw0:0
Apr 12 13:44:06 freenas01 nvme7: aborting outstanding i/o
Apr 12 13:44:06 freenas01 nvme7: WRITE sqid:1 cid:108 nsid:1 lba:976200448 len:72
Apr 12 13:44:06 freenas01 nvme7: ABORTED - BY REQUEST (00/07) sqid:1 cid:108 cdw0:0
Apr 12 13:44:06 freenas01 nvme7: aborting outstanding i/o
Apr 12 13:44:06 freenas01 nvme7: WRITE sqid:1 cid:116 nsid:1 lba:976200520 len:64
Apr 12 13:44:06 freenas01 nvme7: ABORTED - BY REQUEST (00/07) sqid:1 cid:116 cdw0:0
====

====
root at freenas01[~]# nvmecontrol identify nvme6
Controller Capabilities/Features
================================
Vendor ID:                   8086
Subsystem Vendor ID:         8086
Serial Number:               BTLJ90230EC61P0FGN
Model Number:                INTEL SSDPE2KX010T8
Firmware Version:            VDV10131
Recommended Arb Burst:       0
IEEE OUI Identifier:         e4 d2 5c
Multi-Interface Cap:         00
Max Data Transfer Size:      131072
Controller ID:               0x00

Admin Command Set Attributes
============================
Security Send/Receive:       Not Supported
Format NVM:                  Supported
Firmware Activate/Download:  Supported
Namespace Managment:         Supported
Abort Command Limit:         4
Async Event Request Limit:   4
Number of Firmware Slots:    1
Firmware Slot 1 Read-Only:   No
Per-Namespace SMART Log:     No
Error Log Page Entries:      64
Number of Power States:      1

NVM Command Set Attributes
==========================
Submission Queue Entry Size
  Max:                       64
  Min:                       64
Completion Queue Entry Size
  Max:                       16
  Min:                       16
Number of Namespaces:        1
Compare Command:             Not Supported
Write Uncorrectable Command: Supported
Dataset Management Command:  Supported
Volatile Write Cache:        Not Present

Namespace Drive Attributes
==========================
NVM total cap:               1000204886016
NVM unallocated cap:         0
====

====
root at freenas01[~]# zpool status
  pool: freenas-boot
 state: ONLINE
  scan: scrub repaired 0 in 0 days 00:00:03 with 0 errors on Sun Apr 7 03:45:03 2019
config:

        NAME            STATE     READ WRITE CKSUM
        freenas-boot    ONLINE       0     0     0
          mirror-0      ONLINE       0     0     0
            nvd0p2      ONLINE       0     0     0
            nvd1p2      ONLINE       0     0     0

errors: No known data errors

  pool: zfs
 state: ONLINE
  scan: scrub repaired 0 in 0 days 00:01:53 with 0 errors on Fri Mar 22 19:53:37 2019
config:

        NAME                                            STATE     READ WRITE CKSUM
        zfs                                             ONLINE       0     0     0
          raidz2-0                                      ONLINE       0     0     0
            gptid/97d0a7ce-44e5-11e9-982e-0025905f99ac  ONLINE       0     0     0
            gptid/98053880-44e5-11e9-982e-0025905f99ac  ONLINE       0     0     0
            gptid/983a9468-44e5-11e9-982e-0025905f99ac  ONLINE       0     0     0
            gptid/987100f2-44e5-11e9-982e-0025905f99ac  ONLINE       0     0     0
            gptid/98aa6e88-44e5-11e9-982e-0025905f99ac  ONLINE       0     0     0
            gptid/98f20b8c-44e5-11e9-982e-0025905f99ac  ONLINE       0     0     0

errors: No known data errors
====

The problem only appears on the data pool built from six INTEL SSDPE2KX010T8
drives. The system pool, built from two KXG50ZNV256G TOSHIBA drives, does
not show any problems under write load. All the Intel drives have the latest
firmware according to the Intel support website.

Could it possibly help to tweak dev.nvme.7.ioq0.num_entries and similar
entries? What about switching to the nda device instead of nvd?

Kind regards,
Patrick

--
punkt.de GmbH
Internet - Dienstleistungen - Beratung
Kaiserallee 13a     Tel.: 0721 9109-0 Fax: -100
76133 Karlsruhe     info at punkt.de   http://punkt.de
AG Mannheim 108285  Gf: Juergen Egeling
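For reference, a minimal /boot/loader.conf sketch combining the two
workarounds mentioned in this thread: the hw.nvme.per_cpu_io_queues setting
from Warner's earlier mail, plus hw.nvme.use_nvd to attach the drives as
nda(4) instead of nvd(4) on a kernel that includes the CAM nda driver, so
that TRIMs get combined before reaching the drive. Whether either setting
actually avoids the controller resets on these Intel drives is an open
question, not a confirmed fix:

====
# /boot/loader.conf (sketch; settings take effect at the next boot)

# Use a single I/O queue pair per controller instead of one per CPU.
# De facto limits how many commands the host can throw at the drive.
hw.nvme.per_cpu_io_queues="0"

# Attach NVMe namespaces through CAM as nda(4) instead of nvd(4);
# nda combines adjacent TRIMs rather than passing each one through blindly.
hw.nvme.use_nvd="0"
====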