Back in May, I posted about issues I was having with a Dell PE R630 with 4x800GB NVMe SSDs. I would get kernel panics due to the inability to assign all the interrupts because of https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=199321. Jim Harris helped fix this issue, so I bought several more of these servers, including ones with 4x1.6TB drives...

While the new servers with 4x800GB drives still work, the ones with 4x1.6TB drives do not. When I do a

  zpool create tank mirror nvd0 nvd1 mirror nvd2 nvd3

the command never returns and the kernel logs:

  nvme0: resetting controller
  nvme0: controller ready did not become 0 within 2000 ms

I've tried several different things trying to understand where the actual problem is.

WORKS: dd if=/dev/nvd0 of=/dev/null bs=1m
WORKS: dd if=/dev/zero of=/dev/nvd0 bs=1m
WORKS: newfs /dev/nvd0
FAILS: zpool create tank mirror nvd[01]
FAILS: gpart add -t freebsd-zfs nvd[01] && zpool create tank mirror nvd[01]p1
FAILS: gpart add -t freebsd-zfs -s 1400g nvd[01] && zpool create tank nvd[01]p1
WORKS: gpart add -t freebsd-zfs -s 800g nvd[01] && zpool create tank nvd[01]p1

NOTE: The above commands are more about getting the point across, not validity. I wiped the disks clean between gpart attempts and used GPT.

So it seems like zpool works as long as I don't cross past ~800GB, while other things like dd and newfs work regardless of size.

When I get the kernel messages about the controller resetting and then not responding, the NVMe subsystem hangs entirely. Since my boot disks are not NVMe, the system continues to work, but no more NVMe operations can be done. Further, attempting to reboot hangs and I have to power cycle.

Any thoughts on what the deal may be here?

10.2-RELEASE-p5

nvme0 at pci0:132:0:0: class=0x010802 card=0x1f971028 chip=0xa820144d rev=0x03 hdr=0x00
    vendor   = 'Samsung Electronics Co Ltd'
    class    = mass storage
    subclass = NVM

--
Sean Kelly
smkelly at smkelly.org
http://smkelly.org
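For reference, a minimal sketch of confirming the controller and namespace details (including the reported capacity) with nvmecontrol(8) on a 10.x system; the device names follow the post above and the exact output will vary:

  # list NVMe controllers and their namespaces
  nvmecontrol devlist
  # controller identify data for the first controller
  nvmecontrol identify nvme0
  # namespace identify data (size, LBA formats) for its first namespace
  nvmecontrol identify nvme0ns1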
On Tue, Oct 06, 2015 at 10:18:11AM -0500, Sean Kelly wrote:
> FAILS: zpool create tank mirror nvd[01]
> FAILS: gpart add -t freebsd-zfs nvd[01] && zpool create tank mirror nvd[01]p1
> FAILS: gpart add -t freebsd-zfs -s 1400g nvd[01] && zpool create tank nvd[01]p1
> WORKS: gpart add -t freebsd-zfs -s 800g nvd[01] && zpool create tank nvd[01]p1
>
> NOTE: The above commands are more about getting the point across, not validity. I wiped the disks clean between gpart attempts and used GPT.

Just for purity of the experiment: did you try zpool on the raw disks, without GPT? I.e.

  zpool create tank mirror nvd0 nvd1
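A minimal sketch of that experiment, assuming the devices from the thread and that any earlier test pool has already been destroyed (the labelclear/gpart steps are only needed if previous attempts left metadata behind):

  # clear any leftover partition tables and ZFS labels from earlier tests
  gpart destroy -F nvd0
  gpart destroy -F nvd1
  zpool labelclear -f /dev/nvd0
  zpool labelclear -f /dev/nvd1
  # then create the pool directly on the raw devices
  zpool create tank mirror nvd0 nvd1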
On 10/06/2015 10:18, Sean Kelly wrote:
> while the new servers with 4x800GB drives still work, the ones with 4x1.6TB drives do not. When I do a
>   zpool create tank mirror nvd0 nvd1 mirror nvd2 nvd3
> the command never returns and the kernel logs:
>   nvme0: resetting controller
>   nvme0: controller ready did not become 0 within 2000 ms

Try this:

  sysctl vfs.zfs.vdev.trim_on_init=0
  zpool create tank mirror nvd[01]

Eric
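A sketch of the full sequence that suggestion implies, assuming the hang really is the whole-device TRIM ZFS issues while initializing new vdevs (device names as in the thread; vfs.zfs.vdev.trim_on_init defaults to 1 on this release, so it is turned back on afterwards):

  # skip the whole-device TRIM that zpool create performs on new vdevs
  sysctl vfs.zfs.vdev.trim_on_init=0
  zpool create tank mirror nvd0 nvd1 mirror nvd2 nvd3
  # restore the default once the pool exists; normal runtime TRIM is unaffected
  sysctl vfs.zfs.vdev.trim_on_init=1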
As a guess, you're timing out the full-disk TRIM request. Try:

  sysctl vfs.zfs.vdev.trim_on_init=0

and then re-run the create.

On 06/10/2015 16:18, Sean Kelly wrote:
> So it seems like zpool works as long as I don't cross past ~800GB, while other things like dd and newfs work regardless of size.
>
> When I get the kernel messages about the controller resetting and then not responding, the NVMe subsystem hangs entirely.
Also, it looks like nvme exposes a timeout_period sysctl; you could try increasing that, as it could be too small for a full-disk TRIM.

Under CAM SCSI da support we have a delete_max sysctl which limits the maximum size of a single delete request. It may be we need something similar for nvme as well to prevent this, as the deletes should still be chunked to ensure this sort of thing doesn't happen.

On 06/10/2015 16:18, Sean Kelly wrote:
> When I get the kernel messages about the controller resetting and then not responding, the NVMe subsystem hangs entirely. Since my boot disks are not NVMe, the system continues to work, but no more NVMe operations can be done. Further, attempting to reboot hangs and I have to power cycle.
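A hedged sketch of what checking and raising that timeout might look like; the names below are from the stock nvme(4) driver of this vintage and should be verified on the machine itself (values are in seconds, and the loader tunable only takes effect at boot):

  # inspect the current per-controller I/O command timeout
  sysctl dev.nvme.0.timeout_period
  # if it cannot be changed at runtime, raise it via a loader tunable and reboot
  echo 'hw.nvme.timeout_period="120"' >> /boot/loader.conf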