> On 28 Jul 2016, at 19:25, Jim Harris <jim.harris at gmail.com> wrote:
>
> Yes, you should worry.
>
> Normally we could use the dump_debug sysctls to help debug this - these
> sysctls will dump the NVMe I/O submission and completion queues. But in
> this case the LBA data is in the payload, not the NVMe submission entries,
> so dump_debug will not help as much as dumping the NVMe DSM payload
> directly.
>
> Could you try the attached patch and send output after recreating your pool?

Just in case the evil anti-spam ate my answer, sent the results to your Gmail account.

Borja.
On Fri, Jul 29, 2016 at 1:10 AM, Borja Marcos <borjam at sarenet.es> wrote:

> > On 28 Jul 2016, at 19:25, Jim Harris <jim.harris at gmail.com> wrote:
> >
> > Yes, you should worry.
> >
> > Normally we could use the dump_debug sysctls to help debug this - these
> > sysctls will dump the NVMe I/O submission and completion queues. But in
> > this case the LBA data is in the payload, not the NVMe submission entries,
> > so dump_debug will not help as much as dumping the NVMe DSM payload
> > directly.
> >
> > Could you try the attached patch and send output after recreating your
> > pool?
>
> Just in case the evil anti-spam ate my answer, sent the results to your
> Gmail account.

Thanks Borja.

It looks like all of the TRIM commands are formatted properly. The failures do not happen until about 10 seconds after the last TRIM to each drive was submitted, and immediately before TRIMs start to the next drive, so I'm assuming the failures are for the last few TRIM commands but cannot say for sure.

Could you apply patch v2 (attached) which will dump the TRIM payload contents inline with the failure messages?

Thanks,

-Jim

-------------- next part --------------
A non-text attachment was scrubbed...
Name: delete_debug_v2.patch
Type: application/octet-stream
Size: 1270 bytes
Desc: not available
URL: <http://lists.freebsd.org/pipermail/freebsd-stable/attachments/20160729/b7a73613/attachment.obj>
> On 29 Jul 2016, at 17:44, Jim Harris <jim.harris at gmail.com> wrote:
>
> On Fri, Jul 29, 2016 at 1:10 AM, Borja Marcos <borjam at sarenet.es> wrote:
>
> > > On 28 Jul 2016, at 19:25, Jim Harris <jim.harris at gmail.com> wrote:
> > >
> > > Yes, you should worry.
> > >
> > > Normally we could use the dump_debug sysctls to help debug this - these
> > > sysctls will dump the NVMe I/O submission and completion queues. But in
> > > this case the LBA data is in the payload, not the NVMe submission entries,
> > > so dump_debug will not help as much as dumping the NVMe DSM payload
> > > directly.
> > >
> > > Could you try the attached patch and send output after recreating your pool?
> >
> > Just in case the evil anti-spam ate my answer, sent the results to your Gmail account.
>
> Thanks Borja.
>
> It looks like all of the TRIM commands are formatted properly. The failures do not happen until about 10 seconds after the last TRIM to each drive was submitted, and immediately before TRIMs start to the next drive, so I'm assuming the failures are for the last few TRIM commands but cannot say for sure. Could you apply patch v2 (attached) which will dump the TRIM payload contents inline with the failure messages?

Sure, this is the complete /var/log/messages starting with the system boot. Before booting I destroyed the pool so that you could capture what happens when booting, zpool create, etc.

Remember that the drives are in LBA format #3 (4 KB blocks). As far as I know that's preferred to the old 512-byte blocks.

Thank you very much and sorry about the belated response.

Borja.
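For readers following the thread without the attachments: the point about the LBA data living in the DSM payload can be illustrated with a small userland sketch. This is not the scrubbed delete_debug patch, whose contents are not visible here. It only assumes the 16-byte range descriptor layout that the NVMe specification defines for Dataset Management (context attributes, length in logical blocks, starting LBA), a little-endian host, and a 4 KB logical block size as in LBA format #3 above. The names dsm_range, dsm_range_from_bytes and dsm_dump are hypothetical and chosen for illustration.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* A DSM command carries at most 256 range descriptors in its payload. */
#define DSM_MAX_RANGES 256

/*
 * One 16-byte range descriptor, laid out as in the NVMe specification.
 * Hypothetical name; fields are little-endian on the wire, so this
 * sketch assumes a little-endian host.
 */
struct dsm_range {
	uint32_t attributes;   /* context attributes */
	uint32_t length;       /* number of logical blocks in the range */
	uint64_t starting_lba; /* first logical block of the range */
};

/*
 * Convert a byte offset and size into a range descriptor.
 * Returns 0 on success, -1 if the request is not block aligned
 * or does not fit in the 32-bit length field.
 */
int
dsm_range_from_bytes(struct dsm_range *r, uint64_t offset, uint64_t size,
    uint32_t block_size)
{
	assert(r != NULL);
	if (block_size == 0 || offset % block_size != 0 ||
	    size % block_size != 0)
		return (-1);
	if (size / block_size > UINT32_MAX)
		return (-1);
	r->attributes = 0;
	r->length = (uint32_t)(size / block_size);
	r->starting_lba = offset / block_size;
	return (0);
}

/*
 * Print every range in a payload, one per line.
 * Returns the number of ranges printed, or -1 for an invalid count.
 */
int
dsm_dump(FILE *out, const struct dsm_range *ranges, unsigned int nranges)
{
	unsigned int i;

	assert(ranges != NULL);
	if (nranges == 0 || nranges > DSM_MAX_RANGES)
		return (-1);
	for (i = 0; i < nranges; i++) {
		fprintf(out, "range %u: lba %" PRIu64 " len %" PRIu32
		    " attr 0x%08" PRIx32 "\n", i, ranges[i].starting_lba,
		    ranges[i].length, ranges[i].attributes);
	}
	return ((int)nranges);
}
```

With 4 KB blocks, a TRIM of 4096 bytes at byte offset 8192 becomes a single range starting at LBA 2 with length 1. The submission queue entry itself carries only the number of ranges and a pointer to this payload, which is why dumping the queues alone would not show the LBAs.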