Intel NVMe troubles?

Jim Harris jim.harris at gmail.com
Mon Aug 1 18:49:32 UTC 2016


On Mon, Aug 1, 2016 at 7:38 AM, Borja Marcos <borjam at sarenet.es> wrote:

>
> > On 29 Jul 2016, at 17:44, Jim Harris <jim.harris at gmail.com> wrote:
> >
> >
> >
> > On Fri, Jul 29, 2016 at 1:10 AM, Borja Marcos <borjam at sarenet.es> wrote:
> >
> > > On 28 Jul 2016, at 19:25, Jim Harris <jim.harris at gmail.com> wrote:
> > >
> > > Yes, you should worry.
> > >
> > > Normally we could use the dump_debug sysctls to help debug this - these
> > > sysctls will dump the NVMe I/O submission and completion queues.  But
> in
> > > this case the LBA data is in the payload, not the NVMe submission
> entries,
> > > so dump_debug will not help as much as dumping the NVMe DSM payload
> > > directly.
> > >
> > > Could you try the attached patch and send output after recreating your
> pool?
> >
> > Just in case the evil anti-spam ate my answer, sent the results to your
> Gmail account.
> >
> >
> > Thanks Borja.
> >
> > It looks like all of the TRIM commands are formatted properly.  The
> failures do not happen until about 10 seconds after the last TRIM to each
> drive was submitted, and immediately before TRIMs start to the next drive,
> so I'm assuming the failures are for the last few TRIM commands but
> cannot say for sure.  Could you apply patch v2 (attached) which will dump
> the TRIM payload contents inline with the failure messages?
>
> Sure, this is the complete /var/log/messages starting with the system
> boot. Before booting I destroyed the pool
> so that you could capture what happens when booting, zpool create, etc.
>
> Remember that the drives are in LBA format #3 (4 KB blocks). As far as I
> know that’s preferred to the old 512 byte blocks.
>
> Thank you very much and sorry about the belated response.
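[Editor's note: for context on the payload being dumped above — each entry in an NVMe Dataset Management (TRIM) payload is a 16-byte range descriptor: 4-byte context attributes, 4-byte length in logical blocks, and an 8-byte starting LBA, all little-endian, per the NVMe specification. A minimal decoding sketch (the helper name is hypothetical, not from the patch):]

```python
import struct

def decode_dsm_ranges(payload: bytes):
    """Decode NVMe DSM (deallocate/TRIM) range descriptors.

    Each descriptor is 16 bytes, little-endian:
      context attributes (4 bytes), number of LBAs (4 bytes),
      starting LBA (8 bytes).
    Returns a list of (starting_lba, num_lbas, attributes) tuples.
    """
    ranges = []
    for off in range(0, len(payload), 16):
        attr, nlb, slba = struct.unpack_from("<IIQ", payload, off)
        # With LBA format #3 (4 KB blocks), a range of nlb LBAs
        # covers nlb * 4096 bytes on these drives.
        ranges.append((slba, nlb, attr))
    return ranges

# Example: two ranges, 8 LBAs at LBA 4096 and 16 LBAs at LBA 8192.
payload = struct.pack("<IIQ", 0, 8, 4096) + struct.pack("<IIQ", 0, 16, 8192)
print(decode_dsm_ranges(payload))
```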


Hi Borja,

Thanks for the additional testing.  This has all of the detail that I need
for now.

-Jim


More information about the freebsd-stable mailing list