sporadic CAM (all devices) outage on 11-stable, mps(4), ahci(4) and bhyve(8) involved. [Was: Re: mps(4) blocks panic-reboot]

Stephen Mcconnell stephen.mcconnell at broadcom.com
Thu Jun 1 18:55:08 UTC 2017


Take a look at PR 212914. Could that be the issue? It was MFC'd to stable/11
with r309273 on Nov 28th, 2016.

Steve

> -----Original Message-----
> From: Harry Schmalzbauer [mailto:freebsd at omnilan.de]
> Sent: Thursday, June 01, 2017 12:31 PM
> To: Stephen Mcconnell
> Cc: freebsd-scsi at freebsd.org; Scott Long
> Subject: Re: sporadic CAM (all devices) outage on 11-stable, mps(4),
> ahci(4) and bhyve(8) involved. [Was: Re: mps(4) blocks panic-reboot]
>
> Regarding Stephen Mcconnell's message of 01.06.2017 20:12 (localtime):
> >> -----Original Message-----
> >> From: Harry Schmalzbauer [mailto:freebsd at omnilan.de]
> >> Sent: Thursday, June 01, 2017 12:03 PM
> >> To: Stephen Mcconnell
> >> Cc: freebsd-scsi at freebsd.org; Scott Long
> >> Subject: Re: mps(4) blocks panic-reboot
> >>
> >> Regarding Stephen Mcconnell's message of 01.06.2017 19:36
> >> (localtime):
> >>> Can you try the attached patch and let me know how it goes? I didn't
> >>> test it, but since you know how, it might be easier this way. This
> >>> was diff'd from the latest mps files in stable/11, which I recently
> >>> updated (today).
> >>
> >> Thanks a lot, I noticed the highly appreciated MFC!
> >> Things are cooking... There were sysdecode userland changes, so I
> >> need to build world as well, before my rollout system provides the
> >> update for this machine – will be ready in an hour.
> >>
> >> Since I have an expert's attention, I'd like to ask another
> >> mps(4)-related question:
> >>
> >> I had unionfs deadlocks.  (I'm aware of the broken status of unionfs,
> >> and since I'm not able to fix it myself at the moment, I already
> >> replaced it with nullfs where possible, true for the following event)
> >>
> >> Since this machine has a memory-disk as rootfs (and 5 SSDs via mps(4)
> >> for bootpool and a separate syspool, where /var e.g. lives), I guess
> >> the deadlock is responsible for the simultaneous disappearance of all
> >> mps(4) attached drives.
> >>
> >> Is that plausible? (meaning, does the mps(4) driver depend on the
> >> filesystem subsystem?)
> >>
> >> Or do you have any idea what else could lead to the disappearance of
> >> all drives simultaneously? Other ATA drives, via on-board ahci (C203),
> >> were not affected!
> >> Unfortunately, I haven't been able to record any kernel messages when
> >> that happened (3 times as far as I remember, no occurrence since
> >> abandoning unionfs yet)
> >
> > This doesn't seem like an mps driver problem to me, but maybe someone
> > else here can help more than I can. I can't think of anything that
> > might be causing your drives to disappear. It would help if you could
> > get some kernel logs when this happens.
>
> Thanks, I should have searched beforehand... Two corrections: at least
> once SATA drives via ahci(4) were also affected, and I noted some kernel
> messages.
>
> Please see this post:
> https://lists.freebsd.org/pipermail/freebsd-scsi/2016-December/007216.html
>
> Sorry, I thought it was longer ago and not discussed at scsi@ at all...
>
> At that time, there was unionfs involved, which later led to complete
> deadlocks on different setups with completely different applications.
> But I think that (deadlock) is one possible root of the problems these
> setups had in common.
>
> So if one expert can tell me – nope, disappearing drives can't be related
> to (union)fs deadlocks, or the opposite, I'd be deeply grateful.
>
> -harry

