Poor interaction between gmultipath(8), ZFS and isp(4)

Matthew Jacob mj at feral.com
Wed Aug 3 14:43:27 UTC 2011


Known problem. Or rather, one of a long set of known problems.

Most of these were addressed at Panasas under RELENG_7, but I have not 
had the time to redo them for RELENG_8 and later. Nor was I really happy 
with a lot of the results. At least from my perspective, due to work 
commitments, I'm unlikely to get to this very soon. Regrets.

On 8/3/2011 5:43 AM, Stephane LAPIE wrote:
> Hello list,
>
> (Not 100% sure the bug is in GEOM_MULTIPATH or in another driver.)
>
> I am running a FreeBSD 8.2-RELEASE server with ZFS v15, on the
> following hardware:
>
> http://www.darkbsd.org/~darksoul/server_dmesg.txt
>
> I have a dual fibre-channel controller (isp(4) driver), and I am
> accessing 16 RAID0 logical drives on a Promise vTrak E630fD (one volume
> per physical disk).
>
> Since both controllers are plugged into the same storage unit with no LUN
> masking, both end up seeing the same devices, which is why I combined
> these devices using geom_multipath.
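>
> For illustration only (the da(4) names below are hypothetical and this is
> just a sketch of the idea, not my exact commands), each multipath device
> was labeled from the two device nodes the two controllers expose for the
> same LUN:
>
>      # da0 (seen via isp0) and da16 (seen via isp1) are assumed to be
>      # the same LUN on the storage unit
>      gmultipath label disk0 da0 da16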
>
> Here is my zpool structure:
> config:
>
>          NAME                  STATE     READ WRITE CKSUM
>          data                  ONLINE       0     0     0
>            raidz1              ONLINE       0     0     0
>              multipath/disk0   ONLINE       0     0     0
>              multipath/disk1   ONLINE       0     0     0
>              multipath/disk2   ONLINE       0     0     0
>              multipath/disk3   ONLINE       0     0     0
>              multipath/disk4   ONLINE       0     0     0
>              multipath/disk5   ONLINE       0     0     0
>              multipath/disk6   ONLINE       0     0     0
>              multipath/disk7   ONLINE       0     0     0
>            raidz1              ONLINE       0     0     0
>              multipath/disk8   ONLINE       0     0     0
>              multipath/disk9   ONLINE       0     0     0
>              multipath/disk10  ONLINE       0     0     0
>              multipath/disk11  ONLINE       0     0     0
>              multipath/disk12  ONLINE       0     0     0
>              multipath/disk13  ONLINE       0     0     0
>              multipath/disk14  ONLINE       0     0     0
>              multipath/disk15  ONLINE       0     0     0
>
> errors: No known data errors
>
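> (For reference, this layout corresponds to a creation command along the
> following lines; shown only as a sketch, not the exact command I ran:)
>
>      zpool create data \
>          raidz1 multipath/disk0  multipath/disk1  multipath/disk2  multipath/disk3 \
>                 multipath/disk4  multipath/disk5  multipath/disk6  multipath/disk7 \
>          raidz1 multipath/disk8  multipath/disk9  multipath/disk10 multipath/disk11 \
>                 multipath/disk12 multipath/disk13 multipath/disk14 multipath/disk15
>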
>
> Using gmultipath, I eventually want disk{1,3,5,7,9,11,13,15} to use the
> second controller, while the rest use the first. The idea is that if
> anyone removes a fiber, everything switches over to the remaining fiber
> (see the sketch below).
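>
> (Sketch only, assuming the first provider passed to "gmultipath label"
> ends up as the initially active path, and assuming da0-da15 are the
> isp0-side nodes and da16-da31 the isp1-side nodes for disks 0-15:)
>
>      gmultipath label disk0 da0  da16    # active path via isp0
>      gmultipath label disk1 da17 da1     # active path via isp1
>      # ...and so on, alternating per disk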
>
> For the sake of testing, I put every multipath device on the same
> controller, isp1.
>
> Here is the kernel log fragment I could capture from my test (removing a
> fiber while transfers are actively running). Since I don't have serial
> console access, I couldn't capture the full kernel panic trace; the last
> readable section simply mentions a kernel trap during a page fault in
> g_mp_kt, and I reckon every CPU raises the panic message at once.
>
> http://www.darkbsd.org/~darksoul/server_lastlog_before_kernelpanic.txt
>
> After that, I get the aforementioned kernel panic. I can reproduce it
> consistently, and will try to acquire serial console output for a more
> detailed panic trace, but it feels like everything is occurring at the
> same time without proper locking, or without confirming that the relevant
> structures are still allocated. This looks like a race condition between
> the isp(4) loopdown provoking da(4) destruction and the gmultipath(8)
> failover, with g_mp_kt then accessing a da(4) structure that is being
> destroyed, or already destroyed, i.e. unallocated memory.
>
> Maybe this is similar to this issue:
> http://freebsd.1045724.n5.nabble.com/Kernel-panic-with-gmultipath-td4204700.html
>
>
> Could this be tuned so that:
> 1) initially, on isp(4) loopdown, the dependent da(4) devices return
> SCSI errors, triggering a clean gmultipath failover;
> 2) afterwards, on isp(4) timeout, the da(4) devices are destroyed?
>
> Is this a case for using the following boot hints?
> - "hint.isp.0.loop_down_limit" and "hint.isp.0.gone_device_time" (though
> I am not quite sure what the difference between the two is... Which one
> does the actual deallocation of the underlying devices?)
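>
> (As a sketch only: I understand these are per-instance isp(4) hints set
> in /boot/device.hints or /boot/loader.conf; the values below are
> illustrative guesses, not recommendations:)
>
>      hint.isp.0.loop_down_limit="300"
>      hint.isp.0.gone_device_time="30"
>      hint.isp.1.loop_down_limit="300"
>      hint.isp.1.gone_device_time="30"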
>
> Thanks in advance for your time,

