iSCSI/ZFS strangeness

Thu Oct 29 02:20:14 UTC 2015

Hi,

I'm experimenting with iSCSI HA with FreeBSD 10.2 amd64. I know people
do this sort of thing, but I can't figure out what I'm missing. (Most
of the tutorials cover HAST instead.) I suspect the real problem is
"Lucas doesn't know the right search terms."

The goal is to make an iSCSI-based ZFS pool that's available to two
separate hosts, and remains available even if one of the iSCSI servers
fails. Instead, the pool hangs when either of the iSCSI servers goes
down.

My test environment has two iSCSI servers, iscsi1 and iscsi2. They
each export three drives as a single target.

There are two iSCSI initiators, zfs1 and zfs2. Both of them have active
connections to the iSCSI targets.

On another host I've created a ZFS pool of striped mirrors. Each
mirror has one drive from each iSCSI server.
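
A layout like this corresponds to a zpool create command along the
lines of the sketch below. It is an illustration, not the exact command
used: the gpt labels are the ones that appear in the zpool status
output later in this message, the function name pool_create_cmd is made
up, and the script only prints the command instead of running it.

```shell
#!/bin/sh
# Sketch: print (do not run) a zpool create command for three two-way
# mirrors, pairing label N from iscsi1 with label N from iscsi2.
# pool_create_cmd is a hypothetical helper; the label names are assumed
# to match the gpt labels shown in the zpool status output.
pool_create_cmd() {
    pool=$1
    cmd="zpool create $pool"
    for n in 0 1 2; do
        cmd="$cmd mirror gpt/iscsi1-$n gpt/iscsi2-$n"
    done
    echo "$cmd"
}

pool_create_cmd iscsi
```

Pairing the drives this way is what should let each mirror survive the
loss of either iSCSI server.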

The initiators can both access the iSCSI-based pool--not
simultaneously, of course. But CARP, devd, and some shell scripting
should get me a highly available pool that can withstand the demise of
any one iSCSI server and any one initiator.

The hope is that the pool would continue to work even if an iSCSI host
shuts down. When the downed iSCSI host returns, the initiators should
log back in and the pool auto-resilver.

Some ten minutes ago, I killed iscsi2. The pool is live on zfs1. And
the drives really have disappeared.

# iscsictl
Target name                          Target portal                State
iqn.2013-11.io.mwl:target0           iscsi2.blackhelicopters.org  Operation timed out
iqn.2013-11.io.mwl:target0           iscsi1.blackhelicopters.org  Connected: da2 da3 da4
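
For the shell-scripting side, a check like the following could pick
the dead sessions out of that listing. It is a minimal sketch that
assumes the three-column layout shown above with a one-line header;
down_portals is a made-up name, and this is not something already in
use here.

```shell
#!/bin/sh
# Sketch: read iscsictl session output on stdin and print the portal of
# every session whose state is anything other than "Connected: ...".
# Assumes one header line, then: target name, target portal, state.
# down_portals is a hypothetical helper name.
down_portals() {
    awk 'NR > 1 && NF >= 3 && $3 != "Connected:" { print $2 }'
}

# On a live initiator this would be used as:
#   iscsictl | down_portals
```

Fed the output above, it prints only iscsi2.blackhelicopters.org.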

I would expect to see the pool appear degraded. But instead, I have:

# zpool status iscsi
  pool: iscsi
 state: ONLINE
  scan: resilvered 1.16G in 0h3m with 0 errors on Wed Oct 28 14:13:08 2015
config:

        NAME              STATE     READ WRITE CKSUM
        iscsi             ONLINE       0     0     0
          mirror-0        ONLINE       0     0     0
            gpt/iscsi1-0  ONLINE       0     0     0
            gpt/iscsi2-0  ONLINE       0     0     0
          mirror-1        ONLINE       0     0     0
            gpt/iscsi1-1  ONLINE       0     0     0
            gpt/iscsi2-1  ONLINE       0     0     0
          mirror-2        ONLINE       0     0     0
            gpt/iscsi1-2  ONLINE       0     0     0
            gpt/iscsi2-2  ONLINE       0     0     0

errors: No known data errors

To try to make ZFS realize the pool is degraded, I write to the iSCSI
pool (tar -xvpf ports.tar.gz). Each time, the extract gets to a
certain point and hangs, and I can't ^C or ^Z out of it.

This latest time, the extract reaches:

x ports/www/firefox-esr/files/patch-media-mtransport-third_party-nICEr-src-util-mbslen.c

I can still SSH into the machine, but if I try to look in any
directories under /iscsi/ports/* my terminal hangs.

So I restart the downed iSCSI server. The initiators log back into the
target.  And the hung tar extract picks up where it left off.

So, I haven't achieved HA. The pool stays up, but it's not exactly
usable.

Any hints on what I'm missing?

Thanks,
==ml

-- 
Michael W. Lucas  -  mwlucas at michaelwlucas.com, Twitter @mwlauthor 
http://www.MichaelWLucas.com/, http://blather.MichaelWLucas.com/