HAST + ZFS causes system to shutdown uncleanly?

Fri Mar 18 21:11:20 UTC 2011

On Fri, Mar 18, 2011 at 1:59 PM, Mikolaj Golub <to.my.trociny at gmail.com> wrote:
> On Thu, 17 Mar 2011 13:42:09 -0700 Freddie Cash wrote:
>  FC> On Thu, Mar 17, 2011 at 1:36 PM, Freddie Cash <fjwcash at gmail.com> wrote:
>  >> On Thu, Mar 17, 2011 at 12:32 PM, Thomas Johnson <tom at claimlynx.com> wrote:
>  >>> Has anyone else noticed issues halting a system that is configured with a
>  >>> ZFS filesystem on a HAST device? I am using HAST to replicate a ZFS
>  >>> filesystem between two ESXi virtual machines (trying to emulate our
>  >>> production systems in a test environment) and I've noticed that the system
>  >>> doesn't seem to shutdown completely in this arrangement (hangs after ""
>  >>> message). I did some poking around and learned that if I unmount my zfs
>  >>> filesystems before shutdown, the shutdown finishes cleanly. Muddling my way
>  >>> through the rc scripts, it looks like hastd is killed fairly early on in the
>  >>> shutdown sequence. Presumably this is preventing the system from
>  >>> syncing/unmounting the ZFS mounts, causing the shutdown to hang.
>  >>>
>  >>> Does this seem plausible? If so, any ideas on fix, besides making sure I
>  >>> 'zfs unmount -a' before shutdown?
>  >>
>  >> Does it work if you manually add "hastd" to the REQUIRE: line in /etc/rc.d/zfs?
>  >>
>  >> Of course, that only works if you are starting zfs automatically via
>  >> /etc/rc.conf, and not letting CARP/devd or something else manage the
>  >> pool import process.
>
>  FC> Thinking about it, perhaps we need a hook into the top of the
>  FC> hastd_stop_precmd() function in /etc/rc.d/hastd?
>
>  FC> Something like "hastd_stop_args" in /etc/rc.conf where we can put
>  FC> commands to be run before hastd is stopped?
>
>  FC> Then it would be as simple as putting hastd_stop_args="zfs unmount -a"
>  FC> into /etc/rc.conf.
>
>  FC> Or something along those lines, so that we stop any consumers of the
>  FC> /dev/hast/* devices before we stop the hast daemon.
>
> IMHO, it is not HAST's job to bother with such things. We always have something
> (heartbeat, carp, hastmon) to manage HAST (change role, mount fs, start
> applications). This something has its own rc scripts: on startup it sets roles
> and mounts fs (if needed), and on shutdown it should do all necessary cleanup.

Unless I'm missing something here, that has nothing to do with the case
at hand: shutting off the master node in a HAST setup, where the ZFS
pool is mounted, when the slave node is already offline.

As far as CARP, devd, heartbeat, etc. are concerned, everything is up
and running correctly. There is no need to unmount the pool, as the
node is not switching to slave mode.

Or are you suggesting that part of the "shutdown procedure" would be
to switch it to slave first, then shut down?
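
In the meantime, here is a rough, untested sketch of what I mean by
stopping the consumers of /dev/hast/* before the HAST daemon itself.
The function names and the DRY_RUN variable are made up for
illustration; only "zfs unmount -a" and /etc/rc.d/hastd come from this
thread, and it assumes ZFS is the only consumer of the hast devices.

```shell
#!/bin/sh
# Hypothetical helper, not part of the base system.
# With DRY_RUN=1 the commands are only printed, not executed.

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

stop_hast_consumers() {
    # Release the ZFS filesystems that sit on /dev/hast/* first;
    # give up if the unmount fails, so hastd is left running.
    run zfs unmount -a || return 1
    # Only then stop the HAST daemon underneath them.
    run /etc/rc.d/hastd stop
}
```

Sourcing this and calling stop_hast_consumers by hand before
"shutdown" would be the manual workaround; setting DRY_RUN=1 first
shows the order without touching anything.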

-- 
Freddie Cash
fjwcash at gmail.com