Re: ZFS on a shared iSCSI

From: Pete Wright <pete_at_nomadlogic.org>
Date: Wed, 07 Feb 2024 20:35:23 UTC
On 2/7/24 02:55, Andrea Brancatelli wrote:
> Hello guys, I'm not 100% sure this is the correct list to ask this 
> question; if not, please feel free to point me in the right direction.
>
> I was wondering what could be the best recipe to have an HA cluster 
> sharing an external ZFS storage.
>
> Let's say I have two servers running a bunch of Jails and, thus, I'd 
> like to use ZFS as the underlying storage layer and I have an external 
> (iSCSI) storage connected.
>
> Would it be "easily possible" to have some (2?) iSCSI LUNs exposed to 
> both servers and then activate the pool on one or the other server?
>
> The idea would be to reactivate the filesystem from server A on server 
> B if server A fails.
>
> Would it be "easier" to replicate everything and zfs send data back 
> and forth? Clearly that would mean doubling the data and having a 
> scheduled replica with a possible delay in data replication, so I'd 
> like to avoid this.
>
You could probably roll your own solution using corosync and pacemaker, 
possibly in addition to using HAST to replicate blocks between your 
LUNs.  I would avoid trying to do ZFS replication in this scenario.
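
As a rough illustration, a minimal /etc/hast.conf (identical on both 
nodes) could look like the sketch below.  The resource name, hostnames, 
addresses and device are all placeholders for this example, and the 
"on" names must match each node's hostname:

    resource jailpool {
        on nodeA {
            local /dev/da1      # the iSCSI-backed disk as seen on nodeA
            remote 10.0.0.2     # nodeB's replication address
        }
        on nodeB {
            local /dev/da1      # the iSCSI-backed disk as seen on nodeB
            remote 10.0.0.1     # nodeA's replication address
        }
    }

You would then build the ZFS pool on /dev/hast/jailpool, which only 
exists on whichever node is currently primary.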


The tl;dr could look like:

- HAST replicates blocks between the iSCSI LUNs (assuming your vendor 
doesn't already support this on the target side; many of the enterprise 
vendors should provide this for you, IMHO).

- corosync/pacemaker are used to detect the health of each of your 
FreeBSD systems.  If a heartbeat fails between nodes, it can trigger a 
failover event automatically.

- the failover event would import the pool on the healthy box and do 
other housekeeping (failing over IPs maybe? restarting jails? see the 
sketch after this list).
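
To make that last step concrete, the failover action on the surviving 
node could be a short script along these lines (the resource, pool, 
interface and address are made-up names for the sketch):

    #!/bin/sh
    # promote the local HAST resource so /dev/hast/jailpool appears here
    hastctl role primary jailpool
    # force-import the pool that was last active on the failed node
    zpool import -f jails
    # take over the service address (a plain alias here; CARP also works)
    ifconfig em0 inet 192.0.2.10/24 alias
    # bring the jails back up
    service jail start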

I've actually built a system using corosync to do failover in AWS, and 
one of the nice things with it is that when a failover event is 
triggered you can run arbitrary scripts.  In my use case I was able to 
interact with the AWS EC2 API via some scripts to migrate network 
interfaces from one instance to another.  It seems pretty reliable, and 
handles some critical infrastructure for us.
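
As a sketch of what such scripts can do (all IDs below are 
placeholders), moving an elastic network interface between instances 
with the AWS CLI boils down to:

    # detach the ENI from the failed instance
    aws ec2 detach-network-interface \
        --attachment-id eni-attach-0123456789abcdef0 --force
    # attach it to the surviving instance
    aws ec2 attach-network-interface \
        --network-interface-id eni-0123456789abcdef0 \
        --instance-id i-0123456789abcdef0 \
        --device-index 1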

But getting all of this right is pretty complicated... then again, so 
is distributed computing in general, and I'd be suspicious of any 
vendor who says they can make it simple :)

Regardless of your approach, you'd need to do a lot of testing and 
monitoring before critical production use.  It all comes down to how 
many resources you want to put into this.

-pete


-- 
Pete Wright
pete@nomadlogic.org