HAST + ZFS + NFS + CARP

Borja Marcos borjam at sarenet.es
Thu Aug 11 08:16:44 UTC 2016


> On 04 Jul 2016, at 21:31, Julien Cigar <julien at perdition.city> wrote:
> 
>> To get specific again, I am not sure I would do what you are contemplating given your circumstances since it’s not the cheapest / simplest solution.  The cheapest / simplest solution would be to create 2 small ZFS servers and simply do zfs snapshot replication between them at periodic intervals, so you have a backup copy of the data for maximum safety as well as a physically separate server in case one goes down hard.  Disk storage is the cheap part now, particularly if you have data redundancy and can therefore use inexpensive disks, and ZFS replication is certainly “good enough” for disaster recovery.  As others have said, adding additional layers will only increase the overall fragility of the solution, and “fragile” is kind of the last thing you need when you’re frantically trying to deal with a server that has gone down for what could be any number of reasons.
>> 
>> I, for example, use a pair of FreeNAS Minis at home to store all my media and they work fine at minimal cost.  I use one as the primary server that talks to all of the VMWare / Plex / iTunes server applications (and serves as a backup device for all my iDevices) and it replicates the entire pool to another secondary server that can be pushed into service as the primary if the first one loses a power supply / catches fire / loses more than 1 drive at a time / etc.  Since I have a backup, I can also just use RAIDZ1 for the 4x4Tb drive configuration on the primary and get a good storage / redundancy ratio (I can lose a single drive without data loss but am also not wasting a lot of storage on parity).
> 
> You're right, I'll definitively reconsider the zfs send / zfs receive
> approach.

Sorry to be so late to the party.

Unless you have a *hard* requirement for synchronous replication, I would avoid it like the plague. Synchronous replication sounds sexy, but it
has several disadvantages: added complexity, and if you wish to keep an off-site replica it will definitely hurt performance, because distance
increases delay.

Asynchronous replication with ZFS has several advantages, however.

First and foremost: the snapshot-replicate approach is a terrific short-term “backup” solution that lets you recover quickly from incidents that
happen all too often, like your own software corrupting data. A ZFS snapshot is trivial to roll back and it won’t involve a costly “backup
recovery” procedure. You can do both replication *and* keep a snapshot retention policy à la Apple’s Time Machine. 
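Something along these lines run from cron would do it; this is just a sketch, the dataset name and the 7 day retention are made-up examples, not what we actually ran:

    #!/bin/sh
    # Take a timestamped snapshot of the content dataset and prune
    # anything older than 7 days. "tank/web" is an example name.
    DATASET=tank/web
    NOW=$(date +%Y%m%d-%H%M)
    zfs snapshot ${DATASET}@auto-${NOW}

    # Rolling back after the application corrupts something is a
    # one-liner (manual, not part of the cron job):
    #   zfs rollback tank/web@auto-20160811-0800

    # Prune snapshots older than the cutoff, by name.
    CUTOFF=$(date -v-7d +%Y%m%d)
    zfs list -H -t snapshot -o name -r ${DATASET} | \
        grep "@auto-" | while read SNAP; do
            STAMP=${SNAP##*@auto-}
            DAY=${STAMP%%-*}
            [ "${DAY}" -lt "${CUTOFF}" ] && zfs destroy "${SNAP}"
    done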

Second: I mentioned distance when keeping off-site replicas, as distance necessarily increases delay. Asynchronous replication doesn't have that problem.

Third: With some care you can do one-to-N replication, even with different replication frequencies for each target.
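The replication itself is just an incremental send piped over ssh, roughly like this (host and dataset names are examples, and you'd normally record the last replicated snapshot somewhere instead of passing it in):

    #!/bin/sh
    # Incrementally replicate tank/web. $1 is the previously sent
    # snapshot, $2 the freshly taken one (both hypothetical).
    PREV=$1
    NEW=$2

    # Nearby replica, every couple of minutes:
    zfs send -i tank/web@${PREV} tank/web@${NEW} | \
        ssh replica1 zfs receive -F tank/web

    # An off-site replica could run the same command less often,
    # shipping a bigger increment in one go:
    #   zfs send -i tank/web@${OLDER} tank/web@${NEW} | \
    #       ssh offsite zfs receive -F tank/web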

Several years ago, in 2009 I think, I set up a system that worked quite well. It was based on NFS and ZFS. The requirements were a bit particular,
which in this case greatly simplified things for me.

I had a farm of front-end web servers (running Apache) that took all of their content from an NFS server. The NFS server used ZFS as the file system. This might not be useful for everyone, but in this case the web servers were CPU bound due to plenty of PHP crap. As the front ends weren’t supposed to write to the file server (and indeed that was undesirable for security reasons) I could afford to export the NFS file systems read-only. 
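On FreeBSD that kind of read-only export is a one-liner, either in /etc/exports or through the sharenfs property; the dataset and subnet below are made up:

    # /etc/exports: export the content read-only to the front-end subnet
    /tank/web -ro -network 192.0.2.0 -mask 255.255.255.0

    # or let ZFS manage the export (the options are appended to the
    # exports line that mountd sees):
    #   zfs set sharenfs="-ro" tank/web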

The server was replicated to a sibling at 1 or 2 minute intervals, I don’t remember which. The interesting part was this: I used Heartbeat to decide which of the servers was the master. Whichever one Heartbeat chose as master was assigned a specific IP address and started the NFS service, so the front-ends would happily mount it.
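The takeover action on the newly promoted master amounted to little more than this (interface, address and the exact rc scripts are examples; Heartbeat ran the equivalent for us):

    #!/bin/sh
    # Run on the node that has just become master: bring up the
    # service address and start the NFS daemons.
    ifconfig em0 alias 192.0.2.10 netmask 255.255.255.255

    /etc/rc.d/rpcbind onestart
    /etc/rc.d/mountd onestart
    /etc/rc.d/nfsd onestart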

What happened in case of a server failure? 

Heartbeat would detect it in a minute, more or less. Assuming a master failure, the former slave would become master, assign itself the NFS
server IP address and start up NFS. Meanwhile, the front-ends had a silly script running at 1 minute intervals that simply read a file from the
NFS mounted filesystem. In case of a read error it would force an unmount of the NFS filesystem and enter a loop trying to mount it again until it succeeded.
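The silly script was essentially a watchdog loop along these lines (mount point, canary file and server name are made up; a soft mount keeps the read from hanging forever if the server dies):

    #!/bin/sh
    # Front-end watchdog: if the canary file cannot be read, force an
    # unmount and retry mounting the NFS share until it comes back.
    MNT=/content
    CANARY=${MNT}/.heartbeat

    if ! cat ${CANARY} > /dev/null 2>&1; then
        umount -f ${MNT}
        while ! mount -t nfs -o ro,soft nfsserver:/tank/web ${MNT}; do
            sleep 10
        done
    fi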

It looks kludgy, but it meant that in case of a server loss (ZFS on FreeBSD wasn’t that stable at the time and we suffered a couple of them) the website was titsup for maybe two minutes and recovered automatically. It worked. 

Both NFS servers were in the same datacenter, but I could have added geographical dispersion by using BGP to announce the NFS IP address to our routers. 

There are better solutions, but this one involved no fancy software licenses, no expensive hardware, and it was quite reliable. The only problem we had (maybe I was just too daring) was that we were bitten by a ZFS deadlock bug several times. But it worked anyway.





Borja.




