michelle at sorbs.net
Tue Apr 30 13:38:48 UTC 2019
Karl Denninger wrote:
> On 4/30/2019 03:09, Michelle Sullivan wrote:
>> If one triggers such a fault on a production server, how can one justify transferring multiple terabytes (or even petabytes, these days) of data from backup to repair an unmountable/faulted array? Every backup solution I currently know of would take days, if not weeks, to restore a data store of the size ZFS is touted as supporting.
> Had it happen on a production server a few years back with ZFS. The
> *hardware* went insane (disk adapter) and scribbled on *all* of the vdevs.
> The machine crashed and would not come back up -- at all. I insist on
> (and had) emergency boot media physically in the box (a USB key) in any
> production machine, and it was quite quickly obvious that all of the
> vdevs were corrupted beyond repair. There was no rational option other
> than to restore.
> It was definitely not a pleasant experience, but this is why, once you
> get into systems and data store sizes where a restore is a five-alarm
> pain in the neck, you must figure out some sort of strategy that covers
> you 99% of the time without a large amount of downtime, and accept said
> downtime in the 1% case. In this particular circumstance the customer
> originally didn't want to spend on a doubled, transaction-level-protected
> on-site (same-DC) redundancy setup, so a restore -- as opposed to
> failing over/promoting, then restoring and building a new "redundant"
> box where the old "primary" resided -- was the most viable option. Time
> to recover essential functions was ~8 hours (and over 24 hours for
> everything to be restored).
How big was the storage area?
More information about the freebsd-stable mailing list