some ZFS questions

Paul Kraus paul at kraus-haus.org
Wed Aug 6 22:26:55 UTC 2014


On Aug 6, 2014, at 3:32, Scott Bennett <bennett at sdf.org> wrote:

> 	2) How does one start or stop a pool?

I assume your question comes from experience with other Volume Managers that need to have a process (or kernel thread) running to manage the volumes. ZFS does not really work that way (and at the same time it does).

>  From what I've read, it
> 	appears that ZFS automatically starts all the pools at once.

The system will keep track of which zpools were active on that system and automatically import them at boot time. ZFS records in the zpool which host has last imported it to prevent automatically importing the same pool on multiple systems at once.
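If I remember the FreeBSD specifics correctly, that boot-time import is driven by the rc system plus a cached list of pools (the paths and knobs below are from memory, so double check them):

    # in /boot/loader.conf, load the ZFS module at boot:
    zfs_load="YES"
    # in /etc/rc.conf, enable the ZFS rc script (mounts datasets, etc.):
    zfs_enable="YES"
    # the list of pools to import at boot lives in the cache file:
    ls -l /boot/zfs/zpool.cache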

>  If
> 	there is a problem after a crash that causes ZFS to decide to
> 	run some sort of repairs without waiting for a go-ahead from a
> 	human, ZFS might create still more problems.

Not likely. The “repairs” you speak of consist of two different mechanisms.

1. ZFS is transactional, so if a change has been committed to the transaction log (known as transaction groups, or TXGs) but not yet marked as committed, then at import time the TXG log will be replayed to ensure that the data is as up to date as possible. Because ZFS is Copy on Write and changes are applied atomically, the on-disk data is always consistent, hence no need for an fsck-like utility.

2. If a device that makes up a zpool is missing (failed) or otherwise unavailable *and* a hot spare is available, then ZFS will start resilvering (the ZFS term for rebuilding / re-syncing a device) onto the spare so it can substitute for the missing (failed) device. The resilver operation is handled at a lower priority than real I/O, so it has little impact on operations.
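As a rough sketch (the pool name "tank" and the device names are made up), setting up a hot spare and watching the resulting resilver looks something like:

    # add da4 as a hot spare to the pool
    zpool add tank spare da4
    # if a member disk later fails, the spare is pulled in automatically;
    # watch the resilver progress with:
    zpool status tank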

>  For example, if
> 	a set of identically partitioned drives has a pool made of one
> 	partition from each drive and another pool made from a different
> 	set of partitions,

Not an advised configuration, but a permitted one (yes, I have done this).

> a rebuild after a failed/corrupted drive might
> 	start on both pools at once, thereby hammering all of the drives
> 	mercilessly until something else, hardware or software, failed.

Yup, but the resilver only uses I/O bandwidth that is not already being used for production I/O. That said, yes, the drives will be seeing the maximum amount of random I/O that they can sustain.

> 	Having a way to allow one to complete before starting another
> 	would be critical in such a configuration.

Avoid such configurations.

>  Also, one might need
> 	stop a pool in order to switch hardware connections around.

zpool export <zpool name> or zpool export -f <zpool name> if necessary. Yes, you can do this while a resilver is running. It will start again (depending on specific ZFS code, maybe at the point it left off) when the zpool is next imported.
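Something like the following, with "tank" standing in for your pool name:

    # detach the pool from the running system (-f only if it refuses)
    zpool export tank
    # ... rearrange the hardware connections ...
    # bring the pool back and check whether the resilver resumed
    zpool import tank
    zpool status tank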

>  I
> 	see the zpool(8) command has a "reopen" command, but I don't see
> 	a "close" counterpart, nor a description of when a "reopen" might
> 	be used.

I think you are looking for the zpool import and zpool export commands here.

> 
> 	3) If a raidz2 or raidz3 loses more than one component, does one
> 	simply replace and rebuild all of them at once?  Or is it necessary
> 	to rebuild them serially?  In some particular order?

I do not believe that you can replace more than one device at a time, but if you issue a zpool replace <zpool name> <old device> <new device> command while a resilver is already running, I believe that it will just restart the resilver, writing data to *both* new devices at once. Note that since you can have multiple top level vdevs, and each vdev can be a RAIDz<n>, this is *not* as ludicrous as it might seem at first glance. The resilver really happens within a top level vdev.

No need to replace failed devices in any particular order, unless your specific configuration depends on it. You might have two failing devices and one is much worse than the other. I would replace the device with the more serious errors first, but you may have a reason to choose otherwise.
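For what it is worth, the replace commands themselves would look something like this (pool and device names are placeholders):

    # replace the disk with the more serious errors first ...
    zpool replace tank da2 da6
    # ... then the other failing disk; as noted above, I believe this just
    # restarts the resilver covering both replacements
    zpool replace tank da3 da7
    # keep an eye on progress
    zpool status tank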

> 	4) At present, I'm running 9-STABLE i386.  The box has 4 GB of
> 	memory, but the kernel ignores a bit over 1 GB of it.

I would NOT run ZFS on a 32-bit system.

<snip>

> 	5) When I upgrade to amd64, the usage would continue to be low-
> 	intensity as defined above.  Will the 4 GB be enough?

ZFS uses a memory structure called the ARC (Adaptive Replacement Cache) and it is the key to getting any kind of performance out of ZFS. It is both a write cache and a read (and read-ahead) cache. If it is not large enough (compared to the amount of data you will be writing in any 30 second period of time) then you will be in serious trouble. My rule of thumb is not to use ZFS on systems (real or virtual) with less than 4GB RAM. I have been running 9.2 on a system with 8GB RAM with no issues, but when I was testing 10.0 with 3GB RAM I occasionally had memory related hangs (I was testing with iozone before my additional RAM arrived).
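On FreeBSD you can also cap the ARC if RAM is tight. If I remember the tunable correctly, it goes in /boot/loader.conf, something like:

    # limit the ARC to 2GB (pick a value that suits your workload);
    # takes effect at the next boot
    vfs.zfs.arc_max="2G"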

>  I will not
> 	be using the "deduplication" feature at all.

Deduplication in ZFS has a very small “sweet spot” and it is highly recommended that you run the dedup simulation test before turning dedup on to see the real effect it would have (I am not near my systems right now or I would include the specific command). Also note that 1GB of RAM per 1TB of raw space under dedup is functionally mandatory for a usable system.
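I believe the command I am thinking of is zdb's simulated dedup run, which estimates the dedup table and ratio without changing anything on the pool (going from memory, so check the man page):

    # simulate dedup on the pool "tank" and print the estimated
    # DDT histogram and overall dedup ratio
    zdb -S tank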

> 	6) I have a much fancier computer sitting unused that I intend to
> 	put into service fairly soon after getting my current disk and data
> 	situation resolved.  The drives that would be in use for raidz
> 	pools I would like to attach to that system when it is ready.  It
> 	also has 4 GB of memory, but would start out as an amd64 system and
> 	might well have another 2 GB or 4 GB added at some point(s), though
> 	not immediately.  What problems/pitfalls/precautions would I need
> 	to have in mind and be prepared for in order to move those drives
> 	from the current system to that newer one?

You should be able to physically move the drives from *any* system to *any* other that supports the ZFS version and features that you are using. ZFS was even designed to handle endian differences (SPARC to Intel, for example). I would caution you to EXPORT the zpool when removing the drives and IMPORT it fresh on the new system. Technically you *can* do a `zpool import -f`, but from years of reading horror stories on the ZFS list, I *always* export / import when moving drives (if I can).
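The sequence I follow (again, "tank" is just a placeholder) is roughly:

    # on the old system
    zpool export tank
    # physically move the drives, then on the new system:
    zpool import            # with no arguments, lists importable pools
    zpool import tank       # import by name (or by the numeric pool id)
    # zpool import -f tank  # last resort if the pool was not cleanly exported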

--
Paul Kraus
paul at kraus-haus.org


