some ZFS questions

Scott Bennett bennett at sdf.org
Sun Aug 24 10:15:36 UTC 2014


Paul Kraus <paul at kraus-haus.org> wrote:

> On Aug 21, 2014, at 6:07, Scott Bennett <bennett at sdf.org> wrote:
>
> > Paul Kraus <paul at kraus-haus.org> wrote:
>
> >> If I had two (or more) vdevs on each device (and I *have* done that when I needed to), I would have issued the first zpool replace command, waited for it to complete and then issued the other. If I had more than one drive fail, I would have handled the replacement of BOTH drives on one zpool first and then moved on to the second. This is NOT because I want to be nice and easy on my drives :-), it is simply because I expect that running the two operations in parallel will be slower than running them in series. For the major reason that large seeks are slower than short seeks.
> > 
> >     My concern was over a slightly different possible case, namely, a hard
> > failure of a component drive (e.g., makes ugly noises, doesn't spin, and/or
> > doesn't show up as a device recognized as such by the OS).  In that case,
> > either one has to physically connect a replacement device or a spare is
> > already on-line.  A spare would automatically be grabbed by a pool for
> > reconstruction, so I wanted to know whether the situation under discussion would
> > result in automatically initiated rebuilds of both pools at once.
>
> If the hot spare devices are listed in each zpool, then yes, they would come online as the zpools have a device fail. But it is possible to have one zpool with a failed vdev and the other not. It depends on the device's failure mode. Bad blocks can cause ZFS to mark one vdev bad and needing replacement, while another vdev on that same physical device may not have any bad blocks. In the case of a complete failure both zpools would start resilvering *if* the hot spares were listed in both. The way around this would be to have the spare device ready but NOT list it in the lower priority zpool. After the first resilver completes, manually do the zpool replace on the second.

     Okay.  That seems like a reasonable way to handle it.     
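     A rough sketch of that sequence, for my own notes, using made-up pool
names "fast" and "slow" and a made-up spare disk da6 (two partitions per
disk, one vdev for each pool), not my actual devices:

        # list the spare only in the higher-priority pool
        zpool add fast spare /dev/da6p1
        # when a disk dies, "fast" grabs the spare and resilvers on its own;
        # watch progress with
        zpool status fast
        # once that resilver finishes, replace the failed piece of the
        # lower-priority pool by hand
        zpool replace slow /dev/da2p2 /dev/da6p2

or something close to that, depending on how the vdevs are actually laid
out.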
>
> >> A zpool replace is not a simple copy from the failing device to the new one, it is a rebuild of the data on the new device, so if the device fails completely it just keeps rebuilding. The example in my blog was of a drive that just went offline with no warning. I put the new drive in the same physical slot (I did not have any open slots) and issued the resilver command.
> > 
> >     Okay.  However, now you bring up another possible pitfall.  Are ZFS's
> > drives address- or name-dependent?  All of the drives I expect to use will be
> > external drives.  At least four of the six will be connected via USB 3.0.  The
> > other two may be connected via USB 3.0, Firewire 400, or eSATA.  In any case,
> > their device names in /dev will most likely change from one boot to another.
>
> ZFS uses the header written to the device to identify it. Note that this was not always the case and *if* you have a zfs cache file you *may* run into device renaming issues. I have not seen any, but I am also particularly paranoid about not moving devices around before exporting them. I have seen too many stories of lost zpools due to this many years ago on the ZFS list.

     Well, I don't have an SSD at present, so maybe it won't matter then.
Nevertheless, if a crash and reboot can result in loss of a pool because
the device names get reshuffled, that would seem like a real hazard to
using ZFS, so I hope that is no longer the case.
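     If I understand the advice correctly, the safe habit before shuffling
cables or moving drives between controllers would be something like the
following, where "backup" just stands in for whatever the pool ends up
being named:

        # before disconnecting or re-cabling the drives
        zpool export backup
        # after reconnecting; ZFS finds the pool by the labels written on
        # the disks, not by whatever /dev name each drive got this boot
        zpool import
        zpool import backup

so the da/ada numbers changing between boots should not, by themselves,
lose the pool.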
>
> >> Tune vfs.zfs.arc_max in /boot/loader.conf
> > 
> >     That looks like a huge help.  While initially loading a file system
> > or zvol, would there be any advantage to setting primarycache to "metadata",
> > as opposed to leaving it set to the default value of "all"?
>
> I do not know, but I'm sure the folks on the ZFS list who know much more than I do will have opinions :-)
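     In case it helps someone searching the archives later, the property in
question would be flipped per dataset with something like the following,
where "backup/stuff" is only a placeholder name:

        # cache only metadata in the ARC while doing the initial bulk load
        zfs set primarycache=metadata backup/stuff
        zfs get primarycache backup/stuff
        # put it back to the default of "all" afterward
        zfs inherit primarycache backup/stuff

whether that actually helps during a bulk load is the open question above.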
>
> >> If I had less than 4GB of RAM I would limit the ARC to 1/2 RAM, unless this were solely a fileserver, in which case I would watch how much memory I needed outside ZFS and set the ARC to slightly less than that. Take a look at the recommendations here https://wiki.freebsd.org/ZFSTuningGuide for low RAM situations.
> > 
> >     Will do.  Hmm...I see again the recommendation to increase KVA_PAGES
> > from 260 to 512.  I worry about that because the i386 kernel says at boot
> > that it ignores all real memory above ~2.9 GB.  A bit farther along, during
> > the early messages preserved and available via dmesg(1), it says,
> > 
> > real memory  = 4294967296 (4096 MB)
> > avail memory = 3132100608 (2987 MB)
>
> On the FreeBSD VM I am running with only 1 GB of memory I did not do any tuning, and ZFS seems to be working fine. It is a mail store, but not a very large one (only about 130 GB of email). Performance is not a consideration on this VM; it is archival.

     So you haven't changed KVA_PAGES in your kernel?
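     For reference, the two knobs under discussion would look roughly like
the following; the 512M figure is only an example, not a recommendation:

        # /boot/loader.conf: cap the ARC on a memory-starved machine
        vfs.zfs.arc_max="512M"

        # custom i386 kernel configuration file, per the ZFSTuningGuide
        # wiki page mentioned above
        options         KVA_PAGES=512

The KVA_PAGES change, of course, means building and installing a custom
kernel.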


                                  Scott Bennett, Comm. ASMELG, CFIAG
**********************************************************************
* Internet:   bennett at sdf.org   *xor*   bennett at freeshell.org  *
*--------------------------------------------------------------------*
* "A well regulated and disciplined militia, is at all times a good  *
* objection to the introduction of that bane of all free governments *
* -- a standing army."                                               *
*    -- Gov. John Hancock, New York Journal, 28 January 1790         *
**********************************************************************

