Restructure a ZFS Pool

Raimund Sacherer rs at logitravel.com
Thu Sep 24 13:48:10 UTC 2015


----- Original Message ----- 

> From: "Paul Kraus" <paul at kraus-haus.org>
> To: "Raimund Sacherer" <raimund.sacherer at logitravel.com>, "FreeBSD Questions"
> <freebsd-questions at freebsd.org>
> Sent: Thursday, September 24, 2015 3:31:42 PM
> Subject: Re: Restructure a ZFS Pool

> On Sep 24, 2015, at 8:42, Raimund Sacherer <raimund.sacherer at logitravel.com>
> wrote:

> > I had the pool fill up to over 80%, then I got it back to about 50-60%, but
> > it feels more sluggish. I use a lot of NFS and we use it to back up some 5
> > million files in lots of sub-directories (a/b/c/d/abcd...), besides other
> > big files (SQL dump backups, bacula, etc.)
> >
> > I said above sluggish because I do not have empirical data and I do not
> > know exactly how to test the system correctly, but I read a lot and there
> > seem to be suggestions that if you have NFS etc. an independent ZIL
> > helps with copy-on-write fragmentation.

> A SLOG (Separate Log Device) will not remove existing fragmentation, but it
> will help prevent future fragmentation _iff_ (if and only if) the write
> operations are synchronous. NFS is not, by itself, sync, but the write calls
> on the client _may_ be sync.
Yes, I understood that it will only help prevent fragmentation in the future. I have also read that performance is great when running ZFS async. Would it be safe to use async ZFS given that I have a battery-backed hardware RAID controller (1024G RAM cache)? The server is an HP G8 and I have configured all disks as single-disk mirrors (the only way to get something like JBOD on this RAID controller). In the event of a power outage, everything should be held in the controller's cache by the battery and written to disk as soon as power is restored. Would that be a safe environment to switch ZFS to async?
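
What I have in mind is simply flipping the sync property on the affected datasets, e.g. (the dataset name "tank/backup" is only a placeholder for the real one):

    # Check the current setting (default is "standard")
    zfs get sync tank/backup

    # Treat all writes as asynchronous; this leans entirely on the
    # controller's battery-backed cache for safety
    zfs set sync=disabled tank/backup

    # Revert to honouring fsync()/O_SYNC again
    zfs set sync=standard tank/backup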

If I use async, is there still a *need* for a SLOG device? I read that running ZFS async and using a SLOG are comparable, because both allow the writes to be ordered and thus prevent fragmentation. It is not a critical system (e.g. downtime during the day is possible), but if restores need to be done I'd rather have it run as fast as possible.
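
I suppose that once a log device is in place I can watch the per-vdev I/O while the NFS clients are writing, to see how much traffic actually goes through it, something like (the pool name "tank" is just a placeholder):

    # Per-vdev statistics every second; the log vdev shows up under
    # a separate "logs" heading once it has been added to the pool
    zpool iostat -v tank 1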



> > What I would like to know is if I can eliminate one spare disk from the
> > pool and add it as a ZIL again, without having to shut down/reboot the
> > server?

> Yes, but unless you can stand losing data in flight (writes that the system
> says have been committed but have only made it to the SLOG), you really want
> your SLOG vdev to be a mirror (at least 2 drives).
Shouldn't this scenario be handled by ZFS (writes go to the SLOG, power goes out, power comes back, and the SLOG is replayed onto the data disks)?
I thought the only data loss would be writes that were still in transit TO the SLOG at the time of the power outage?
And I read somewhere that since ZFS v28 (IIRC), if the SLOG dies the pool simply stops using the log device and you lose the (performance) benefit of the SLOG, but the pool should still be operational?
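
If I understand the man page correctly, both steps should be possible online; roughly like this (pool and device names are placeholders only):

    # Drop one hot spare from the pool
    zpool remove tank da4

    # Add it back as a log device -- mirrored with a second disk,
    # as you suggest, if the data in flight matters
    zpool add tank log mirror da4 da5
    # (single-disk SLOG would be: zpool add tank log da4)

    zpool status tank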



> In a zpool of this size, especially a RAIDz<N> zpool, you really want a hot
> spare and a notification mechanism so you can replace a failed drive ASAP.
> The resilver time (to replace a failed drive) will be limited by the
> performance of a _single_ drive for _random_ I/O. See this post
> http://pk1048.com/zfs-resilver-observations/ for one of my resilver
> operations and the performance of such.
Thank you for this info, I'll keep it in mind and bookmark your link.
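
For the notification part I will probably just rely on the daily periodic status mail plus a manual check, along these lines (pool and device names are placeholders):

    # Add a hot spare to the pool
    zpool add tank spare da6

    # In /etc/periodic.conf: have periodic(8) include zpool status
    # in the daily mail
    daily_status_zfs_enable="YES"

    # Manual check; only complains if a pool is degraded or faulted
    zpool status -x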



> > Also I would appreciate it if someone has some pointers on how to test
> > correctly so I see if there are real benefits before/after this operation.
> I use a combination of iozone and filebench to test, but first I characterize
> my workload. Once I know what my workload looks like I can adjust the test
> parameters to match the workload. If the test results do not agree with
> observed behavior, then I tune them until they do. Recently I needed to test
> a server before going live. I knew the workload was NFS for storing VM
> images. So I ran iozone with 8-64 GB files and 4 KB to 1 MB blocks, and sync
> writes (the -o option). The measurements matched very closely to the
> observations, so I knew I could trust them and any changes I made would give
> me valid results.

Thank you, I will have a look at iozone.
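
From your description I guess an invocation along these lines would approximate our workload (file size, record size and the target path are just first guesses):

    # write/rewrite, read/reread and random read/write with an 8 GB
    # file, 128 KB records, and synchronous writes (-o = O_SYNC)
    iozone -i 0 -i 1 -i 2 -s 8g -r 128k -o -f /tank/backup/iozone.tmp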


best
Ray

