ZFS v28 for 8.2-STABLE

Martin Matuska mm at FreeBSD.org
Sun May 1 00:09:22 UTC 2011


We plan to MFC v28.

But as this change is quite intrusive for users, there is no way back
once you upgrade your pool (and if you do not upgrade the bootcode as
well, the system will no longer boot and you will need something like
mfsBSD to rescue it). The MFC will happen when we think the code is
stable enough to be in STABLE.

As for me, I am not using it in serious production yet (I am very happy
with v15 plus the latest patches), but my development servers running v28
seem pretty stable.

I have updated the patch to reflect the latest changes (grab the latest
one):
http://people.freebsd.org/~mm/patches/zfs/v28/
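Applying it goes roughly like this (the filename is a placeholder for
whatever the newest file in that directory is; follow the notes published
next to the patch for the exact source revision and patch options), then
rebuild world and kernel as usual:

cd /usr/src
fetch http://people.freebsd.org/~mm/patches/zfs/v28/stable-8-zfsv28-YYYYMMDD.patch.xz
xz -d stable-8-zfsv28-YYYYMMDD.patch.xz
patch -E -p0 < stable-8-zfsv28-YYYYMMDD.patch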

As for your setup, have you tried using a partition as a log device?

File-based devices are generally considered experimental in all ZFS
implementations (including Solaris).
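Something along these lines should do. A sketch only; the disk, size and
label are placeholders, and the target disk must already have a GPT
scheme:

gpart add -t freebsd-zfs -s 4G -l slog ada6
zpool add tank log gpt/slog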

On 30.04.2011 17:44, Pierre Lamy wrote:
> On 4/29/2011 8:15 PM, Jeremy Chadwick wrote:
>> On Fri, Apr 29, 2011 at 11:20:21PM +0300, Volodymyr Kostyrko wrote:
>>> 28.04.2011 07:37, Ruslan Yakovlev wrote:
>>>> Does a patch actually exist for 8.2-STABLE?
>>>> I tried
>>>> http://people.freebsd.org/~mm/patches/zfs/v28/stable-8-zfsv28-20110317.patch.xz
>>>>
>>>> The build failed with:
>>>> can't cd to /usr/src/cddl/usr.bin/zstreamdump
>>>> Also, sys/cddl/compat/opensolaris/sys/sysmacros.h failed to patch.
>>>>
>>>> The current FreeBSD 8.2-STABLE #35 Mon Apr 18 03:40:38 EEST 2011 i386
>>>> periodically freezes under high load, such as backups via rsync or
>>>> find -sx ... (from the default cron tasks).
>>> Well, ZFSv28 should be very close to STABLE by now?
>>>
>>> http://lists.freebsd.org/pipermail/freebsd-current/2011-February/023152.html
>>>
>> It's now a matter of opinion.  The whole idea of committing ZFSv28 to
>> HEAD was for it to be tested.  I haven't seen any indication of a
>> progress report for anything on HEAD that pertains to ZFSv28, have you?
>>
>> Furthermore, the FreeBSD Quarterly Status Report just came out on 04/27
>> for the months of January-March (almost a 2 month delay, sigh):
>>
>> 1737     04/27 10:58  Daniel Gerzo        ( 41K) FreeBSD Status Report
>> January-March, 2011
>>
>> http://www.freebsd.org/news/status/report-2011-01-2011-03.html
>>
>> Which states that ZFSv28 is "now available in CURRENT", which we've
>> known for months:
>>
>> http://www.freebsd.org/news/status/report-2011-01-2011-03.html#ZFSv28-available-in-FreeBSD-9-CURRENT
>>
>>
>> But again, there has been no progress report, so nobody except those
>> who follow HEAD/CURRENT knows what the progress is.  And that progress
>> has not been relayed to any of the non-HEAD/CURRENT lists.
>>
>> I'm a total hard-ass about this stuff, and have been for years, because
>> it all boils down to communication (or lack thereof).  It seems very
>> hasty to say "Yeah! MFC this!" when we (folks who only follow STABLE)
>> have absolutely no idea if what's in CURRENT is actually broken in some
>> way or if there are outstanding problems -- and if there are, what those
>> are so users can be aware of them in advance.
>>
> 
> Hello,
> 
> Here's a summary of my recent end-user work with ZFS on -current. I
> recently was lucky enough to purchase 2 NAS systems: 2 cheap new PCs,
> each loaded with 6 hard drives, one 1 TB drive as a simple GPT boot
> device and five 2 TB data drives. The motherboard has 6 SATA connectors,
> but I needed to purchase an additional PCI-E SATA adapter since the DVD
> drive also uses a SATA port. The system has 4 GB of memory and a new,
> inexpensive quad-core AMD CPU.
> 
> I've been running it (a recent -current) for a couple of weeks with
> heavy single-user use; the pool is at 2.5 TB used of 7.1 TB.
> 
> The only problem I found was that deleting a file-backed log device
> from a degraded pool would immediately panic the system. I'm not running
> stock -current, so I didn't report it.
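> (Removing a log device is done with zpool remove, so with the pool
> already DEGRADED the operation in question is roughly:
> 
> zpool remove tank /var/preserve/zfs/log_device
> 
> though I haven't narrowed the trigger down further.)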
> 
> Resilvering seems absurdly slow, but since I won't be doing it much I
> didn't care either. My NAS setup is side-by-side redundant, so if
> resilvering takes more than 2 days I would just replicate from my other
> NAS.
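> (The replication itself would just be a recursive snapshot plus
> send/receive over ssh; the hostname and snapshot name below are made up:)
> 
> ssh zfs-master zfs snapshot -r tank@sync
> ssh zfs-master zfs send -R tank@sync | zfs receive -dF tank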
> 
> Throughput without a log device was in the range of 30 Mbit/sec (3% of
> my 1 Gbit interface). Adding a file-backed log device on the UFS
> partition used for boot resulted in a 10x jump, saturating the SATA bus
> of the machine I was sending data from over the network. Throughput went
> up to about 30% of the interface speed (the maximum bus speed of that
> disk) and did not vary much. This resolved the very spiky data transfers
> that a lot of other people have posted about on the internet. I first
> used a USB device with roughly 40 Mbit/sec throughput as the log device,
> which made the transfers dramatically smoother, but there were still
> ~15-second stretches where no data would transfer while the log was
> flushed from USB to disk. After researching, I discovered that I could
> use a file-backed log device instead, and that fixed the spiky transfers
> completely.
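> (To add a file-backed log to a pool that already exists, rather than at
> creation time as shown further down, the sequence is roughly:)
> 
> dd if=/dev/zero of=/var/preserve/zfs/log_device bs=1m count=5000
> zpool add tank log /var/preserve/zfs/log_device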
> 
> Before that I had tuned the sysctls, as the poor out-of-the-box settings
> were giving me very slow speeds (in the range of 1% of network
> throughput, before the log device). I played around with the vfs.zfs
> tunables but found that I did not need to after I added the log device;
> the out-of-the-box settings for that sysctl tree were just fine.
> 
> I had first set this up before CAM was made the default in -current,
> and did not use labels. Due to troubleshooting some unrelated disk
> issues, I ended up switching to CAM without problems, and subsequently
> labeled the disks (I recreated the zpool after labeling). I am now using
> CAM and AHCI without any issues.
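> (For anyone else making the switch: assuming the controller is set to
> AHCI mode in the BIOS, loading the driver at boot is enough, e.g. in
> /boot/loader.conf:)
> 
> ahci_load="YES"    # disks then attach as adaX via CAM instead of adX via ata(4)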
> 
> Here are some personal notes on the tunables I set; I am sure they are
> not all helpful. I didn't add them one by one, I simply mass-changed them
> and saw a positive result. Also noted below are the commands I used and
> the current system status; a note on making the sysctls persistent
> follows the list.
> 
> sysctl -w net.inet.tcp.sendspace=373760
> sysctl -w net.inet.tcp.recvspace=373760
> sysctl -w net.local.stream.sendspace=82320
> sysctl -w net.local.stream.recvspace=82320
> sysctl -w vfs.zfs.prefetch_disable=1
> sysctl -w net.local.stream.recvspace=373760
> sysctl -w net.local.stream.sendspace=373760
> sysctl -w net.local.inflight=1
> sysctl -w net.inet.tcp.ecn.enable=1
> sysctl -w net.inet.flowtable.enable=0
> sysctl -w net.raw.recvspace=373760
> sysctl -w net.raw.sendspace=373760
> sysctl -w net.inet.tcp.local_slowstart_flightsize=10
> sysctl -w net.inet.tcp.delayed_ack=0
> sysctl -w kern.maxvnodes=600000
> sysctl -w net.local.dgram.recvspace=8192
> sysctl -w net.local.dgram.maxdgram=8192
> sysctl -w net.inet.tcp.slowstart_flightsize=10
> sysctl -w net.inet.tcp.path_mtu_discovery=0
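> (To keep these across reboots, the same name=value pairs can go into
> /etc/sysctl.conf, a few as an example below. Note that some vfs.zfs knobs
> are boot-time tunables on older systems and would belong in
> /boot/loader.conf instead.)
> 
> # /etc/sysctl.conf
> net.inet.tcp.sendspace=373760
> net.inet.tcp.recvspace=373760
> kern.maxvnodes=600000
> vfs.zfs.prefetch_disable=1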
> 
> <root.wheel at zfs-slave> [/var/preserve/root] # glabel label g_ada0 /dev/ada0
> <root.wheel at zfs-slave> [/var/preserve/root] # glabel label g_ada1 /dev/ada1
> <root.wheel at zfs-slave> [/var/preserve/root] # glabel label g_ada3 /dev/ada3
> <root.wheel at zfs-slave> [/var/preserve/root] # glabel label g_ada4 /dev/ada4
> <root.wheel at zfs-slave> [/var/preserve/root] # glabel label g_ada5 /dev/ada5
> 
> The labels are so that I will be able to identify the disks more easily
> later. My motherboard has a single ATA bus whose slave port is used for
> SATA; the disk on that port would "disappear" from the box. Moving the
> drive to a master SATA port resolved the issue (very odd).
> 
> gnop create -S 4096 /dev/label/g_ada0
> mkdir /var/preserve/zfs
> dd if=/dev/zero of=/var/preserve/zfs/log_device bs=1m count=5000
> zpool create -f tank raidz /dev/label/g_ada0.nop /dev/label/g_ada1 \
>     /dev/label/g_ada3 /dev/label/g_ada4 /dev/label/g_ada5 \
>     log /var/preserve/zfs/log_device
> 
> The four commands above set the alignment to 4 KB (via the gnop
> device), create a file-backed log device, and create the pool.
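> (Once the pool is created, the gnop layer can be dropped; ZFS keeps the
> 4 KB alignment (ashift=12) for the life of the vdev and imports from the
> plain label once the .nop device is gone, e.g. after a reboot, which
> matches the status output below showing g_ada0 without the .nop suffix.
> A sketch, if you don't want to wait for a reboot:)
> 
> zpool export tank
> gnop destroy label/g_ada0.nop
> zpool import tank
> zdb tank | grep ashift    # should report ashift: 12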
> 
> zfs set atime=off tank
> 
> I decided not to use dedup, because my files don't contain much
> duplicate data; they're mostly large media files, ISOs, etc.
> 
> <root.wheel at zfs-slave> [/var/preserve/root] # zpool status
>   pool: tank
>  state: ONLINE
>  scan: none requested
> config:
> 
>         NAME                            STATE     READ WRITE CKSUM
>         tank                            ONLINE       0     0     0
>           raidz1-0                      ONLINE       0     0     0
>             label/g_ada0                ONLINE       0     0     0
>             label/g_ada1                ONLINE       0     0     0
>             label/g_ada3                ONLINE       0     0     0
>             label/g_ada4                ONLINE       0     0     0
>             label/g_ada5                ONLINE       0     0     0
>         logs
>           /var/preserve/zfs/log_device  ONLINE       0     0     0
> 
> errors: No known data errors
> <root.wheel at zfs-slave> [/var/preserve/root] #
> 
> <root.wheel at zfs-slave> [/var/preserve/root] # df
> Filesystem          Size    Used   Avail Capacity  Mounted on
> /dev/gpt/pyros-a    9.7G    3.3G    5.6G    37%    /
> /dev/gpt/pyros-c    884G    6.1G    808G     1%    /var
> tank                7.1T    2.5T    4.6T    35%    /tank
> <root.wheel at zfs-slave> [/var/preserve/root] #
> 
> 
> ada0 at ahcich0 bus 0 scbus0 target 0 lun 0
> ada0: <ST2000DL003-9VT166 CC32> ATA-8 SATA 3.x device
> ada0: 150.000MB/s transfers (SATA 1.x, UDMA6, PIO 8192bytes)
> ada0: Command Queueing enabled
> ada0: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)
> ada1 at ahcich2 bus 0 scbus3 target 0 lun 0
> ada1: <ST2000DL003-9VT166 CC32> ATA-8 SATA 3.x device
> ada1: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
> ada1: Command Queueing enabled
> ada1: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)
> ada2 at ahcich3 bus 0 scbus4 target 0 lun 0
> ada2: <ST31000520AS CC32> ATA-8 SATA 2.x device
> ada2: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
> ada2: Command Queueing enabled
> ada2: 953869MB (1953525168 512 byte sectors: 16H 63S/T 16383C)
> ada3 at ahcich4 bus 0 scbus5 target 0 lun 0
> ada3: <ST2000DL003-9VT166 CC32> ATA-8 SATA 3.x device
> ada3: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
> ada3: Command Queueing enabled
> ada3: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)
> ada4 at ahcich5 bus 0 scbus6 target 0 lun 0
> ada4: <ST2000DL003-9VT166 CC32> ATA-8 SATA 3.x device
> ada4: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
> ada4: Command Queueing enabled
> ada4: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)
> ada5 at ata1 bus 0 scbus8 target 0 lun 0
> ada5: <ST2000DL003-9VT166 CC32> ATA-8 SATA 3.x device
> ada5: 150.000MB/s transfers (SATA, UDMA6, PIO 8192bytes)
> ada5: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)
> 
> CPU: AMD Phenom(tm) II X4 920 Processor (2800.19-MHz K8-class CPU)
> ...
> real memory  = 4294967296 (4096 MB)
> avail memory = 3840598016 (3662 MB)
> 
> ZFS filesystem version 5
> ZFS storage pool version 28
> 
> 
> Best practices:
> 
> Tune the sysctls related to buffer sizes / queue depth.
> Label your disks before you build the zpool.
> Use gnop to 4 KB-align the disks; only one disk in the pool needs this
> before you create it.
> Use CAM.
> *** USE A LOG DEVICE! ***
> 
> -Pierre
> 

