ZFS
Freddie Cash
fjwcash at gmail.com
Thu Oct 30 15:34:55 PDT 2008
On October 30, 2008 02:55 pm Lorenzo Perone wrote:
> On 22.10.2008, at 17:38, Freddie Cash wrote:
> > Personally, we use it in production for a remote backup box using
> > ZFS and Rsync (64-bit FreeBSD 7-Stable from August, 2x dual-core
> > Opteron 2200s, 8 GB DDR2 RAM, 24x 500 GB SATA disks attached to two
> > 3Ware 9650/9550 controllers as single-disks). Works beautifully,
> > backing up 80 FreeBSD and Debian Linux servers every night, creating
> > snapshots with each run.
> > Restoring files from an arbitrary day is as simple as navigating to
> > the needed .zfs/snapshot/<snapname>/<path>/ and scping the file to
> > wherever.
> > And full system restores are as simple as "boot livecd, partition/
> > format disks, run rsync".
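A minimal sketch of the single-file restore described above, assuming a filesystem mounted at /storage/backup. The helper name, snapshot name, host name, and file path are placeholders and not taken from the real setup:

```shell
#!/bin/sh
# Hypothetical helper: print the path of a file inside a ZFS snapshot.
#   $1 = mountpoint of the ZFS filesystem
#   $2 = snapshot name
#   $3 = path of the file relative to the mountpoint
snapshot_path() {
    printf '%s/.zfs/snapshot/%s/%s\n' "$1" "$2" "$3"
}

# Example use (placeholder names only):
#   scp "$(snapshot_path /storage/backup 2008-10-29 server1/etc/rc.conf)" \
#       admin@server1:/tmp/
```

The .zfs directory is hidden by default, so it will not show up in a plain ls, but it can still be entered by name.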
> So your system doesn't suffer panics and/or deadlocks, or you just
> cope with them as "collateral damage" (which, admitted, is less of
> a problem with a logging fs)?
Back in August, when we first started the implementation, we deadlocked it
once or twice a week. By the time it went live in September, we
deadlocked it several times a week, but that turned out to be due to the
CPU/RAM in the production machine not being able to keep up with even 20
rsync runs (2x Opteron 200, 8 GB DDR1-SDRAM).
We moved the harddrives over to another system with the Opteron 2200s
(similar to the testing machine) and DDR2-SDRAM and have only deadlocked
it 2x in 6 weeks.
Through all that, we noticed a pattern: if we had more than 4 rsyncs
running that were doing straight copies (i.e. we had added new servers to
the backup, and this was their first run), then the server would deadlock
and had to be power-cycled.
But if the rsyncs are doing mostly incremental updates (file compares,
updating changed files, writing new files), then we can run with all 80
without issues.
So, we've taken to only adding 1 new server at a time to the backup
process, and waiting for it to fully sync to the backup server before
adding the next one. (We stop the rsync run at 6:50am, so some servers
can take up to three days for the initial sync to complete, as the remote
end only has 768 Kbps ADSL upload speeds.)
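As a rough worked example of why the first sync is so slow, the sketch below converts an uplink speed into a best-case volume per hour. It assumes 1 Kbps = 1000 bits/s and ignores protocol overhead and any rsync compression; the function name is made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: best-case megabytes (10^6 bytes) per hour
# for an uplink speed given in Kbps. Integer arithmetic, rounds down.
mb_per_hour() {
    echo $(( $1 * 1000 / 8 * 3600 / 1000000 ))
}

# 768 Kbps is 96 000 bytes/s, or about 345 MB per hour at best.
mb_per_hour 768
```

At that rate a server holding tens of gigabytes cannot finish its initial copy in a single night.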
It's only been up for a week now since the last deadlock, but now that
we've discovered the issue (too many writes from too many rsyncs
simultaneously), we think it will be a lot longer until the next one. :)
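One way to enforce a cap on simultaneous first-time syncs is sketched below. It is only an illustration, not what is running on this box: run_limited, new-hosts.txt, and initial-sync.sh are hypothetical names, and the -P option of xargs is available on FreeBSD and GNU systems but is not part of POSIX.

```shell
#!/bin/sh
# run_limited MAX CMD [ARGS...]
# Reads one host name per line on stdin and runs "CMD ARGS host" for each,
# with at most MAX commands running at the same time.
run_limited() {
    max="$1"
    shift
    xargs -n 1 -P "$max" "$@"
}

# Example: never more than 4 initial syncs at once (placeholder names).
#   run_limited 4 ./initial-sync.sh < new-hosts.txt
```

Setting the limit to 1 gives the one-new-server-at-a-time behaviour described above.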
We're anxiously awaiting the release of 8.0, with the much expanded
kmem_max, so we can put 16 GB of RAM in here, give 4 GB to the ARC, and
give the rest to rsync, which should speed things up and stabilise it
more.
> If that's the case, would you share the details about what you're using
> on that machine (RELENG_7?, 7_0? HEAD?) and which patches
> /knobs You used? I have a similar setup on a host which
> backs up way fewer machines and locks up every... 3-9 weeks or so.
> That host only has about 2GB ram though.
All the gory details follow. It's fairly long. There are no custom or
extra patches installed.
Hardware:
Tyan h2000M motherboard (S3992)
2x Opteron 2200 CPUs (dual-core) @ 2 GHz
4x 2 GB DDR2-667 ECC SDRAM
3Ware 9550SXU-ML16 PCI-X RAID controller
3Ware 9650SE-ML12 PCIe RAID controller
12x 400 GB Seagate SATA harddrives
12x 500 GB WD SATA harddrives
2x 2 GB CompactFlash cards in CF-to-IDE adapters
Chenbro 5U case with 4-way redundant PSUs and 24x hot-swappable bays
All 24 harddrives are configured as "SingleDisk" arrays, which makes them
appear as individual, normal drives to the OS, but allows the RAID
controller to use the disk write cache and the card write cache.
ad0 and ad1 are the CF cards, and are part of a gmirror (gm0)
da0 through da23 are part of a raidz2 pool called "storage"
/ is on gm0
The following are ZFS filesystems:
/usr
/usr/src compressed (lzjb)
/usr/obj
/usr/ports compressed (lzjb)
/usr/ports/distfiles
/usr/local
/home
/tmp
/var
/storage
/storage/backup compressed (gzip)
swap is an 8 GB zvol
ZFS recordsize is set to 64K on storage, and inherited by the rest.
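For readers who want to reproduce part of this layout, here is a hedged sketch of the kind of commands involved. The dataset names storage/backup and storage/swap are assumptions (only the mountpoints are listed above), and the script only prints the commands unless DRY_RUN=0 is set:

```shell
#!/bin/sh
# Dry run by default: commands are printed, not executed.
# Set DRY_RUN=0 to really run them (only on a system with a pool
# named "storage"; dataset names below are assumed, not confirmed).
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

apply_layout() {
    # 64K records on the pool's root dataset; children inherit it.
    run zfs set recordsize=64K storage
    # gzip-compressed filesystem for the backups.
    run zfs create -o compression=gzip storage/backup
    # 8 GB zvol to be used as swap.
    run zfs create -V 8G storage/swap
}

# Usage: apply_layout
```

Properties set on the pool's root dataset are inherited by child filesystems unless overridden, which is why recordsize only needs to be set once.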
uname -a:
FreeBSD megadrive.sd73.bc.ca 7.0-STABLE FreeBSD 7.0-STABLE #0: Tue Aug 19
10:39:29 PDT 2008
root at megadrive.sd73.bc.ca:/usr/obj/usr/src/sys/ZFSHOST amd64
/boot/loader.conf:
# Loader options
autoboot_delay="10"
beastie_disable="NO"
loader_logo="beastie"
module_path="/boot/kernel"
# Kernel modules to load at boot
zfs_load="YES"
# Kernel tunables to set at boot (mostly for ZFS tuning)
# Disable DMA for the CF disks
# Set kmem to 1.5 GB (the current max on amd64)
# Set ZFS Adaptive Replacement Cache (arc) to about half of kmem (leaving
# half for the OS)
hw.ata.ata_dma=0
kern.hz="100"
vfs.zfs.arc_min="512M"
vfs.zfs.arc_max="768M"
vfs.zfs.prefetch_disable="1"
vfs.zfs.zil_disable="0"
vm.kmem_size="1596M"
vm.kmem_size_max="1596M"
# Devices to disable at boot (mainly ISA/non-PnP devices)
hint.fd.1.disabled="1"
hint.sio.0.disabled="1"
hint.sio.1.disabled="1"
hint.sio.2.disabled="1"
hint.sio.3.disabled="1"
hint.ppc.0.disabled="1"
Kernel config is GENERIC minus a bunch of unneeded drivers, using
SCHED_ULE.
/etc/sysctl.conf:
# General network settings
net.isr.direct=1 # Whether to enable Direct Dispatch for netisr
# IP options
net.inet.ip.forwarding=0 # Whether to enable packet forwarding
net.inet.ip.process_options=0 # Disable processing of IP options
net.inet.ip.random_id=1 # Randomise the IP header ID number
net.inet.ip.redirect=0 # Whether to allow redirect packets
#net.inet.ip.stealth=0 # Whether to appear in traceroute output
# ICMP options
net.inet.icmp.icmplim=200 # Limit ICMP packets to this many/s
net.inet.icmp.drop_redirect=1 # Drop ICMP redirect packets
net.inet.icmp.log_redirect=0 # Don't log ICMP redirect packets
# TCP options
net.inet.tcp.blackhole=1 # Drop packets destined to unused ports
net.inet.tcp.inflight.enable=1 # Enable TCP bandwidth-delay product limiting
net.inet.tcp.log_in_vain=0 # Don't log the blackholed packets
net.inet.tcp.path_mtu_discovery=1 # Use ICMP type 3 to find the MTU to use
net.inet.tcp.recvspace=131072 # Size in bytes of the receive buffer
net.inet.tcp.sack.enable=1 # Enable Selective ACKs
net.inet.tcp.sendspace=131072 # Size in bytes of the send buffer
net.inet.tcp.syncookies=1 # Enable SYN cookie protection
# UDP options
net.inet.udp.blackhole=1 # Drop packets destined to unused ports
net.inet.udp.checksum=1 # Enable UDP checksums
net.inet.udp.log_in_vain=0 # Don't log the blackholed packets
net.inet.udp.recvspace=65536 # Size in bytes of the receive buffer
# Debug options
debug.minidump=1 # Enable the small kernel core dump
debug.mpsafevfs=1 # Enable threaded VFS subsystem
# Kernel options
kern.coredump=0 # Disable process core dumps
kern.ipc.somaxconn=512 # Expand the IP listen queue
kern.maxvnodes=250000 # Bump up the max number of vnodes
# PCI bus options
hw.pci.enable_msix=1 # Enable Message Signalled Interrupts Extended
hw.pci.enable_msi=1 # Enable Message Signalled Interrupts
hw.pci.enable_io_modes=1 # Enable alternate I/O access modes
# Other options
vfs.usermount=1 # Enable non-root users to mount filesystems
--
Freddie Cash
fjwcash at gmail.com
More information about the freebsd-stable
mailing list