Help me select hardware and software options for very large server

Freddie Cash fjwcash at gmail.com
Sat Jan 24 20:16:13 PST 2009


On Sat, Jan 24, 2009 at 6:21 PM, Terry Kennedy
<terry+freebsd-current at tmk.com> wrote:
>> We did something similar for our off-site, automated backups box.
>>
>> One box has 2x 2 GB CompactFlash in IDE adapters, the other has 2x 2
>> GB USB flash drives.
>
>  I assume this was for FreeBSD itself? I was concerned about write
> cycles (and, to a lesser extent, storage capacity) on CF or USB
> media. I haven't seen much degradation due to seeks when the FreeBSD
> space is a separate logical drive on the AMCC controller. That also
> gets me inherent RAID 6 protection (assuming I carve it from the main
> batch of drives).

Correct.  On the original box, just / is on the CF.  /usr, /var, /tmp,
/home, and a bunch of sub-directories of /usr are ZFS filesystems.  We
also have a /storage directory that we put all the backups in.

On the second box (the offsite replica), / and /usr are on the USB,
with /home, /var, /tmp, /usr/src, /usr/obj, /usr/ports, and /usr/local
all being ZFS filesystems.  I put /usr onto the USB as well, as I ran
into an issue with zpool corruption that couldn't be fixed because not
enough apps were available between / and /rescue.  2 GB is still
plenty of space for the OS, and it's not like it will be changing all
that much.
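
Just for illustration, carving those filesystems out of the pool is a
handful of one-liners.  The pool name "storage" and the exact
filesystem list here are made up for the example, not copied from our
boxes:

  # each mount point becomes its own ZFS filesystem in the pool
  zfs create -o mountpoint=/home        storage/home
  zfs create -o mountpoint=/var         storage/var
  zfs create -o mountpoint=/tmp         storage/tmp
  zfs create -o mountpoint=/usr/ports   storage/ports
  zfs create -o mountpoint=/storage     storage/backups

Everything else (/ on both boxes, plus /usr on the second one) stays
on the CF/USB media, outside the pool.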

>> The drives on each of the RAID controllers are configured as "Single
>> Disk Array", so they appear as 24 separate drives to the OS, but still
>> benefit from the controller's disk cache, management interface, and so
>> on (as compared to JBOD where it acts like nothing more than a SATA
>> controller).
>
> Hmmm. I was planning to use the hardware RAID 6 on the AMCC, for a
> number of reasons: 1) that gives me front-panel indications of broken
> RAID sets, controller-hosted rebuild, and so forth. 2) I'd be using
> fewer ZFS features (basically, just large partitions and snapshots)
> so if anything went wrong, I'd have a larger pool of expertise to draw
> on to fix things (all AMCC users, rather than all FreeBSD ZFS users).
>
>  Did you consider this option and reject it? If so, can you tell me
> why?

Originally, I was going to use hardware RAID6 as well, creating two
arrays, and just joining them together with ZFS.  But then I figured, if
we're going to use ZFS, we may as well use it to the fullest, and use
the built-in raidz features.  In theory, the performance should be
equal or better, due to the CoW feature that eliminates the
"write-hole" that plagues RAID5 and RAID6.  However, we haven't done
any formal benchmarking to see which is actually better:  multiple
hardware RAID arrays added to the pool, or multiple raidz datasets
added to the pool.
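
To make the comparison concrete, here's roughly what the two
approaches look like at the zpool level.  The device names (da0
through da23) and the pool name are placeholders for however the 24
single-disk arrays show up on your controller:

  # ZFS-native redundancy: one pool, two 11-drive raidz2 vdevs,
  # plus two hot spares (the layout on our second box)
  zpool create storage \
      raidz2 da0 da1 da2 da3 da4 da5 da6 da7 da8 da9 da10 \
      raidz2 da11 da12 da13 da14 da15 da16 da17 da18 da19 da20 da21 \
      spare da22 da23

  # hardware-RAID alternative: hand ZFS two big RAID6 logical drives
  # and just stripe the pool across them
  #zpool create storage da0 da1

With the second form ZFS can still detect corruption through its
checksums, but it can't repair it on its own, since the redundancy
lives below ZFS in the controller.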

>> The drives on one box are configured as 1 large 24-drive raidz2 in ZFS
>> (this box also has 12x 400 GB drives).
>
>> The drives on the other box are configured as 2 separate 11-drive
>> raidz2 arrays, with 2 hot spares.
>
>> The usable space on both boxes is 9 TB.
>
>  So a single ZFS partition of 8TB would be manageable without long
> delays for backup snapshots?

Creating ZFS snapshots is virtually instantaneous, and accessing them
is nice and quick.  Destroying ZFS snapshots is what takes a long
time, depending on the age and size of the snapshot.
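
For example (the filesystem and snapshot names are just illustrative):

  # creating a snapshot is effectively instant
  zfs snapshot storage/home@2009-01-24

  # destroying an old one is what can take a while
  zfs destroy storage/home@2008-12-01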

The really nice thing about ZFS snapshots is that if you set the ZFS
property snapdir to "visible", then you can navigate to
/<zfs-filesystem>/.zfs/snapshot/ and have access to all the snapshots.
They'll be listed there by snapshot name.  Just navigate into them as
you would any other directory, and you have full read-only access to
the snapshot.
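
Something along these lines (again, names made up for the example,
assuming storage/home is mounted on /home):

  zfs set snapdir=visible storage/home
  ls /home/.zfs/snapshot/               # one directory per snapshot
  ls /home/.zfs/snapshot/2009-01-24/    # read-only view of /home as of that snapshot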

>> Other than a bit of kernel tuning back in August/September, these
>> boxes have been running nice and smooth.  Just waiting for either the
>> release of FreeBSD 8.0 or an MFC of ZFS v13 to 7-STABLE to get support
>> for auto-rebuild using hot spares.
>
>  That's good to hear. What sort of tuning was involved (if it is still
> needed)?

Here are the loader.conf settings that we are currently using:
# Kernel tunables to set at boot (mostly for ZFS tuning)
# Disable DMA for the CF disks
# Set kmem to 1.5 GB (about the usable max for us on amd64)
# Set the ZFS Adaptive Replacement Cache (ARC) to about half of kmem
# (leaving the other half for the OS)
hw.ata.ata_dma=0
kern.hz="100"
vfs.zfs.arc_min="512M"
vfs.zfs.arc_max="512M"
vfs.zfs.prefetch_disable="1"
vfs.zfs.zil_disable="0"
vm.kmem_size="1596M"
vm.kmem_size_max="1596M"

Finding the correct arc_min/arc_max and kmem_size_max settings is a
bit of a black art, and will depend on the workload for the server.
There's a max of 2 GB for kmem_size on FreeBSD 7.x, but the usable max
appears to be around 1596 MB, and varies from server to server.  The
second box has a usable max of 1500 MB, for example (it won't boot
with anything higher).

Some people run with an arc_max of 64 MB; we ran with it set to 2 GB
for a while (8 GB of RAM in the box).  Basically, we just tune it down
a little bit every time we hit a "kmem_map too small" kernel panic.
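
If you want to keep an eye on how close you're getting, the limits and
the current ARC size can be read back at runtime with sysctl; on our
boxes that looks something like this (the exact kstat OIDs may differ
between ZFS versions):

  sysctl vm.kmem_size vm.kmem_size_max      # what the kernel actually got
  sysctl vfs.zfs.arc_min vfs.zfs.arc_max    # ARC limits in effect
  sysctl kstat.zfs.misc.arcstats.size       # current ARC size, in bytes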

FreeBSD 8.0 won't have these limitations (kmem_max is 512 GB), and ZFS
v13 will auto-tune itself as much as possible.

-- 
Freddie Cash
fjwcash at gmail.com

