RHEL to FreeBSD file server
Jason Keltz
jas at cse.yorku.ca
Mon Nov 12 17:24:10 UTC 2012
For the last few months, I've been working on and off learning about
FreeBSD. The goal of my work is to swap out our current dual Red Hat
Enterprise Linux file servers for FreeBSD. I'm ultimately hoping for
the most reliable, high-performance NFS file server that I can get. The
fact that I also get to take advantage of ZFS is a major bonus.
I only recently (thanks Rick!) became aware of this mailing list, and
after reading a few months worth of postings, I'm a little "nervous"
about the stability of ZFS in FreeBSD, though I can see that many issues are
tied to specific combinations of FreeBSD versions, driver versions,
specific HBAs, etc. In addition, I'm a little concerned by what I've
read about perceived performance issues with the NFS implementation,
even though FreeBSD only recently gained a completely new NFS
implementation. That
being said, I've learned that in general, people don't often go posting
about experiences that work well, so I'm trying to stay positive, and
hoping that my plan is still for the best. I'm hoping to share some
information about what I've done with the old file servers, and what I
intend to do with the new one, and get some feedback from list members
to see if I'm heading in the right direction. Even if you can't
comment on anything I'm about to write about, if you can tell me about
a positive experience you've had running FreeBSD as an NFS file server
with ZFS, that would be great!!
My present (2008) file servers both contain LSI/3ware RAID controller
cards, and several RAID units with disks arranged in RAID10
configuration. There are a total of about 1600 mounts across both
servers. Home directories are "split" between the servers, but only on
two ext3 filesystems. We are using NFSv3 at the moment, and because I
don't use Kerberos, I run NFS over OpenVPN, mostly to protect the
connection (though we use cipher none for performance). For cost
effectiveness, we have a "manual failover" solution. That is, either
file server has enough disk slots to "take over" for the other
server. If one server goes down, I can pull its disks, place them in
the other server, turn it on, and through scripting the surviving
server takes over the failed server's IP/name/disks, and all the NFS
clients resume as if both servers were still running. It's not ideal,
but I'll tell you - it's cost effective!
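
For what it's worth, the takeover scripting boils down to something
like the following on the surviving server (the device, mount point and
IP below are placeholders, not our real ones):

    # bring up the failed server's service IP as an alias
    ip addr add 10.0.0.12/24 dev eth0
    # mount the transplanted ext3 volume from the 3ware unit
    mount /dev/sdc1 /export/home2
    # re-export everything (fsids are hard-coded in /etc/exports)
    exportfs -ra
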
Fast forward a few years...
I'm looking to replace the above hardware completely. In terms of
hardware, I've recently been able to acquire a new 12th generation Dell
PowerEdge R720 server with 64 GB of memory and dual E5-2660 processors
(2.20 GHz). It has an integrated Dell H310 controller (FreeBSD mfi
driver) - which is presently only used for a mirrored root configuration
(2 x 500 GB NL SAS drives). I added 2 x LSI 9205-8e cards (LSISAS2308)
to the server. The LSI cards were flashed to the latest LSI firmware.
I also have 1 Dell MD1220 array with 24 x 900 GB 10K SAS drives for
data. The server has 4 x 1 GbE Intel NICs.
I'm working with FreeBSD 9.1-RC3 because I understand that the 9.1 series
includes many important improvements, and a totally new driver for the
LSI SAS HBA cards. I suspect that by the time the file server is ready
to go live, 9.1 will be officially released.
In terms of ZFS, in my testing, I have been using a single ZFS pool
comprised of 11 mirrored vdevs - a total of 22 disks, with 2 spares (24
disks total). As I understand it, I should be able to get the optimal
performance this way. I considered using multiple pools, but multiple
pools means multiple ZILs, L2ARCs, etc., and a reduction in the
performance numbers. I've been told that people have far bigger ZFS
pools than my 22-disk zpool. As I understand it, as storage
requirements increase, I could easily add another MD1220 with an
additional 11 mirrored vdevs and "append" them to the original pool,
giving me lots more space with little hassle.
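
To make the layout concrete, the test pool is built roughly like this
(the pool name and da device numbers are placeholders for however the
24 drives enumerate across the two HBAs):

    # 11 mirrored vdevs (22 disks) plus 2 hot spares
    zpool create tank \
      mirror da2 da3   mirror da4 da5   mirror da6 da7 \
      mirror da8 da9   mirror da10 da11 mirror da12 da13 \
      mirror da14 da15 mirror da16 da17 mirror da18 da19 \
      mirror da20 da21 mirror da22 da23 \
      spare da24 da25

    # a second MD1220 would later be "appended" as 11 more mirrors:
    # zpool add tank mirror daXX daYY   (repeated per pair)
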
At the moment, I have each LSI 9205-8e serving half of the disks in the
single MD1220 chassis in a split configuration - that is, 12 disks on
each LSI HBA card. It's a little overkill, I think, but the primary
reason for buying the second LSI HBA card was to ensure that I had a
spare card in the event that the first card ever failed. I figured I
might as well use it to improve performance rather than leaving it on
the shelf collecting dust. Should I get funds to purchase an additional
MD1220 (another 24 disks), I was thinking of configuring 1 x 9205-8e per
MD1220, which I'm sure is also overkill. However, in theory, if both
sides of the mirrored vdevs were placed in separate MD1220s, I would
expect this to give me the ultimate in performance. In addition, should
I lose one 9205-8e or one MD1220, I would expect to be able to continue
operating "temporarily" (while biting my nails without redundancy!!!).
Also, in my testing, I'm hoping to use NFSv4, which so far seems to be
working well.
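
In case it helps anyone comment, the NFSv4 test configuration I have in
mind is just the stock FreeBSD server bits, roughly like this (pool name
and network are placeholders):

    # /etc/rc.conf
    rpcbind_enable="YES"
    nfs_server_enable="YES"
    nfsv4_server_enable="YES"
    nfsuserd_enable="YES"
    mountd_enable="YES"

    # /etc/exports
    V4: /tank -sec=sys -network 10.0.0.0 -mask 255.255.0.0
    /tank/home -maproot=root -network 10.0.0.0 -mask 255.255.0.0
    # (each ZFS filesystem needs its own export line, or the
    # sharenfs property can be used instead)
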
I have many, oh so many questions...
1) The new file server is relatively powerful. However, is one file
server enough to handle a load of approximately 2000 connections?
Should I be looking at getting another server, or getting another
server and another MD1220? Is 64 GB of memory enough when I'm talking
about up to 2500-3000 ZFS filesystems on the box? I'm not using dedup,
and compression is minimal. (A rough ARC-sizing sketch follows after
the questions.)
2) It is my intention to have 1 ZFS filesystem per user (so approx. 1800
right now)... Is this the way to go? It sure makes quotas easier! (A
quick sketch of the layout I have in mind follows after the questions.)
3) I understand that I should be adding an SSD-based ZIL. I don't have
one right now. I've seen a lot of mixed information about which
solutions are cost effective and actually make a difference, so I'm
wondering if someone could recommend a cost-effective ZIL device that
works. It has to be 2.5" because all the disk slots in my configuration
are 2.5". I believe someone recently recommended one of the newer Intel
SSDs? As well, what size? (I understand that what complicates
performance within any one brand of SSD is that different sizes perform
differently.) Is there a problem if I put the ZIL in the file server
head, where it would be managed by the mfi driver, even though it is the
ZIL for the disks managed by mps in the MD1220? (A sketch of how I
expect to add the log device follows after the questions.)
4) Under Linux, to be able to have a second server take over the disks
from the first server with my "manual failover", I had to hard-code
fsids on exports (an example of those exports follows after the
questions). Should I choose to do the same thing under FreeBSD,
I'm told that the fsids on FreeBSD are generated from a unique number
for the file system type plus a number generated by the file system
-- but will this number remain the same for the filesystem if it's
exported from one system and imported into another?
5) What would be the recommended way of testing the performance of this
setup? I've done some really basic testing using filebench:
Local filebench run directly on the filesystem:
42115: 77.169: IO Summary: 3139018 ops, 52261.125 ops/s, (4751/9502
r/w), 1265.8mb/s, 0us cpu/op, 3.3ms latency
over NFS on a 100 Mbps client:
27939: 182.854: IO Summary: 53254 ops, 887.492 ops/s, (81/162 r/w),
20.6mb/s, 876us cpu/op, 202.8ms latency
over NFS on a 1 gigabit client:
4588: 84.732: IO Summary: 442488 ops, 7374.279 ops/s, (670/1341 r/w),
175.3mb/s, 491us cpu/op, 23.5ms latency
... I don't have the resources to write my own test suite custom to our
day-to-day operations, so I have to stick with one of the existing
solutions. What would be the best way to do this? Would simply
connecting to the NFS server from several hundred clients and running
filebench be an "optimal" solution? (A sample filebench invocation is
sketched after the questions.)
Anyway, my apologies for the length of this e-mail. I've tried to
"shorten" this as much as I could. I have so many questions! :) I'm
hoping for any feedback that you might be able to provide, even if it's
just one comment or two. Thanks for taking the time to read!
Jason Keltz