huge email system

Chris Shenton chris at shenton.org
Sat Nov 22 06:49:30 PST 2003


David <david at madcoders.com> writes:

> We need to build a stable, redundant, and speedy email system that
> will last for a few years.  We need to handle about 500,000 emails
> per day.  We have about 30,000 users, so we need a lot of storage.
>
> Our current plan was to implement the following.
> 2 SMTP only servers.
> 3 NFS servers with RAID and SCSI
> 2 POP3 servers.
>
> But that leads us to questions such as -
>  - what would be the best way to authenticate?
>  - would the NFS servers need gig nic's? or dual bonded 100Mbit cards?
>  - what smtp server and what pop3 server to use (we want to use Maildir)
>  - what raid level?

I'm finishing something like that now.  My design goals were no single
points of failure, 1GB of server-stored email, SMTP+STARTTLS and SMTPS,
and IMAPS and IMAP+STARTTLS.  It's over-designed for our population but
the servers aren't the expensive part; I believe it could scale to
handle 100K users.  I'm replacing a sendmail-based system that's
exceptionally hard to fix because there are multiple single points of
failure and no one wants downtime.

I did the prototype on FreeBSD but the client preferred Solaris for
their production systems.  I'm using qmail with the excellent
qmail-ldap patch suite from www.nrg4u.com, plus courier-imap.
OpenLDAP is used for authentication and other user information
(quotas, account status, etc).

I'm using a pair of F5 load balancers in the front to detect up/down
services.  They will also let us add servers as demand grows; I like
being able to add small cheap boxes incrementally rather than doing
forklift upgrades of big iron.

Behind them are four Netra V210s for SMTP[S], IMAP[S], POP3S and soon
webmail (SqWebMail).  Each box has a read-only LDAP replica.  Another
V210 runs the LDAP master, which replicates to the four mail servers.
Each V210 comes with quad gigabit ethernet: one interface to the load
balancer, two (for redundancy) to the backend switches that connect to
the NFS server, and one for an administrative/monitoring network.
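
On the gigabit-versus-bonded-100Mbit question, a back-of-envelope
estimate is a useful starting point.  In the sketch below only the
500,000 messages per day figure comes from the original question; the
average message size and the peak factor are assumptions picked purely
for illustration, so substitute measured values before drawing
conclusions.

```python
# Back-of-envelope estimate of inbound delivery traffic to the mail store.
# Only MESSAGES_PER_DAY comes from the original question; the other two
# constants are assumptions for illustration.
MESSAGES_PER_DAY = 500_000
AVG_MESSAGE_BYTES = 50 * 1024   # assumption: 50 KB average message
PEAK_FACTOR = 10                # assumption: peaks run 10x the daily average

SECONDS_PER_DAY = 24 * 60 * 60


def average_rate(messages_per_day):
    """Average messages per second over a whole day."""
    return messages_per_day / SECONDS_PER_DAY


def peak_megabits_per_second(messages_per_day, avg_bytes, peak_factor):
    """Delivery bandwidth at peak, in Mbit/s (1 Mbit = 1,000,000 bits)."""
    bytes_per_second = average_rate(messages_per_day) * avg_bytes * peak_factor
    return bytes_per_second * 8 / 1_000_000
```

This counts delivery writes only.  POP3/IMAP reads, Maildir directory
scans and NFS protocol overhead come on top, so treat the result as a
lower bound rather than a sizing answer.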
 
We bought a NetApp for the mail store; it is currently our one single
point of failure but NetApp has a great reputation for reliability; we
bought a used unit and saved about 70%.  (NetApp uses RAID4 internally
so disks can be added to a volume on the fly).  NetApp's "snapshot"
facility gives us restores from stupid user errors -- tape
backup/restore for this much data would be a nightmare.  (Qmail's
Maildir format is NFS safe but it sounds like you already know that :-)  
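
For readers who haven't looked at why Maildir gets along with NFS, here
is a minimal sketch of the delivery step: write the message under a
unique name in tmp/, then link it into new/.  No locking is involved,
and readers only look in new/ and cur/, so they never see a partly
written file.  The function name and the exact unique-name layout are
illustrative assumptions, not qmail's own code.

```python
import os
import socket
import time

_counter = 0  # distinguishes deliveries made by one process in one second


def deliver(maildir, message_bytes):
    """Deliver one message into an existing Maildir; return its path in new/."""
    global _counter
    _counter += 1
    # Maildir names must not contain '/' or ':', so escape them in the hostname.
    host = socket.gethostname().replace("/", "\\057").replace(":", "\\072")
    unique = "%d.P%dQ%d.%s" % (int(time.time()), os.getpid(), _counter, host)
    tmp_path = os.path.join(maildir, "tmp", unique)
    new_path = os.path.join(maildir, "new", unique)

    # O_EXCL: fail instead of overwriting if the name somehow already exists.
    fd = os.open(tmp_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "wb") as f:
        f.write(message_bytes)
        f.flush()
        os.fsync(f.fileno())  # make sure the data is on disk before publishing

    # Publish the complete file under new/, then drop the tmp/ name.
    os.link(tmp_path, new_path)
    os.unlink(tmp_path)
    return new_path
```

Because every message is its own file with a name unique across hosts,
several SMTP boxes can deliver into the same NFS-mounted Maildir without
coordinating with each other.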


If my client hadn't demanded Solaris, I would have preferred FreeBSD.  I
would like to try the Apple Xserve RAID box since it's 2.5TB for $11K:
FC-attach it to a pair of FreeBSD boxes which serve it out over NFS, and
use the FreeBSD-5.x "snapshot" feature for NetApp-style backup/restore.
The service boxes would be as above, cheaply scalable by adding more.

I like F5 balancers because you can heavily customize the application
layer health monitoring -- e.g., do a query on the LDAP master and
check for a sane response.  But they're not cheap.  Round-robin DNS
won't steer clients away from dead services, and Windows clients aren't
any good at retrying failed connections.  So I don't have a suggestion on an
inexpensive balancer; I'd be interested in hearing ideas.
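
To make "application layer health monitoring" concrete, here is a
minimal sketch of the kind of check a monitor could run: connect to the
service and confirm that it greets with the banner its protocol calls
for, rather than merely accepting the TCP connection.  This is not F5
configuration; the function name is made up, and it only covers
plaintext ports (the TLS-wrapped ones would need the ssl module on top).

```python
import socket

# Greeting each protocol sends on connect: (plaintext port, banner prefix).
CHECKS = {
    "smtp": (25, b"220"),
    "pop3": (110, b"+OK"),
    "imap": (143, b"* OK"),
}


def banner_ok(host, port, expected_prefix, timeout=5.0):
    """Return True if the service at host:port greets with expected_prefix."""
    data = b""
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.settimeout(timeout)
            # Read until there are enough bytes to compare, or the peer closes.
            while len(data) < len(expected_prefix):
                chunk = conn.recv(512)
                if not chunk:
                    break
                data += chunk
    except OSError:
        # Refused, unreachable or timed out: treat the service as down.
        return False
    return data.startswith(expected_prefix)
```

A monitor would call something like banner_ok(host, *CHECKS["smtp"]) on
a schedule and pull a box out of rotation when the check fails.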

As I mentioned above, our NetApp is the only single point of failure.
When we need more space later on, we can buy a second unit plus the
(pricey) clustering software, which would remove that SPoF.

Some other folks have talked about anti-virus/spam issues -- very good
discussion.  I am using qmail-ldap's recent integration of
qmail-smtpd-viruscan, which very quickly blocks MS executable
attachments; not foolproof but highly effective with little load.
We're considering going with some commercial spam/virus blocking
appliance but haven't decided yet; I'm trying to keep the qmail-ldap
system from getting any more complicated.  If, however, we integrate
something into our mail servers, we might have to add another box or
two to handle the increased load but it's not that expensive with
small boxes.
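
As a rough illustration of the general idea behind blocking executable
attachments, the sketch below parses a message with Python's email
package and flags any attachment whose decoded content starts with the
"MZ" magic bytes that begin Windows executables.  This is a swapped-in
technique for illustration and makes no claim about how the qmail-ldap
patch itself does the check; the function name is hypothetical.

```python
from email import message_from_bytes


def has_windows_executable(raw_message):
    """Return True if any attachment decodes to data starting with b'MZ'."""
    msg = message_from_bytes(raw_message)
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no payload of their own
        # Skip ordinary text bodies so prose starting with "MZ" isn't flagged.
        if part.get_content_maintype() == "text" and part.get_filename() is None:
            continue
        payload = part.get_payload(decode=True)  # undoes base64 / quoted-printable
        if payload and payload[:2] == b"MZ":
            return True
    return False
```

Like the approach described above, this is not foolproof: an executable
inside an archive, for instance, would pass straight through.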

As I mentioned, I'm running all services on all boxes, rather than
separating SMTP from POP as you suggest; if this turns out to be a bad
idea, I can change the services around simply by re-defining the
service pools on the load balancer.
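
The pool idea can be sketched in a few lines.  The class below is a toy
model, not F5 configuration: each service has a list of member boxes,
connections go round-robin to members that pass a health check, and
moving a service to different boxes is just a matter of replacing its
member list.  The host names and the is_up callback are hypothetical.

```python
class Balancer:
    """Toy model of per-service pools with health-aware round-robin."""

    def __init__(self, pools, is_up):
        # pools: {"smtp": ["mail1", "mail2"], ...}; is_up(host, service) -> bool
        self.pools = {svc: list(members) for svc, members in pools.items()}
        self.is_up = is_up
        self._next = {svc: 0 for svc in self.pools}

    def pick(self, service):
        """Return the next healthy member for service, or None if all are down."""
        members = self.pools[service]
        for _ in range(len(members)):
            i = self._next[service] % len(members)
            self._next[service] = i + 1
            if self.is_up(members[i], service):
                return members[i]
        return None

    def redefine(self, service, members):
        """Move a service onto a different set of boxes."""
        self.pools[service] = list(members)
        self._next[service] = 0
```

Starting with every box in every pool and later calling, say,
redefine("imap", ["mail3", "mail4"]) splits the services apart without
touching the mail servers themselves.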





More information about the freebsd-isp mailing list