Softupdates: df, du, sync and fsck [quite long]

John Ekins john.ekins at brightview.com
Fri Jun 27 14:01:16 PDT 2003


Hello,

I've a couple of questions about soft updates. I've Googled heavily on this but
not really found a satisfactory answer. The story:

I'm running numerous FreeBSD 4.7 SMP machines as primary MX hosts. The mail
is not stored on the FreeBSD machines but on NetApps via NFS. However, the mail is
temporarily spooled on the FreeBSD machines during normal MTA handling and while
it is passed to an anti-virus scanner. Each machine has one large /var partition
where basically all the work for the MTA and AV scanner takes place and where
their temporary/transient files live.

These machines are heavily utilised, running quite "hot" with a load average of
anywhere from 2 to 8. Many thousands of temporary files are thus created and
deleted every minute. I have no problem with this, as nearly all email is
delivered in under 1 minute regardless.

I notice that after a while the space usage shown by df differs considerably
from what a du on /var reports. I'm aware of why this happens with soft updates, but
that's not the whole story. If I turn off incoming email on a machine, the space
does not seem to sync back to what it should be. No matter how long I leave the
MTA turned off, the space is simply not returned, and df/du show differences of about
5:1. Nothing else is writing/holding open files on that partition (even turned
off syslog, cron, etc. and checked using lsof). In comparison, if, for example, on
my normal desktop machine I create a 500MB file, then delete it, the space shortly
afterwards is returned to me when I run df. The only way I've been able to recover
this space to what it should be is to reboot the machine. Which brings me to the
next problem...
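
For anyone wanting to reproduce the df/du comparison described above, here is a
minimal Python sketch. It is not what I ran on these machines, and the function
names (fs_used_bytes, tree_used_bytes, usage_ratio) are made up for illustration.
It assumes a POSIX system where st_blocks is counted in 512-byte units, and it
only approximates what df and du -x report.

```python
import os
import sys


def fs_used_bytes(path):
    """Used space on the file system holding `path`, roughly what df shows."""
    st = os.statvfs(path)
    return (st.f_blocks - st.f_bfree) * st.f_frsize


def tree_used_bytes(path):
    """Space allocated to entries reachable under `path`, roughly like du -x."""
    root = os.lstat(path)
    seen = {(root.st_dev, root.st_ino)}   # count hard-linked inodes only once
    total = root.st_blocks * 512          # assumption: 512-byte st_blocks units
    for dirpath, dirnames, filenames in os.walk(path):
        same_fs_dirs = []
        entries = [(d, True) for d in dirnames] + [(f, False) for f in filenames]
        for name, is_dir in entries:
            try:
                st = os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue                  # entry vanished while we were walking
            if st.st_dev != root.st_dev:
                continue                  # do not cross into other file systems
            if is_dir:
                same_fs_dirs.append(name)
            key = (st.st_dev, st.st_ino)
            if key not in seen:
                seen.add(key)
                total += st.st_blocks * 512
        dirnames[:] = same_fs_dirs        # prune the walk in place
    return total


def usage_ratio(df_used, du_used):
    """How many times larger the df figure is than the du figure."""
    return float("inf") if du_used == 0 else df_used / du_used


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "/var"
    df_used = fs_used_bytes(target)
    du_used = tree_used_bytes(target)
    print("df used: %d bytes" % df_used)
    print("du used: %d bytes" % du_used)
    print("ratio  : %.2f" % usage_ratio(df_used, du_used))
```

The comparison is only meaningful when the path given is the mount point of the
file system (here /var), since the df figure covers the whole file system while
the du figure covers only the tree that is walked. Space held by unlinked but
still open files, or by deletions soft updates has not yet processed, shows up in
the first number but not in the second.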

As an example, here is a snippet from the console from when I rebooted an affected
machine:

  boot() called on cpu#2
  Waiting (max 60 seconds) for system process `vnlru' to stop...stopped
  Waiting (max 60 seconds) for system process `bufdaemon' to stop...stopped
  Waiting (max 60 seconds) for system process `syncer' to stop...timed out

  syncing disks... 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 
  giving up on 22 buffers
  Uptime: 27d23h1m27s
  Rebooting...

As you can see, the file system is unable to sync. When the machine reboots, it
literally takes hours to fsck the /var partition (only about 15GB), and the fsck
output is full of messages like this:

  UNEXPECTED SOFT UPDATE INCONSISTENCY

Now, is there a problem here with soft updates "losing track" of what is going on
on this busy partition? It would appear so, since quietening the machine does
not lead to a proper sync. Secondly, why does the fsck take such an inordinate
amount of time for a smallish partition?

I really like the performance benefits of soft updates, but it seems that I'm
going to have to turn it off on /var because of the problems that eventually
occur.

If anyone has some advice I'd be grateful.


Cheers,
John.

