FreeBSD 4.x - SATA problems ... ?

Marc G. Fournier scrappy at hub.org
Tue Jul 5 23:24:38 GMT 2005


Recently, I added a new server to our network, using the 3Ware RAID 
controller (the 9500S-4LP card) and 3x140G SATA drives ... overall, the 
system works, but I'm getting a very odd behaviour that I've never seen 
before ...

I have a process that runs an rsync from another server to 'duplicate' the 
VPSs ... a 'live backup' sort of thing ... this is running on all our 
servers, without incident, *except*, it appears, the SATA server ...

I had disabled it for a time, and just re-enabled it this morning, and 
somehow or another, it seems to be causing file system corruption ...

As most 'old timers' here know, we use UNIONFS on all our servers ... when 
the corruption occurs, it looks like the "directory structures" are being 
changed ... this one is hard to explain :(  For example, 
/usr/local/cyrus/bin has a bunch of binaries in it ... the binaries are 
kept on the "lower layer", so the upper layer only has a 
/usr/local/cyrus/bin directory created/ghosted, but no copies of the 
binaries ... so, when you are in the VPS, and do an ls of that directory, 
you see:

# ls /usr/local/cyrus/bin
arbitron        cyr_expire      lmtpd           notifyd         smmapd
chk_cyrus       cyrdump         masssievec      pop3d           squatter
ctl_cyrusdb     deliver         master          pop3proxyd      timsieved
ctl_deliver     fud             mbexamine       quota           tls_prune
ctl_mboxlist    imapd           mbpath          reconstruct
cvt_cyrusdb     ipurge          mkimap          sievec

When the 'corruption' happens, those all disappear, almost as if someone 
did a 'rm -rf' of the directory within the VPS, and then a 'mkdir' ... 
except that, from what I've been able to tell, it happens at random, 
on any of the VPSs, *and* only around the time that the rsync 
process is running ...
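
To pin down how closely the disappearance tracks the rsync run, a small 
watcher could log a timestamp whenever the directory looks empty. This is 
only a sketch: the helper names count_entries and check_dir, the log path 
and the 60 second interval are assumptions, and only the directory comes 
from the example above.

```sh
#!/bin/sh
# Sketch: record when a watched directory appears to have no entries.
# count_entries and check_dir are hypothetical helper names.

# Print the number of entries in directory $1 (prints 0 if it cannot be read).
count_entries() {
    ls -A "$1" 2>/dev/null | wc -l | tr -d ' '
}

# Append a timestamped line to log file $2 if directory $1 has no entries.
# Returns 1 when the directory looks empty, 0 otherwise.
check_dir() {
    n=$(count_entries "$1")
    if [ "$n" -eq 0 ]; then
        echo "$(date '+%Y-%m-%d %H:%M:%S') EMPTY $1" >> "$2"
        return 1
    fi
    return 0
}

# Example loop (left commented out; log path and interval are placeholders):
# while :; do
#     check_dir /usr/local/cyrus/bin /var/log/unionfs-watch.log
#     sleep 60
# done
```

Comparing the timestamps in that log against the rsync start and end times 
would show whether the two really coincide.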

As if, somehow, the rsync is taxing the system and causing bad writes ... 
but I can't find anything anywhere to indicate a problem ...

To "fix" things, I umount the UNIONFS layer, and then do a 'find / cpio' 
to copy the "top layer" back over to fix the directory structure itself 
...
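
Roughly, the copy step could look like the sketch below. It assumes the 
repair is a find piped into cpio in pass-through mode. The helper name 
restore_top_layer and every path are placeholders, and which tree is the 
source and which the destination is a guess at what "copy back over" 
means here.

```sh
#!/bin/sh
# Sketch: copy a directory tree with find | cpio in pass-through mode.
# restore_top_layer is a hypothetical helper name; all paths are placeholders.

# $1 = source tree, $2 = destination (must already exist, absolute path).
restore_top_layer() {
    src=$1
    dst=$2
    # cpio flags: -p pass-through (copy) mode, -d create directories as
    # needed, -m preserve modification times.
    (cd "$src" && find . -print | cpio -pdm "$dst")
}

# Usage (as root; paths are placeholders, not the real layout):
#   umount /path/to/vps/mountpoint
#   restore_top_layer /path/to/top-layer-copy /path/to/top-layer
```

Running find from inside the source tree keeps the paths relative, so cpio 
recreates the same directory structure under the destination.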

The thing is, I don't even know *where* to begin debugging this issue, 
since there aren't any errors being reported anywhere ... but maybe 
someone out there has an idea?

thanks ...

----
Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
Email: scrappy at hub.org           Yahoo!: yscrappy              ICQ: 7615664


More information about the freebsd-stable mailing list