NFS == lock && reboot
chris# at 1command.com
Thu Apr 5 10:44:48 UTC 2007
Quoting Oliver Fromme <olli at lurza.secnetix.de>:
> Chris H. <email@example.com> wrote:
> > Oliver Fromme wrote:
> > > [...]
> > > However, I don't think that your actual problem (lock-up
> > > and panics) is related to rpc.lockd or rpc.statd. It
> > > rather sounds like something else is wrong with your
> > > machine. NFS works perfectly fine for me, including
> > > copying huge files.
> > >
> > > You wrote that you had a lot of crashes that accumulated
> > > many files in lost+found. Well, maybe your filesystem
> > > was somehow damaged in the process. It is possible to
> > > damage file systems in a way that can lead to panics, and
> > > it's not necessarily detected and repaired by fsck.
> > Indeed. I /too/ considered this. However, I largely dismissed it
> > as a possibility, as almost all of them are zero length. The others
> > are fragments of logs. I'm not /completely/ ruling this out though.
> The files in lost+found aren't the problem. The problem
> is the things that you cannot see, and fsck won't move
> those to lost+found.
> In particular, if you use softupdates on drives that have
> write-caching enabled, or on drives that illegally cache
> data even if it's disabled (be it intentionally or because
> of bugs in the firmware), it's almost guaranteed that the
> FS will take damage beyond repair on a crash, and even more
> so after several crashes.
> Another potential cause of problems is the background fsck
> feature in FreeBSD 6. I'm not sure if it has been fixed
> in 6-stable, maybe it has. I don't want to spread FUD.
> But in the past, if a machine crashed and rebooted during
> a background fsck, that was almost a guarantee for damage
> beyond repair, too. That's why I always disable background
> fsck on my machines. (Let me repeat: It _might_ be fixed
> in 6-stable, I don't know. I haven't seen a definitive
> confirmation of it being fixed on the mailing lists so
> far. If somebody knows otherwise, please correct me.)
Greetings, and thank you for your thoughtful reply.
Understood on all points. As mentioned, I wasn't /completely/
ruling that out. I have always refused to permit background fsck.
/Not/ because of any lack of faith in FBSD. Frankly, I
have nothing /but/ faith - perhaps more than I ought to. But
rather because I insist on keeping tabs on what's going on
/at all times/. So, should the system crash, shut down, or halt
for any reason, the BIOS will keep it in a "shutdown" state should
it gain control. In the case of a kernel reboot/crash, the loader
simply sits and awaits my confirmation before starting the system.
That way I am always guaranteed the opportunity to start in single
user mode and answer to any anomalies that the system reports with
So, in summary, I am /not/ completely ruling out your suggestion that
irreparable damage has been done to the filesystem by the multitude of
crashes imposed upon it. I am also grateful for your taking the time to
share your experiences and insight with me. I simply haven't found anything
/definitive/ yet. Kris might argue here that NFS seems to be working
fine for everyone else, which would also lend credence to your theory.
Both of you may indeed be correct. :)
I just think it'd be worth the time to follow through: set up a dump
device and crash it to find the /definitive/ reason for this. It may
in fact turn out to be some obscure, near-impossible anomaly in the NFS
code that /I/ was just (un)lucky enough to stub my toe on. :)
At any rate, as this is a production server - and a /real/ busy one at
that - I want to get a (confirmed) good backup off of it before willingly
bashing it any further. It currently serves the largest Netscape browser
client archive on the net: all of the 0.x - 4.x series browser
clients. You'd be amazed how popular they are, and how many people still
use them. So, as backing it up onto the NFS-mounted backup server is
currently out of the question, and there's more than a terabyte of browser
clients alone, it's going to take me a little longer to follow through with
the dump device > crash > dump > back trace than it would otherwise - but
it will be done. :)
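
For reference, the "dump device" step above might look roughly like the
rc.conf fragment below. This is only a sketch under assumptions, not the
configuration used on this server: the swap partition name /dev/ad0s1b is
a hypothetical placeholder, and dumpdir is shown at its usual default.

```shell
# /etc/rc.conf fragment (sketch only; the device name is a hypothetical placeholder)
dumpdev="/dev/ad0s1b"   # swap partition that receives the crash dump; for a full
                        # dump it should be at least as large as physical memory
dumpdir="/var/crash"    # directory where savecore(8) stores vmcore.N on the next boot
```

After the next panic and reboot, savecore(8) should leave a vmcore.N file
under dumpdir. Running kgdb(1) against a kernel built with debug symbols,
together with that file, should then allow a back trace via its "bt" command.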
Thank you again for taking the time to share your thoughts, suggestions
and experiences. I really appreciate it.
> Best regards
> Oliver Fromme, secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing b. M.
panic: kernel trap (ignored)
FreeBSD 5.4-RELEASE-p12 (SMP - 900x2) Tue Mar 7 19:37:23 PST 2006