UFS_DIRHASH panics on a dozen servers within 30 hours
jhb at freebsd.org
Tue Sep 6 15:04:44 UTC 2011
On Monday, September 05, 2011 5:15:42 am Andreas Longwitz wrote:
> A week ago a dozen of my FreeBSD servers crashed within a time span of
> 30 hours. The servers run very different applications, and some of them
> were only on standby. All servers have the same kernel, FreeBSD 6-STABLE,
> and there were no problems for years until the "black monday".
> Yes, I know that FreeBSD 6 is out of date now, but I don't like to
> change a system that runs very well. Another reason is that my hardware
> needs the amr driver, and because the amr_ioctl problem described in
> kern/155658 is still unresolved, it is not possible for me to upgrade
> my production systems without changing hardware.
Hmm, the patch in that PR should still apply to newer versions. Also, you
could just change the malloc() call to always allocate the maximum size
(instead of using a static buffer) for a smaller diff. It seems though that a
specific command is overrunning its buffer.
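A minimal userspace sketch of the "always allocate the maximum size" idea, assuming a command may transfer more data than the caller's requested length. This is not the amr_ioctl code or the patch from kern/155658; `HYPOTHETICAL_MAX_XFER`, `fake_controller_fill()` and `do_ioctl_transfer()` are hypothetical names, and a kernel version would use malloc(9) instead of calloc(3).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* HYPOTHETICAL: upper bound on the data any single controller command
 * may transfer. Not the real amr(4) limit. */
#define HYPOTHETICAL_MAX_XFER 4096

/* HYPOTHETICAL stand-in for the controller writing into the buffer.
 * It writes 'produced' bytes no matter how many the caller asked for,
 * modelling a command that overruns a buffer sized to the request. */
static void fake_controller_fill(unsigned char *buf, size_t produced)
{
    /* Assumption of this sketch: no command exceeds the maximum. */
    assert(produced <= HYPOTHETICAL_MAX_XFER);
    memset(buf, 0xAB, produced);
}

/* Allocate the maximum transfer size instead of the caller-supplied
 * length, so an oversized transfer lands inside memory we own; then
 * copy back only the length the caller requested. */
int do_ioctl_transfer(unsigned char *user_buf, size_t user_len,
                      size_t produced)
{
    unsigned char *dp;

    if (user_len > HYPOTHETICAL_MAX_XFER)
        return -1;              /* reject requests larger than the cap */
    dp = calloc(1, HYPOTHETICAL_MAX_XFER);
    if (dp == NULL)
        return -1;
    fake_controller_fill(dp, produced);
    memcpy(user_buf, dp, user_len);
    free(dp);
    return 0;
}
```

For example, `do_ioctl_transfer(out, 16, 512)` with a 16-byte `out` succeeds: the 512-byte write stays inside the 4096-byte allocation and only 16 bytes are copied back. The trade-off is a full maximum-size allocation per request, in exchange for an overrun no longer being able to corrupt neighbouring heap memory.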
> Now I have a dozen core dumps and am trying to understand what happened.
> All dumps look very similar: the panic is always a "page fault"
> in _mtx_lock_sleep called from ufsdirhash_recycle or ufsdirhash_free,
> because the mtx_object in use has been overwritten with zeros by someone
> before _mtx_lock_sleep is called.
I don't know of anything in particular that would explain this, especially
why you would see them all occur at the same time. Maybe check whether the
machines were doing something unusual at that time (a cron job, etc.)?
More information about the freebsd-stable