UFS_DIRHASH panics on a dozen servers within 30 hours

Sun Sep 11 22:24:02 UTC 2011

Hi,

Thank you very much for your answer. I think you pointed me in the right
direction.

> Hmm, the patch in that PR should still apply to newer versions.  Also, you 
> could just change the malloc() call to always allocate the maximum size 
> (instead of using a static buffer) for a smaller diff.  It seems though that a 
> specific command is overrunning its buffer.

Yes. I found that megarc often requests a buffer of 12868 bytes, but the
controller always sends back 25412 bytes. Because this seems to be an
error in megarc, I have submitted a patch to the existing PR ports/137938.

Furthermore, I saw the controller sporadically answer megarc ioctls
with much more data than the buffer size stated by megarc.
Therefore I still use the maximum size in my updated patch in kern/155658.
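
The idea of always using the maximum size can be sketched in userland C
as follows. This is only an illustration and not the patch in kern/155658:
MAX_IOCTL_BUF, fake_controller_reply() and handle_ioctl() are hypothetical
names, and the 64 KiB maximum is an assumed value.

```c
#include <stdlib.h>
#include <string.h>

/* Assumed upper bound on a controller reply; the real limit is not given here. */
#define MAX_IOCTL_BUF 65536

/*
 * Hypothetical stand-in for the controller: it ignores the length the
 * caller asked for and always writes reply_len bytes.
 */
static void
fake_controller_reply(unsigned char *dst, size_t reply_len)
{
    memset(dst, 0xAB, reply_len);
}

/*
 * Allocate the maximum size regardless of the requested length, let the
 * controller write into that buffer, then copy back only the bytes the
 * caller asked for. Returns 0 on success, -1 on error.
 */
static int
handle_ioctl(unsigned char *user_buf, size_t requested, size_t reply_len)
{
    unsigned char *kbuf;

    if (requested > MAX_IOCTL_BUF || reply_len > MAX_IOCTL_BUF)
        return (-1);

    kbuf = calloc(1, MAX_IOCTL_BUF);
    if (kbuf == NULL)
        return (-1);

    fake_controller_reply(kbuf, reply_len);   /* may exceed 'requested' */
    memcpy(user_buf, kbuf, requested);        /* never more than requested */
    free(kbuf);
    return (0);
}
```

With requested = 12868 and reply_len = 25412, the 12544 extra bytes end up
in the oversized buffer instead of past the end of a 12868-byte allocation.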

>> Now I have a dozen core dumps and try to understand what happened.
>> All dumps looks very similar and the panic is always "page fault"
>> in _mtx_lock_sleep called from ufsdirhash_recycle or ufsdirhash_free
>> because the used mtx_object is overwritten with zeros by someone
>> before _mtx_lock_sleep is called.
> 
> I don't know of anything in particular that would explain this, esp. as to
> why you would see them all occur at the same time.

In the meantime I had three more crashes in FreeBSD 6. I assume it is
the same problem as in FreeBSD 8, because the memory corruption
caused by megarc and the controller has nothing to do with the version
of FreeBSD. I have verified that the overruns occur in FreeBSD 6 too,
but I cannot explain why FreeBSD did not crash for years,
given that I have been running megarc every day.

-- 
Dr. Andreas Longwitz

Data Service GmbH
Beethovenstr. 2A
23617 Stockelsdorf
Amtsgericht Lübeck, HRB 318 BS
Geschäftsführer: Wilfried Paepcke, Dr. Andreas Longwitz, Josef Flatau