UFS related panic (daily <-> find)
John Baldwin
jhb at freebsd.org
Mon Oct 21 22:48:17 UTC 2013
On Monday, October 21, 2013 9:30:36 am rank1seeker at gmail.com wrote:
> > > Same drill as before, see what instruction this is. Actually, this looks
> > > to be in the same location as your last panic, so a NULL pointer is 0x1
> > > instead of 0x0 again. In my experience, this would still indicate failing
> > > RAM to me, memtest86+ notwithstanding (memtest86+ is single threaded AFAIK,
> > > so it may not stress the hardware quite the same, e.g. if the error is heat
> > > related, etc.).
> >
> >
> > memtest* cannot conclusively diagnose a dimm as good. Usually the only
> > practical solution is to swap modules with known good ones.
> >
>
>
> 0xc082c552 <inodedep_find+13>: cmp %ecx,0x24(%eax)
> PREVIOUS we talked about
> 0xc083bd42 <inodedep_find+13>: cmp %ecx,0x24(%eax)
> CURRENT ONE
The different instruction pointer doesn't matter. The error is in the memory
that %eax was loaded from by a prior instruction.
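To make that concrete, here is a rough sketch in C of the arithmetic behind
"cmp %ecx,0x24(%eax)". It only illustrates how a pointer that should be 0x0
but reads back as 0x1 turns into the address the CPU actually touches, and how
many bits differ between the two values. The function names and FIELD_OFFSET
are made up for this example and are not taken from the kernel source.

```c
#include <stdint.h>

/* Hypothetical name: displacement copied from "cmp %ecx,0x24(%eax)". */
#define FIELD_OFFSET 0x24u

/* Address the CPU dereferences for an operand of the form disp(%base). */
uint32_t effective_address(uint32_t base, uint32_t disp)
{
    return base + disp;
}

/* Count the bits that differ between the expected and observed words. */
int bits_differing(uint32_t expected, uint32_t observed)
{
    uint32_t diff = expected ^ observed;
    int count = 0;

    while (diff != 0) {
        count += (int)(diff & 1u);  /* add the lowest bit of the difference */
        diff >>= 1;                 /* move on to the next bit */
    }
    return count;
}
```

With %eax holding 0x1, effective_address(0x1, FIELD_OFFSET) gives 0x25, and
bits_differing(0x0, 0x1) gives 1, i.e. the observed pointer is a single bit
away from NULL. The address of the instruction itself plays no part in either
calculation.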
> Now, after all this I recompiled kernel and world and there was no crash.
> How can it be, when it is far more stressing than daily's 'find'?!
Perhaps because having the kernel text + data laid out differently in RAM
shuffled what now lives in the bad memory cell?
> I see addresses 0xc08* and 0xc06* appearing each time, so as I have four
> DDR1 (400) modules, each of 256 MB = 1GB, can those addresses aid me in
> targeting failing module?
The virtual addresses (0xc*) do not matter. They are not physical addresses,
which are what you would need.
> If I can't use memtest86+-4.20 to determine the failing module, then what is
> the use of it at all?
> Test RAM speed perhaps?
Swap out your dimms. That's really the only test, esp. if you have a
reproducible crash.
--
John Baldwin