Debugging bad memory problems

Mehmet Erol Sanliturk m.e.sanliturk at gmail.com
Mon Apr 27 00:02:33 UTC 2015


On Sun, Apr 26, 2015 at 3:15 PM, Valeri Galtsev <galtsev at kicp.uchicago.edu>
wrote:

>
> On Sun, April 26, 2015 4:05 pm, Fernando Apesteguía wrote:
> > On Sun, Apr 26, 2015 at 10:05 PM, Valeri Galtsev
> > <galtsev at kicp.uchicago.edu> wrote:
> >>
> >> On Sun, April 26, 2015 12:11 pm, Fernando Apesteguía wrote:
> >>> Hi,
> >>>
> >>> I suspect my old and beloved AMD64 laptop is suffering from bad memory
> >>> problems: I get random crashes of well tested programs like sh, which,
> >>> etc even when I executed some of them from /rescue.
> >>
> >> If RAM is a suspect the first thing I would do is re-seat the memory
> >> modules. Open the box. (Observe static precautions!) Remove the memory
> >> modules. Install them again.
> >>
> >> Do memtest86 (by booting into memtest86, you can have that in your boot
> >> options, or you can boot off external media as others suggested).
> >>
> >> If you still have problems: try to run with one memory module instead of
> >> two. At some point when they went to higher RAM speeds the memory bus
> >> amplifier became more fragile (some chips, some manufacturers; as it is
> >> now part of the CPU, this may be true only of some CPU models). You
> >> can sometimes slightly fry it if you merely leave the laptop running on
> >> battery, letting the battery run down and the laptop power off as a
> >> result. With some chips this may slightly fry the memory controller
> >> portion, the address bus amplifier in particular. The bus amplifier's
> >> usable frequency drops slightly, so it handles capacitive load (which
> >> is larger if you have more RAM) more poorly; it is marginally OK, with
> >> occasional address errors. Going to one module may resolve this. You
> >> will know this is likely the case if memtest86 succeeds with each RAM
> >> module on its own, but fails (in random places, often not
> >> reproducibly) with both.
> >>
> >> Good luck!
> >
> > I booted from a memtest CD-ROM. It passed a couple of tests fine and
> > then it rebooted while doing a "bit fade" test at around 93%. Removing
> > the modules is tricky since this laptop has screws all around in dark
> > corners (even removing the battery needs a screwdriver). I will try
> > to limit physical memory with hw.physmem and see if it makes any
> > difference.
>
> The latter will not help against what I mentioned, as the capacitive load
> on the memory address bus is defined by what is physically attached to it.
>
> One usually runs memtest86 for at least 24 hours. One loop will catch
> "solid defects" like adjacent lines on the board being connected (when
> they shouldn't be). Memory-related failures, on the contrary, are often
> intermittent. In the worst case I've seen, they only manifested under
> intense load on the box (whereas memtest86 is equivalent to almost zero
> load).
>
> Good luck!
>
> Valeri
>
> ++++++++++++++++++++++++++++++++++++++++
> Valeri Galtsev
> Sr System Administrator
> Department of Astronomy and Astrophysics
> Kavli Institute for Cosmological Physics
> University of Chicago
> Phone: 773-702-4247
> ++++++++++++++++++++++++++++++++++++++++
>



The failure may be in the memory management circuitry rather than in the
memory chips. To test this, replace the existing memory modules with ones
that are known to work (if that can be done).
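
The checks suggested in this thread (each module alone, both together,
known-good modules) can be read as a small decision table. Below is a rough
sketch of that reasoning in Python. It is not an existing tool: the function
name diagnose and the labels it returns are made up for illustration, and
the inputs are simply the pass/fail results you note down by hand after
memtest86 runs.

```python
def diagnose(a_alone_ok, b_alone_ok, both_ok, known_good_ok=None):
    """Map hand-recorded memtest86 pass/fail results to a rough suspect.

    a_alone_ok, b_alone_ok: result with only module A / only module B fitted.
    both_ok: result with both modules fitted.
    known_good_ok: result with known-good modules, or None if not tried.
    The returned labels are hypothetical names used only in this sketch.
    """
    # Known-good memory still fails: the modules are probably not the cause.
    if known_good_ok is False:
        return "controller"
    # Each module passes alone but the pair fails: bus loading is suspect.
    if a_alone_ok and b_alone_ok and not both_ok:
        return "bus-load"
    # Exactly one module fails on its own.
    if a_alone_ok != b_alone_ok:
        return "module-B" if a_alone_ok else "module-A"
    # Both modules fail on their own.
    if not a_alone_ok and not b_alone_ok:
        return "modules" if known_good_ok else "inconclusive"
    # Everything passed; intermittent faults may still need longer runs.
    return "none-seen"


if __name__ == "__main__":
    # Example: each module passes alone, the pair fails.
    print(diagnose(True, True, False))
```

A "none-seen" result only means no failure showed up in the runs recorded;
as noted above, intermittent faults may need much longer test runs.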


Thank you very much.


Mehmet Erol Sanliturk


More information about the freebsd-questions mailing list