kernel memory checks on boot vs. boot time
jhb at freebsd.org
Wed Mar 23 18:26:29 UTC 2011
On Wednesday, March 23, 2011 1:14:43 pm Alexander Best wrote:
> On Wed Mar 23 11, Oliver Fromme wrote:
> > Bjoern A. Zeeb <bz at freebsd.org> wrote:
> > > as part of the i386/pc98/amd64 boot process we are doing some basic
> > > memory testing, mapping pages and running a couple of pattern
> > > write/read tests on the first bytes (see getmemsize() implementations).
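Below is a minimal userland sketch of the kind of check described above. It assumes 4 KiB pages. For each page it writes a few patterns to the first 32-bit word, reads each one back through a volatile pointer, and then restores the original value. The function names and the exact patterns are hypothetical and are not copied from any getmemsize() implementation. Inside a process this only exercises virtual memory behind the CPU caches, so it shows the shape of the loop rather than acting as a useful RAM test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch; not the FreeBSD getmemsize() code. */
#define SKETCH_PAGE_SIZE 4096u  /* assumption: 4 KiB pages */

/* Illustrative patterns: alternating bits, then all ones, then all zeros. */
static const uint32_t sketch_patterns[] = {
    0xaaaaaaaau, 0x55555555u, 0xffffffffu, 0x00000000u
};

/*
 * Write each pattern to the first 32-bit word of a page and read it back
 * through a volatile pointer so the compiler cannot elide the accesses.
 * The word's original value is restored before returning.
 * Returns true if every pattern read back unchanged.
 */
static bool sketch_page_first_word_ok(void *page)
{
    volatile uint32_t *word = (volatile uint32_t *)page;
    uint32_t saved = *word;
    bool ok = true;

    for (size_t i = 0; i < sizeof(sketch_patterns) / sizeof(sketch_patterns[0]); i++) {
        *word = sketch_patterns[i];
        if (*word != sketch_patterns[i])
            ok = false;
    }
    *word = saved;
    return ok;
}

/* Count the pages in a page-aligned range whose first word fails the test. */
static size_t sketch_count_bad_pages(unsigned char *base, size_t npages)
{
    size_t bad = 0;

    assert(base != NULL);
    for (size_t p = 0; p < npages; p++) {
        if (!sketch_page_first_word_ok(base + p * SKETCH_PAGE_SIZE))
            bad++;
    }
    return bad;
}
```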
> > > [...]
> > > With growing amounts of memory this can account for a significant
> > > fraction of kernel startup time on amd64 (~40s delays observed with
> > > 96G of RAM). Looping over the pages, but not mapping them and not
> > > running the pattern tests reduces this significantly (to single digit
> > > numbers of seconds).
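Here is a rough back-of-the-envelope check of those numbers. It rests on three assumptions: "96G" means 96 GiB, pages are 4 KiB, and the whole ~40s is spent in the per-page loop. Because of that last assumption, the per-page figure is an upper bound. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Number of pages of page_size bytes in mem_gib GiB of RAM. */
static uint64_t sketch_pages_in_gib(uint64_t mem_gib, uint64_t page_size)
{
    assert(page_size != 0);
    return (mem_gib << 30) / page_size;
}

/* Average cost per page in nanoseconds, given a total delay in seconds. */
static double sketch_ns_per_page(double delay_s, uint64_t npages)
{
    assert(npages != 0);
    return delay_s * 1e9 / (double)npages;
}
```

With 96 GiB and 4 KiB pages the first helper gives 25,165,824 pages. Spreading 40 seconds over those pages gives roughly 1.6 µs per page. Since the cost is incurred per page, it grows linearly with installed memory.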
> > > [...]
> > > Not wanting to remove them, but maybe to make more use of them in the
> > > future (we currently do not report any problems we find), I'd suggest
> > > introducing a tunable to disable/enable them, say
> > >
> > > hw.run_memtest
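This sketch shows how such a tunable might gate the page loop. The `run_memtest` flag stands in for the proposed hw.run_memtest. In a real kernel the flag would presumably be read from the loader environment, for example with FreeBSD's TUNABLE_INT_FETCH(), but that step is not shown here. When the flag is off, the loop only counts pages, which matches the cheap path described above. The function names and the stand-in check are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-page check supplied by the caller (e.g. a pattern test). */
typedef bool (*sketch_page_check_fn)(size_t page_index);

/*
 * Walk npages pages and return how many are usable.  When run_memtest is
 * false the loop only counts pages; when true each page must also pass
 * check() to be counted.
 */
static size_t sketch_scan_pages(size_t npages, bool run_memtest,
    sketch_page_check_fn check)
{
    size_t usable = 0;

    assert(!run_memtest || check != NULL);
    for (size_t p = 0; p < npages; p++) {
        if (run_memtest && !check(p))
            continue;           /* skip pages that fail the test */
        usable++;
    }
    return usable;
}

/* Stand-in check for illustration only: every tenth page "fails". */
static bool sketch_fail_every_tenth(size_t page_index)
{
    return page_index % 10 != 0;
}
```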
> > +1 for introducing a tunable.
> > I have also noticed the boot delay on server machines with
> > lots of memory (all of them are amd64, FWIW). Co-workers
> > have noticed it, too, causing some funny remarks. :-)
> Or how about we dump the current memory checks, introduce a tunable and
> implement some *real* memory checks? As John pointed out, the current checks
> are just rudimentary.
I think that doing *real* memory checks isn't really the role of our kernel.
Better effort would be spent on improving memtest86 since it is already trying
to solve this problem. It would be nice to have a way to invoke
memtest86 from the loader. Assuming you could pass arguments (such as a time
limit) to the memtest "kernel", then you could install memtest to
/boot/memtest and do something like 'nextboot -k memtest -o "-t 120"' to run
memtest for 2 hours on the next boot, then reboot back into the stock OS after
it finishes, etc.
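This sketch shows how a memtest "kernel" might interpret the time limit from the example. It assumes a hypothetical "-t <minutes>" option; the text implies minutes, since -t 120 means 2 hours. Neither the option syntax nor the function is taken from memtest86 itself.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/*
 * Parse a hypothetical "-t <minutes>" option string and return the limit in
 * seconds.  Returns 0 (no limit) for an empty string and -1 if the string
 * is malformed.
 */
static long sketch_time_limit_seconds(const char *opts)
{
    char *end;
    long minutes;

    assert(opts != NULL);
    while (*opts == ' ')
        opts++;
    if (*opts == '\0')
        return 0;                       /* no options: no limit */
    if (strncmp(opts, "-t", 2) != 0)
        return -1;                      /* unknown option */
    opts += 2;
    minutes = strtol(opts, &end, 10);   /* strtol skips leading spaces */
    if (end == opts || minutes <= 0 || minutes > LONG_MAX / 60)
        return -1;                      /* missing, non-positive or too large */
    while (*end == ' ')
        end++;
    if (*end != '\0')
        return -1;                      /* trailing garbage */
    return minutes * 60;
}
```

The caller would then stop testing once that many seconds have elapsed and reboot.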
There are several tricky things you need to get right if you want to do *real*
memory tests that are a bit harder to do if you have a full-blown kernel, such
as relocating yourself into already-checked pages at some point so you can
check all of the pages in the system, disabling caching for all pages except
your kernel so you test the actual RAM rather than your caches, etc.
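Below is a simplified simulation of the relocation step. It assumes memory is a flat array of pages and that the tester occupies one contiguous range. The procedure has three steps:

1. Test every page outside the tester's footprint.
2. Move the tester into a run of already-checked pages that does not overlap its old location.
3. Test the pages the tester vacated.

The sketch only marks entries in a bool array. The other point above, making pages uncacheable so the test reaches RAM rather than the caches, would need page-table attribute or MTRR changes on x86 and is not modeled. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Simulate the two-phase plan by marking pages in tested[], never testing
 * a page the tester currently occupies.  Returns the start of the range the
 * tester relocates to, or (size_t)-1 if no suitable checked range exists.
 */
static size_t sketch_two_phase_plan(bool *tested, size_t npages,
    size_t self_start, size_t self_len)
{
    size_t new_start = (size_t)-1;
    size_t run = 0;

    assert(self_len > 0 && self_start + self_len <= npages);
    memset(tested, 0, npages * sizeof(*tested));

    /* Phase 1: test everything outside the tester's own footprint. */
    for (size_t p = 0; p < npages; p++)
        if (p < self_start || p >= self_start + self_len)
            tested[p] = true;

    /* Relocate: find self_len consecutive checked pages that do not
     * overlap the current footprint. */
    for (size_t p = 0; p < npages; p++) {
        bool outside = p < self_start || p >= self_start + self_len;
        run = (tested[p] && outside) ? run + 1 : 0;
        if (run == self_len) {
            new_start = p + 1 - self_len;
            break;
        }
    }
    if (new_start == (size_t)-1)
        return new_start;       /* nowhere safe to move; old range untested */

    /* Phase 2: the tester now runs from new_start, so the pages it
     * vacated can be tested. */
    for (size_t p = self_start; p < self_start + self_len; p++)
        tested[p] = true;
    return new_start;
}
```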