kernel memory checks on boot vs. boot time

Peter Wemm peter at wemm.org
Wed Mar 23 19:52:42 UTC 2011


On Wed, Mar 23, 2011 at 11:26 AM, John Baldwin <jhb at freebsd.org> wrote:
> On Wednesday, March 23, 2011 1:14:43 pm Alexander Best wrote:
>> On Wed Mar 23 11, Oliver Fromme wrote:
>> > Bjoern A. Zeeb <bz at freebsd.org> wrote:
>> >  > as part of the i386/pc98/amd64 boot process we are doing some basic
>> >  > memory testing, mapping pages and running a couple of pattern
>> >  > write/read tests on the first bytes (see getmemsize() implementations).
>> >  > [...]
>> >  > With the growing amount of memory this can take up a significant
>> >  > fraction of kernel startup time on amd64 (~40s delays observed with
>> >  > 96G of RAM).  Looping over the pages, but not mapping them and not
>> >  > running the pattern tests reduces this significantly (to single digit
>> >  > numbers of seconds).
>> >  > [...]
>> >  > Not wanting to remove them but maybe make more use of them in the
>> >  > future (as we do not report any problems we find currently) I'd suggest
>> >  > to introduce a tunable to disable/enable them, say
>> >  >
>> >  >         hw.run_memtest
>> >
>> > +1 for introducing a tunable.
>> >
>> > I have also noticed the boot delay on server machines with
>> > lots of memory (all of them are amd64, FWIW).  Co-workers
>> > have noticed it, too, causing some funny remarks.  :-)
>>
>> or how about we dump the current memory checks, introduce a tunable and
>> implement some *real* memory checks. as john pointed out the current checks
>> are just rudimentary.
>
> I think that doing *real* memory checks isn't really the role of our kernel.
> Better effort would be spent on improving memtest86 since it is already trying
> to solve this problem.

Part of the reason for this "check" is a sanity check to make sure we
enumerated memory correctly and that we have at least basic RAM
functionality.  The existence of hw.physmem complicates this.  On
machines where hw.physmem could be used to tell the kernel that there
was more RAM present than the kernel enumerated (old BIOSes etc.), this
sanity check was fairly important.

Even though modern hardware will fail Windows compliance tests if the
SMAP etc. is wrong, never underestimate the ability of BIOS makers to
find new and bizarre ways of screwing things up.

I'd kinda like to keep a basic "is this real, non-mirrored RAM?" test
there, e.g. the 2-pass step of writing each page's physical address
into that page and then checking on the second pass that the values
are still there.
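
To make the 2-pass idea concrete, here is a rough userland sketch.  It
is not the getmemsize() code and does not touch real physical memory:
struct fake_ram, page_ptr() and two_pass_check() are hypothetical names
made up for illustration, and mirroring is modelled (as an assumption)
by backing "npages" page frames with only "nreal" distinct pages.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define PAGE_SIZE 4096u

/*
 * Hypothetical stand-in for physical memory: "npages" page frames are
 * backed by only "nreal" distinct pages, so frame i aliases frame
 * i % nreal.  nreal == npages models correctly enumerated RAM.
 */
struct fake_ram {
	uint8_t	*backing;
	size_t	 npages;
	size_t	 nreal;
};

/* Return a pointer to the first 64-bit word of page frame "pfn". */
static uint64_t *
page_ptr(struct fake_ram *ram, size_t pfn)
{
	assert(pfn < ram->npages);
	return ((uint64_t *)(void *)(ram->backing +
	    (pfn % ram->nreal) * PAGE_SIZE));
}

/*
 * Two-pass check.  Returns the number of page frames whose contents
 * did not survive until the second pass, or -1 on bad arguments or
 * allocation failure.
 */
long
two_pass_check(size_t npages, size_t nreal)
{
	struct fake_ram ram;
	size_t pfn;
	long bad = 0;

	if (npages == 0 || nreal == 0 || nreal > npages)
		return (-1);
	ram.backing = calloc(nreal, PAGE_SIZE);
	if (ram.backing == NULL)
		return (-1);
	ram.npages = npages;
	ram.nreal = nreal;

	/* Pass 1: stamp every frame with its own "physical address". */
	for (pfn = 0; pfn < npages; pfn++)
		*page_ptr(&ram, pfn) = (uint64_t)pfn * PAGE_SIZE;

	/* Pass 2: a mirrored frame now holds some other frame's address. */
	for (pfn = 0; pfn < npages; pfn++)
		if (*page_ptr(&ram, pfn) != (uint64_t)pfn * PAGE_SIZE)
			bad++;

	free(ram.backing);
	return (bad);
}
```

The point of using two passes is that a write/read test on one page at
a time cannot see aliasing: each page reads back fine right after it is
written.  Only after every page has been stamped does a mirrored page
show the address of whichever alias was written last.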

Oh, did I mention the machine where the ACPI BIOS info tells the OS
that the current state is S3 (suspended to RAM) instead of S0?

When the kernel blows up at boot without a message, we get the blame,
not the BIOS maker.

-- 
Peter Wemm - peter at wemm.org; peter at FreeBSD.org; peter at yahoo-inc.com; KI6FJV
"All of this is for nothing if we don't go to the stars" - JMS/B5
"If Java had true garbage collection, most programs would delete
themselves upon execution." -- Robert Sewell


More information about the freebsd-arch mailing list