kernel memory checks on boot vs. boot time
jhb at freebsd.org
Tue Mar 22 19:51:17 UTC 2011
On Tuesday, March 22, 2011 1:30:42 pm Bjoern A. Zeeb wrote:
> as part of the i386/pc98/amd64 boot process we are doing some basic
> memory testing, mapping pages and running a couple of pattern
> write/read tests on the first bytes (see getmemsize() implementations).
> Depending on the features enabled and boot -v or not you may notice
> it as "nothing happens" booting from loader, after any of these
> possible lines:
> GDB: no debug ports present
> KDB: debugger backends: ddb
> KDB: current backend: ddb
> SMAP type=...
> but before the Copyright message.
> With the growing amount of memory this can account for a significant
> fraction of kernel startup time on amd64 (~40s delays observed with
> 96G of RAM). Looping over the pages, but not mapping them and not
> running the pattern tests reduces this significantly (to single digit
> numbers of seconds).
> As a first step I'd like to discuss how worthwhile the actual memory
> tests are these days, to figure out a sensible default.
> Not wanting to remove them but maybe make more use of them in the
> future (as we do not report any problems we find currently) I'd suggest
> to introduce a tunable to disable/enable them, say
> with the following values:
> 0 do not map the page and do not run the pattern tests
> 1 do run the pattern test on the beginning of the page
> (current default).
> and maybe add
> 2 run the pattern tests on the entire pages?
> I would further suggest adding a printf independently of boot -v
> there, so that a user who is waiting will know what's (not) going on.
> Something along the lines of:
> "Testing physical address space (%s)."
> 0 "skipping extra pattern tests"
> 1 "pattern tests on beginning of each page"
> 2 "pattern tests on entire pages"
> If this is something that makes sense, I'd suggest to factor things
> out to sys/x86 and would provide a patch for further discussion and
> improvements (like error reporting, etc).
> Comments? Suggestions?
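
For illustration, the three proposed levels could be sketched in userspace C
roughly as follows. This is only a sketch under stated assumptions: the names
memtest_level, memtest_level_desc(), memtest_word() and memtest_page(), the
4096-byte page size, and the pattern values are all hypothetical and are not
taken from getmemsize(). The real code operates on temporarily mapped
physical pages in the kernel, not on an ordinary buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed page size for this sketch (4 KiB, as on i386/amd64 base pages). */
#define SKETCH_PAGE_SIZE 4096u

/* Hypothetical names for the proposed tunable values 0, 1 and 2. */
enum memtest_level {
    MEMTEST_SKIP = 0,        /* do not map the page, no pattern tests */
    MEMTEST_PAGE_START = 1,  /* pattern test on the beginning of the page */
    MEMTEST_FULL_PAGE = 2    /* pattern tests on the entire page */
};

/* Text for the proposed "Testing physical address space (%s)." message. */
static const char *
memtest_level_desc(enum memtest_level level)
{
    switch (level) {
    case MEMTEST_SKIP:
        return "skipping extra pattern tests";
    case MEMTEST_PAGE_START:
        return "pattern tests on beginning of each page";
    case MEMTEST_FULL_PAGE:
        return "pattern tests on entire pages";
    }
    return "unknown";
}

/*
 * Write each pattern to one word and read it back, then restore the
 * original contents. Returns 0 if every read matched, -1 otherwise.
 * The pattern values are illustrative assumptions.
 */
static int
memtest_word(volatile uint32_t *p)
{
    static const uint32_t patterns[] = {
        0xaaaaaaaau, 0x55555555u, 0xffffffffu, 0x00000000u
    };
    uint32_t saved = *p;
    int bad = 0;

    for (size_t i = 0; i < sizeof(patterns) / sizeof(patterns[0]); i++) {
        *p = patterns[i];
        if (*p != patterns[i])
            bad = 1;
    }
    *p = saved;
    if (*p != saved)
        bad = 1;
    return bad ? -1 : 0;
}

/* Test one page-sized buffer at the requested level. */
static int
memtest_page(void *page, enum memtest_level level)
{
    volatile uint32_t *p = page;
    size_t nwords;

    assert(page != NULL);
    if (level == MEMTEST_SKIP)
        return 0;
    /* Level 1 touches only the first word; level 2 walks the whole page. */
    nwords = (level == MEMTEST_FULL_PAGE) ?
        SKETCH_PAGE_SIZE / sizeof(uint32_t) : 1;
    for (size_t i = 0; i < nwords; i++) {
        if (memtest_word(&p[i]) != 0)
            return -1;
    }
    return 0;
}
```

A caller would loop over the candidate pages, call memtest_page() on each with
the configured level, and exclude any page for which it returns -1. Level 2
touches 1024 words per page instead of one, so it would presumably cost far
more than the current default.
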
Do other platforms bother with these sorts of memory tests? If not, I'd vote
to just drop them. I think this mattered more when you didn't have things like
SMAP (so you sometimes had to guess at where memory ended). Also, modern
server-class x86 machines generally support ECC RAM, which will trigger a
machine check if there is a problem. I doubt that the early checks are
catching anything even in the non-ECC case.
If nothing else, I would definitely drop this from amd64 (all those systems
have SMAP and machine check support, etc.).