Upgrade from 8.2-STABLE to 9.0-RELEASE wedges on SuperMicro H8DGiF-based system
Jeremy Chadwick
freebsd at jdc.parodius.com
Mon Jan 9 18:45:22 UTC 2012
On Mon, Jan 09, 2012 at 09:55:58AM -0800, Freddie Cash wrote:
> On Mon, Jan 9, 2012 at 9:50 AM, John Nielsen <lists at jnielsen.net> wrote:
> > From what you've said I strongly suspect that you have some kind of hardware issue. Dodgy RAM is my first guess, something cooling-related is my 2nd, and PSU is my 3rd. It is a little suspicious that you only started having problems after your upgrade but it could be coincidence or it could be something about the new software tickling the hardware differently than the old.
>
> That's what we're leaning toward as well. We're planning on doing a
> BIOS upgrade (betadrive is running v2.00 and alphadrive is v1.00),
> then a memtest86+ run, then check firmware on the SATA controllers.
For hardware/system troubleshooting advice:
1) BIOS upgrade -- since this is also what's responsible for ACPI bits
and other "configuration model" pieces of a system,
2) BIOS settings -- make sure they're all 100% identical between both
systems,
3) Controller firmware -- please make sure these are the same (your
controllers between boxes appear to be the same model),
4) Flaky PSU -- voltages may sag below or rise above the levels which
the mainboard can handle. As someone who buys Supermicro exclusively
for their systems, I can tell you that their PSUs ("Ablecom") are
quite cheap/horrible. It's worth purchasing a replacement -- if it
doesn't turn out to be the problem, you now have a spare PSU (which
is good to have -- our last system failure was due to a blown PSU).
5) Flaky RAM -- memtest86+ can help here; it catches most faults, but
not all.
6) Flaky mainboard -- it happens. Really. :-)
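To make item 4 a bit more concrete, here is a rough sketch of checking
rail voltages against the commonly cited ATX tolerances (+/-5% on the
positive rails, +/-10% on -12V). The readings dict is hypothetical; in
practice the numbers would come from the BIOS hardware monitor, IPMI, or
a multimeter, and on-board sensors are not always accurate, so treat a
flagged rail as a hint rather than proof.

```python
# Nominal rail voltage and allowed fractional deviation.
# Assumption: the usual ATX figures of +/-5% (positive rails) and +/-10% (-12V).
ATX_TOLERANCE = {
    "+3.3V": (3.3, 0.05),
    "+5V": (5.0, 0.05),
    "+12V": (12.0, 0.05),
    "-12V": (-12.0, 0.10),
}


def out_of_tolerance(readings):
    """Return {rail: (measured, low, high)} for rails outside their window.

    Rails that are not listed in ATX_TOLERANCE are ignored.
    """
    bad = {}
    for rail, measured in readings.items():
        if rail not in ATX_TOLERANCE:
            continue
        nominal, fraction = ATX_TOLERANCE[rail]
        margin = abs(nominal) * fraction
        low, high = nominal - margin, nominal + margin
        if not (low <= measured <= high):
            bad[rail] = (measured, low, high)
    return bad


if __name__ == "__main__":
    # Hypothetical readings, typed in by hand from a sensor page.
    sample = {"+3.3V": 3.31, "+5V": 5.02, "+12V": 11.28, "-12V": -11.9}
    for rail, (volts, low, high) in out_of_tolerance(sample).items():
        print(f"{rail}: {volts:.2f} V is outside {low:.2f}..{high:.2f} V")
```

A single snapshot says little; readings taken under load, or logged over
time around a wedge, are more telling.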
For OS advice:
Compare rc.conf, loader.conf, and so on. For example, is one system
using powerd(8) while the other isn't?
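A minimal sketch of that comparison, assuming you have copied the same
file (rc.conf, loader.conf, ...) from each box to one place. The parser
only understands simple sh-style name="value" lines, which covers the
usual case but not every construct sh(1) allows; the function names and
file paths are made up for illustration.

```python
import shlex
import sys


def parse_conf(text):
    """Parse sh-style name=value lines, skipping blanks and comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, raw = line.partition("=")
        try:
            # Strips the quotes and any trailing "# comment".
            parts = shlex.split(raw, comments=True)
        except ValueError:
            # Unbalanced quotes: keep the raw text rather than guess.
            parts = [raw]
        settings[name.strip()] = parts[0] if parts else ""
    return settings


def diff_settings(a, b):
    """Return {name: (value_in_a, value_in_b)} for settings that differ.

    A value of None means the setting is absent on that side.
    """
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(set(a) | set(b))
        if a.get(name) != b.get(name)
    }


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (paths are hypothetical):
    #   python confdiff.py alphadrive/rc.conf betadrive/rc.conf
    with open(sys.argv[1]) as fa, open(sys.argv[2]) as fb:
        differences = diff_settings(parse_conf(fa.read()), parse_conf(fb.read()))
    for name, (left, right) in differences.items():
        print(f"{name}: {left!r} vs {right!r}")
```

Plain diff(1) works too, but comparing by setting name ignores ordering
and comments, so only real differences (such as powerd_enable being set
on one box only) show up.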
> If none of the above helps, we're thinking of swapping the CPUs
> between the two systems to see if the problems stay with the box or
> follow the CPU.
I was helping out someone on a public forum earlier this week who
purchased a Dell desktop system that started behaving oddly. memtest86+
claimed all his DIMMs were bad (regardless of slot), and it reported the
same for replacement DIMMs. Dell kept insisting he reload the OS, or
else they could try a motherboard swap, blah blah blah. What amused me
was that nobody looked at the CPU: Intel Core i3-550, which contains an
on-die MCH. Chances are the MCH is going bad, which means it's time to
replace the CPU.
CPUs rarely go bad, but now with on-die MCHs, on-die VGA, etc. it's
becoming much more plausible that the physical CPU needs to be replaced.
They've become practically computers inside of a computer. :-)
--
| Jeremy Chadwick jdc at parodius.com |
| Parodius Networking http://www.parodius.com/ |
| UNIX Systems Administrator Mountain View, CA, US |
| Making life hard for others since 1977. PGP 4BD6C0CB |
More information about the freebsd-stable
mailing list