Opteron deadlock fix committed
Doug White
dwhite at gumbysoft.com
Sun May 1 13:47:48 PDT 2005
Hey folks,
Yesterday I committed a workaround for a deadlock condition in RELENG_5
and RELENG_5_4 caused by an erratum in AMD Opteron and Athlon64 processors.
(-CURRENT after April 8 is not affected due to changes in critical
sections.)
The deadlock appeared under heavy load, particularly with lots of I/O
interrupts, in SMP environments on both FreeBSD/amd64 and FreeBSD/i386.
Specifically, a spinwait as part of inter-processor TLB flushing triggered
Erratum 106, which causes the cache on the spinning processor to not
update. To work around the issue, we enabled interrupts during the spinwait,
which breaks the lock on the cache when an interrupt occurs. AMD also
offers a workaround that the BIOS can implement, but not all systems have
applied it.
This workaround will appear in 5.4-RELEASE.
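For readers who have not looked at the code, the shape of the workaround
can be sketched roughly as below. This is a userland simulation and not
the committed kernel change: every sketch_* name is hypothetical, the
interrupt flag is modeled by a plain boolean, and the remote CPUs'
acknowledgements are faked inside the loop so the example terminates on
its own. It only illustrates the ordering: enable interrupts, spin until
all acknowledgements arrive, then restore the previous interrupt state.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userland stand-ins; a kernel would toggle the real CPU flag. */
static bool sketch_intr_on = false;      /* models the interrupt-enable flag */
static int  sketch_spins_with_intr = 0;  /* spin iterations run with interrupts on */

/* Models the count of remote CPUs that have acknowledged the TLB flush. */
static atomic_int sketch_tlb_acks;

/* Enable "interrupts" and return the previous state so it can be restored. */
static bool sketch_intr_save_enable(void)
{
    bool was_on = sketch_intr_on;
    sketch_intr_on = true;
    return was_on;
}

/* Put the "interrupt" flag back to what the caller had. */
static void sketch_intr_restore(bool was_on)
{
    sketch_intr_on = was_on;
}

/* Simulation only: stands in for another processor acknowledging the flush. */
static void sketch_remote_ack(void)
{
    atomic_fetch_add(&sketch_tlb_acks, 1);
}

/*
 * Spin until `expected` acknowledgements have been seen. Interrupts stay
 * enabled for the whole spin, which is the point of the workaround, and
 * the caller's previous interrupt state is restored afterwards.
 */
static void sketch_wait_for_acks(int expected)
{
    assert(expected >= 0);

    bool was_on = sketch_intr_save_enable();

    while (atomic_load(&sketch_tlb_acks) < expected) {
        if (sketch_intr_on)
            sketch_spins_with_intr++;
        sketch_remote_ack();  /* assumption: fakes the remote CPUs */
    }

    sketch_intr_restore(was_on);
}
```

In a real kernel the loop body would be nothing more than a pause hint,
with the acknowledgements arriving from the other processors.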
In addition, I committed a KDB feature written by Stephen Uphoff (ups) to
assist in debugging SMP situations where one processor is stuck with
interrupts disabled. Since the regular cpustop IPI won't get through in
that case, he implemented an NMI handler to perform the stop. This
feature, compiled in with the kernel option KDB_STOP_NMI and activated by
debug.kdb.stop_cpus_with_nmi, is available on all current branches and
will ship with 5.4-RELEASE.
If you have had problems with deadlock conditions on AMD processors in an
SMP environment, we urge you to update to the latest RELENG_5, or try the
upcoming 5.4-RC4.
I want to thank the following individuals for their help in debugging and
fixing the deadlock issue:
Paul Vixie and Peter Losher at ISC
Stephen Uphoff (ups)
Alan Cox (alc)
John Baldwin (jhb)
The rest of the FreeBSD RE team
--
Doug White | FreeBSD: The Power to Serve
dwhite at gumbysoft.com | www.FreeBSD.org