[Bug 270166] Upgrade to/new install of RELEASE 13.1 causes boot freeze (subr_clockcalib.c)

From: <bugzilla-noreply_at_freebsd.org>
Date: Mon, 13 Mar 2023 00:20:58 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=270166

            Bug ID: 270166
           Summary: Upgrade to/new install of RELEASE 13.1 causes boot
                    freeze (subr_clockcalib.c)
           Product: Base System
           Version: 13.1-RELEASE
          Hardware: amd64
                OS: Any
            Status: New
          Severity: Affects Only Me
          Priority: ---
         Component: kern
          Assignee: bugs@FreeBSD.org
          Reporter: kronenpj@gmail.com

Created attachment 240805
  --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=240805&action=edit
Hack to correct described problem w/instrumentation

The addition of sys/kern/subr_clockcalib.c and the changes related to it cause
an infinite loop (one or more CPUs at 100% usage) during boot. The termination
condition of the loop is indeterminate.

System details: Virtualized FreeBSD 13.x KVM guest on Fedora Linux 37, AMD
Ryzen 9 3900XT 12-Core Processor.

After bisecting to commit baee6cc1814b8e851555d2caa6410eedcef2c6c8, I added
instrumentation printfs. These isolated the problem to the exit condition for
clockcalib. The cpu() call never returned a value larger than 4294967295
(2^32 - 1) when calibrating the LAPIC timer.

The proposed solution (hack) captures the last clk() value and aborts the
cpu_spinwait() loop if the current clk() value and the last value are the
same. In this situation freq always ends up at 0, which causes a
divide-by-zero error in the calling routine, so the hack also checks for this
condition and returns 1 instead.
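
For illustration only, the idea can be sketched in userspace C. This is not
the attached patch and not the code in subr_clockcalib.c: calib_sketch,
stuck_clk, ticking_clk and max_passes are hypothetical names, and the real
routine also samples a timecounter, which is left out here. The sketch only
shows the two guards described above: stop when two consecutive clk() reads
are equal, and never hand 0 back to a caller that divides by the result.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical clock source pinned at the 32-bit maximum (4294967295). */
static uint64_t
stuck_clk(void)
{
	return (UINT32_MAX);
}

/* Hypothetical clock source that advances by 1000 ticks on every read. */
static uint64_t
ticking_clk(void)
{
	static uint64_t now;

	now += 1000;
	return (now);
}

/*
 * Simplified stand-in for a calibration loop.  Returns the average number
 * of clock ticks per pass, or 1 if the clock never advanced.
 */
static uint64_t
calib_sketch(uint64_t (*clk)(void), unsigned int max_passes)
{
	uint64_t cur, first, freq, last;
	unsigned int n;

	assert(clk != NULL);

	first = last = clk();
	for (n = 0; n < max_passes; n++) {
		cur = clk();
		/* Guard 1: the clock did not move, so stop spinning. */
		if (cur == last)
			break;
		last = cur;
	}

	/* No completed pass means no usable measurement. */
	freq = (n == 0) ? 0 : (last - first) / n;

	/* Guard 2: return 1 rather than 0 so callers can divide safely. */
	return (freq == 0 ? 1 : freq);
}
```

With these stand-ins, calib_sketch(stuck_clk, 10) breaks out on the first
pass and returns 1, while calib_sketch(ticking_clk, 10) completes all ten
passes and returns 1000.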

I acknowledge there are cleaner ways to correct this problem, and I explored a
kernel command-line option to fall back to the previous behavior if desired. I
have not figured out how to accomplish this.
