6.2: reproducible hang on amd64, traced to 24h of commits

Deomid Ryabkov myself at rojer.pp.ru
Mon Oct 15 17:08:23 PDT 2007


fwiw, i have not traced it down to a commit (got fed up with hangs), but 
conclusively singled out smartmontools as the trigger.
after adding 2 more disks, machine wouldn't even boot up past starting 
smartmontools, locking up hard with the same symptoms.
with smartmontools disabled, it booted up and has been up for > 2 months 
now.

Deomid Ryabkov wrote:
> ok, now that the machine has been up for 10 days, i am reasonably sure 
> i've gotten close enough to this one.
>
> back in january i cvsupped to -STABLE and the box (dual head opteron 
> box) started hanging.
> and i mean it dies completely.
> i have all debug options and a working serial console, but still it 
> just dies and both serial and system console are unresponsive.
> no panic message on either, nothing. pretty sad.
>
> the kernel config is vanilla SMP GENERIC, with all debug options i 
> could think of enabled (after it started hanging).
>
> so the first thing i did after rebooting the box a couple of times was 
> fall back to kernel.old (6.1-STABLE circa august '06).
> no hangs. i then started incrementally updating, gradually getting 
> closer to jan 22.
> long story short, i seem to have isolated the problem to commits made 
> between
> date=2006.12.28.00.00.00 and date=2006.12.29.00.00.00.
> last hang i had was when running the 12/29 kernel, now it's 12/28 and 
> the box has been up for 2 weeks already.
> based on previous experience i'm pretty certain that this is it. with 
> a bad kernel the box would never stay up more than a few days, never 
> more than 5.
> between 12/28 and 12/29 i see some changes to /sys/amd64/ and 
> /sys/pci/, which might've been the cause.
> i will probably start looking into individual changes, but if anyone 
> more experienced than me could take a look, it'd be appreciated.
> i am willing to try patches.
> i confirmed that recent (as of 3 weeks or so) -STABLE still has this 
> problem.
>
> thanks in advance.
>
> ====
> files under /sys that were changed between 12/28 and 12/29:
>
> Edit src/sys/amd64/amd64/mptable_pci.c
> Edit src/sys/amd64/pci/pci_bus.c
> Edit src/sys/contrib/dev/ath/public/wackelf.c
> Edit src/sys/dev/acpica/acpi_pci.c
> Edit src/sys/dev/acpica/acpi_pcib_acpi.c
> Edit src/sys/dev/acpica/acpi_pcib_pci.c
> Checkout src/sys/dev/ath/if_ath.c
> Edit src/sys/dev/cardbus/cardbus.c
> Edit src/sys/dev/drm/drm_agpsupport.c
> Edit src/sys/dev/pci/pci.c
> Edit src/sys/dev/pci/pci_if.m
> Edit src/sys/dev/pci/pci_pci.c
> Edit src/sys/dev/pci/pci_private.h
> Edit src/sys/dev/pci/pcib_private.h
> Edit src/sys/dev/pci/pcivar.h
> Edit src/sys/i386/i386/mptable_pci.c
> Edit src/sys/i386/pci/pci_bus.c
> Edit src/sys/kern/subr_bus.c
> Checkout src/sys/netgraph/ng_deflate.h
> Edit src/sys/pci/agp.c
> Edit src/sys/pci/agpreg.h
> Edit src/sys/powerpc/ofw/ofw_pcib_pci.c
> Edit src/sys/sparc64/pci/apb.c
> Edit src/sys/sparc64/pci/ofw_pcib.c
> Edit src/sys/sparc64/pci/ofw_pcibus.c
> Edit src/sys/sys/param.h
>
>
> ====
> kernel configuration used:
>
> include GENERIC
>
> options SMP
>
> options KDB
> options DDB
>
> makeoptions DEBUG=-g
> options INVARIANTS
> options INVARIANT_SUPPORT
> options WITNESS
> options DEBUG_LOCKS
> options DEBUG_VFS_LOCKS
> options DIAGNOSTIC
> ====
>
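
for anyone wanting to repeat the date-based narrowing described in the quoted message, here is a minimal sketch in python. it is an illustration only, not the procedure actually used: the names bisect_dates, cvsup_date and the is_bad callback are hypothetical, the august date is a placeholder for "circa august '06", and the sketch assumes that every date after the offending commit is bad. in practice each "test" means building a kernel from sources as of that date and waiting several days to see whether the box hangs.

```python
from datetime import datetime, timedelta

# Timestamp layout used in the quoted message: date=2006.12.28.00.00.00
CVSUP_FMT = "%Y.%m.%d.%H.%M.%S"


def cvsup_date(d):
    """Format a datetime as a date= value in the style quoted above."""
    return "date=" + d.strftime(CVSUP_FMT)


def bisect_dates(good, bad, is_bad):
    """Narrow a known-good / known-bad date pair down to adjacent days.

    is_bad(date) is a hypothetical callback: it should return True if a
    kernel built from sources as of `date` shows the hang.
    """
    while (bad - good).days > 1:
        mid = good + timedelta(days=(bad - good).days // 2)
        if is_bad(mid):
            bad = mid   # hang reproduced: the culprit is at or before mid
        else:
            good = mid  # no hang: the culprit is after mid
    return good, bad


if __name__ == "__main__":
    # Placeholder endpoints: kernel.old (assumed 2006-08-01) and the jan 22 sources.
    good = datetime(2006, 8, 1)
    bad = datetime(2007, 1, 22)

    # Stand-in for a real multi-day uptime test, for demonstration only.
    def is_bad(d):
        return d >= datetime(2006, 12, 29)

    g, b = bisect_dates(good, bad, is_bad)
    print(cvsup_date(g), cvsup_date(b))
```

halving the interval keeps the number of kernels to test small, which matters when a single "good" verdict needs well over 5 days of uptime.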


-- 
Deomid Ryabkov aka Rojer
myself at rojer.pp.ru
rojer at sysadmins.ru
ICQ: 8025844


More information about the freebsd-hackers mailing list