i386/53382: Repeatable panics in ffs_vget() on Proliant ML350 with SMP/HTT enabled

Przemyslaw Frasunek venglin at freebsd.lublin.pl
Mon Jun 16 11:00:36 PDT 2003


>Number:         53382
>Category:       i386
>Synopsis:       Repeatable panics in ffs_vget() on Proliant ML350 with SMP/HTT enabled
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    freebsd-i386
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Mon Jun 16 11:00:34 PDT 2003
>Closed-Date:
>Last-Modified:
>Originator:     Przemyslaw Frasunek
>Release:        FreeBSD 4.8-RELEASE i386
>Organization:
ATM S.A.
>Environment:
System: FreeBSD riot.atman.pl 4.8-RELEASE FreeBSD 4.8-RELEASE #0: Mon Jun 16 18:06:45 CEST 2003     root@riot.atman.pl:/usr/src/sys/compile/RIOT  i386

	Compaq Proliant ML350; the problem is reproducible on other ML350s
	with SMP/HTT enabled.

Copyright (c) 1992-2003 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
	The Regents of the University of California. All rights reserved.
FreeBSD 4.8-RELEASE #0: Mon Jun 16 18:06:45 CEST 2003
    root@riot.atman.pl:/usr/src/sys/compile/RIOT
Timecounter "i8254"  frequency 1193182 Hz
Timecounter "TSC"  frequency 2392260632 Hz
CPU: Intel(R) Xeon(TM) CPU 2.40GHz (2392.26-MHz 686-class CPU)
  Origin = "GenuineIntel"  Id = 0xf27  Stepping = 7
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Hyperthreading: 2 logical CPUs
real memory  = 1073717248 (1048552K bytes)
avail memory = 1041403904 (1016996K bytes)
Preloaded elf kernel "kernel" at 0xc0308000.
Pentium Pro MTRR support enabled
npx0: <math processor> on motherboard
npx0: INT 16 interface
pcib0: <Host to PCI bridge> on motherboard
pci0: <PCI bus> on pcib0
ahc0: <Adaptec (Compaq OEM) 3960D Ultra160 SCSI adapter> port 0x2400-0x24ff mem 0xf7cf0000-0xf7cf0fff irq 10 at device 2.0 on pci0
aic7899: Ultra160 Wide Channel A, SCSI Id=7, 32/253 SCBs
ahc1: <Adaptec (Compaq OEM) 3960D Ultra160 SCSI adapter> port 0x2800-0x28ff mem 0xf7ce0000-0xf7ce0fff irq 10 at device 2.1 on pci0
aic7899: Ultra160 Wide Channel B, SCSI Id=7, 32/253 SCBs
pci0: <ATI Mach64-GR graphics accelerator> at 3.0
bge0: <Broadcom BCM5702X Gigabit Ethernet, ASIC rev. 0x1002> mem 0xf5fe0000-0xf5feffff irq 3 at device 4.0 on pci0
bge0: Ethernet address: 00:0b:cd:4e:17:f7
miibus0: <MII bus> on bge0
brgphy0: <BCM5703 10/100/1000baseTX PHY> on miibus0
brgphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseTX, 1000baseTX-FDX, auto
pci0: <unknown card> (vendor=0x0e11, dev=0xa0f0) at 5.0 irq 5
isab0: <PCI to ISA bridge (vendor=1166 device=0201)> at device 15.0 on pci0
isa0: <ISA bus> on isab0
pci0: <Unknown PCI ATA controller> at 15.1
pcib1: <Host to PCI bridge> on motherboard
pci1: <PCI bus> on pcib1
pcib2: <Host to PCI bridge> on motherboard
pci2: <PCI bus> on pcib2
ciss0: <Compaq Smart Array 532> port 0x3000-0x30ff mem 0xf7df0000-0xf7df3fff,0xf7ec0000-0xf7efffff irq 15 at device 1.0 on pci2
ciss0: using 256 of 1024 available commands
ciss0:   0 logical drives configured
ciss0:   firmware 2.20
ciss0:   2 SCSI channels
ciss0:   signature 'CISS'
ciss0:   valence 1
ciss0:   supported I/O methods 0xe<simple,performant,MEMQ>
ciss0:   active I/O method 0x3<simple>
ciss0:   4G page base 0x00000000
ciss0:   interrupt coalesce delay 1000us
ciss0:   interrupt coalesce count 16
ciss0:   max outstanding commands 1024
ciss0:   bus types 0x2<ultra3>
ciss0:   server name ''
ciss0:   heartbeat 0x3000004a
ciss0: 0 logical drive
xl0: <3Com 3c905C-TX Fast Etherlink XL> port 0x3400-0x347f mem 0xf7eb0000-0xf7eb007f irq 11 at device 2.0 on pci2
xl0: reset didn't complete
xl0: Ethernet address: 00:04:75:f2:2b:e1
miibus1: <MII bus> on xl0
ukphy0: <Generic IEEE 802.3u media interface> on miibus1
ukphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
pcib3: <Host to PCI bridge> on motherboard
pci3: <PCI bus> on pcib3
pcib4: <ServerWorks host to PCI bridge(unknown chipset)> on motherboard
pci4: <PCI bus> on pcib4
pcib5: <ServerWorks host to PCI bridge(unknown chipset)> on motherboard
pci5: <PCI bus> on pcib5
xl1: <3Com 3c905C-TX Fast Etherlink XL> port 0x4000-0x407f mem 0xf7ff0000-0xf7ff007f irq 10 at device 1.0 on pci5
xl0: reset didn't complete
xl1: Ethernet address: 00:04:75:f2:2b:dd
miibus2: <MII bus> on xl1
ukphy1: <Generic IEEE 802.3u media interface> on miibus2
ukphy1:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
xl2: <3Com 3c980C Fast Etherlink XL> port 0x4080-0x40ff mem 0xf7fe0000-0xf7fe007f irq 15 at device 2.0 on pci5
xl2: Ethernet address: 00:04:75:db:fa:9c
miibus3: <MII bus> on xl2
xlphy0: <3c905C 10/100 internal PHY> on miibus3
xlphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
orm0: <Option ROMs> at iomem 0xc0000-0xc7fff,0xc8000-0xcbfff,0xcc000-0xcc7ff,0xcc800-0xccfff,0xee000-0xeffff on isa0
fdc0: <NEC 72065B or clone> at port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on isa0
fdc0: FIFO enabled, 8 bytes threshold
fd0: <1440-KB 3.5" drive> on fdc0 drive 0
atkbdc0: <Keyboard controller (i8042)> at port 0x60,0x64 on isa0
atkbd0: <AT Keyboard> flags 0x1 irq 1 on atkbdc0
kbd0 at atkbd0
psm0: <PS/2 Mouse> irq 12 on atkbdc0
psm0: model IntelliMouse Explorer, device ID 4
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
DUMMYNET initialized (011031)
IP packet filtering initialized, divert disabled, rule-based forwarding enabled, default to accept, logging disabled
IP Filter: v3.4.31 initialized.  Default = pass all, Logging = enabled
Waiting 15 seconds for SCSI devices to settle
pt0 at ahc0 bus 0 target 15 lun 0
pt0: <COMPAQ PROLIANT 4L6I 1.78> Fixed Processor SCSI-2 device 
pt0: 3.300MB/s transfers
da2 at ahc0 bus 0 target 2 lun 0
da2: <COMPAQ BD03664545 B20B> Fixed Direct Access SCSI-2 device 
da2: 160.000MB/s transfers (80.000MHz, offset 127, 16bit), Tagged Queueing Enabled
da2: 34732MB (71132000 512 byte sectors: 255H 63S/T 4427C)
da3 at ahc0 bus 0 target 3 lun 0
da3: <COMPAQ BD03664545 B20B> Fixed Direct Access SCSI-2 device 
da3: 160.000MB/s transfers (80.000MHz, offset 127, 16bit), Tagged Queueing Enabled
da3: 34732MB (71132000 512 byte sectors: 255H 63S/T 4427C)
da1 at ahc0 bus 0 target 1 lun 0
da1: <COMPAQ BD0366349C 3B06> Fixed Direct Access SCSI-2 device 
da1: 160.000MB/s transfers (80.000MHz, offset 63, 16bit), Tagged Queueing Enabled
da1: 34732MB (71132000 512 byte sectors: 255H 63S/T 4427C)
da0 at ahc0 bus 0 target 0 lun 0
da0: <COMPAQ BD0186349B 3B11> Fixed Direct Access SCSI-2 device 
da0: 160.000MB/s transfers (80.000MHz, offset 63, 16bit), Tagged Queueing Enabled
da0: 17365MB (35565080 512 byte sectors: 255H 63S/T 2213C)
da5 at ahc0 bus 0 target 5 lun 0
da5: <COMPAQ BD03685A24 HPB3> Fixed Direct Access SCSI-3 device 
da5: 160.000MB/s transfers (80.000MHz, offset 63, 16bit), Tagged Queueing Enabled
da5: 34732MB (71132000 512 byte sectors: 255H 63S/T 4427C)
da4 at ahc0 bus 0 target 4 lun 0
da4: <COMPAQ BD03685A24 HPB3> Fixed Direct Access SCSI-3 device 
da4: 160.000MB/s transfers (80.000MHz, offset 63, 16bit), Tagged Queueing Enabled
da4: 34732MB (71132000 512 byte sectors: 255H 63S/T 4427C)
Mounting root from ufs:/dev/da0s1a


machine		i386
cpu		I686_CPU
ident		RIOT
maxusers	256

options 	INET
options 	INET6
options 	FFS
options 	FFS_ROOT
options 	SOFTUPDATES
options 	UFS_DIRHASH
options 	COMPAT_43
options 	SCSI_DELAY=15000
options 	USERCONFIG
options 	SYSVSHM
options 	SYSVMSG
options 	SYSVSEM
options         MAXDSIZ="(512*1024*1024)"
options         MAXSSIZ="(512*1024*1024)"
options         DFLDSIZ="(512*1024*1024)"
options		NMBCLUSTERS=131070
options		PMAP_SHPGPERPROC=400
options		SMP
options		APIC_IO
options		HTT
options 	P1003_1B
options 	_KPOSIX_PRIORITY_SCHEDULING
options		ICMP_BANDLIM
options 	KBD_INSTALL_CDEV

options         IPFILTER
options		IPFILTER_LOG

options		IPFIREWALL
options		IPFIREWALL_DEFAULT_TO_ACCEPT
options		DUMMYNET

device		isa
device		pci

device		fdc0	at isa? port IO_FD1 irq 6 drq 2
device		fd0	at fdc0 drive 0
device		fd1	at fdc0 drive 1

device		scbus
device		da
device		sa
device		cd
device		pass
device		pt
device		ses

device		ahc
device		ciss

device		atkbdc0	at isa? port IO_KBD
device		atkbd0	at atkbdc? irq 1 flags 0x1
device		psm0	at atkbdc? irq 12

device		vga0	at isa?

device		sc0	at isa? flags 0x100

device		npx0	at nexus? port IO_NPX irq 13

device		miibus		# MII bus support
device		xl
device		bge

pseudo-device	loop		# Network loopback
pseudo-device	ether		# Ethernet support
pseudo-device	pty		# Pseudo-ttys (telnet etc)
pseudo-device	bpf		#Berkeley packet filter
pseudo-device	tun
pseudo-device	gif

>Description:
	After a short period of heavy disk activity, most I/O
	operations fail with EBADF. A page fault then occurs within
	one minute:

SMP 2 cpus
IdlePTD at phsyical address 0x0033a000
initial pcb at physical address 0x002a54e0
panicstr: page fault
panic messages:
---
Fatal trap 12: page fault while in kernel mode
mp_lock = 01000002; cpuid = 1; lapic.id = 07000000
fault virtual address   = 0x0
fault code              = supervisor write, page not present
instruction pointer     = 0x8:0xc023ffb3
stack pointer           = 0x10:0xff6e8be0
frame pointer           = 0x10:0xff6e8c14
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 4837 (squid)
interrupt mask          = bio  <- SMP: XXX
trap number             = 12
panic: page fault
mp_lock = 01000002; cpuid = 1; lapic.id = 07000000
boot() called on cpu#1
syncing disks... 109 109 109 109 109 109 109 32 32 32 32 32 32 32 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19 19
giving up on 19 buffers
Uptime: 5m31s
xl0: reset didn't complete
xl1: reset didn't complete

dumping to dev #da/0x20001, offset 2097200
[...]
(kgdb) bt
#0  0xc0175a26 in dumpsys ()
#1  0xc01757f7 in boot ()
#2  0xc0175c50 in poweroff_wait ()
#3  0xc02415e0 in trap_fatal ()
#4  0xc0241271 in trap_pfault ()
#5  0xc0240e0f in trap ()
#6  0xc023ffb3 in generic_bzero ()
#7  0xc0201ae3 in ffs_vget ()
#8  0xc01f6795 in ffs_valloc ()
#9  0xc0208fa3 in ufs_makeinode ()
#10 0xc02069a8 in ufs_create ()
#11 0xc02092d9 in ufs_vnoperate ()
#12 0xc01aa4d4 in vn_open ()
#13 0xc01a66d0 in open ()
#14 0xc02418b1 in syscall2 ()
#15 0xc022eefb in Xint0x80_syscall ()
cannot read proc at 0
(kgdb) info all
eax            0x0      0
ecx            0x0      0
edx            0x0      0
ebx            0x0      0
esp            0xff6e8ab0       0xff6e8ab0
ebp            0xff6e8abc       0xff6e8abc
esi            0x0      0
edi            0x68000040       1744830528
eip            0xc0175a26       0xc0175a26
eflags         0x0      0
cs             0x0      0
ss             0x0      0
ds             0x0      0
es             0x0      0
fs             cannot read u area ptr for proc at 0

Sometimes a panic occurs in pmap-related functions instead:

Fatal trap 12: page fault while in kernel mode
mp_lock = 00000002; cpuid = 0; lapic.id = 06000000
fault virtual address   = 0xbfc00000
fault code              = supervisor write, page not present
instruction pointer     = 0x8:0xc023d461
stack pointer           = 0x10:0xff685e30
frame pointer           = 0x10:0xff685e3c
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 8523 (cpp0)
interrupt mask          = none <- SMP: XXX
trap number             = 12
panic: page fault
mp_lock = 00000002; cpuid = 0; lapic.id = 06000000
boot() called on cpu#0

syncing disks... 96 87 87 72 71 69 67 66 66 56 53 52 50 48 48 32 31 31 22 21 19 18 17 17 14 12 11 11 3 3
done
Uptime: 5m33s
xl0: reset didn't complete
xl1: reset didn't complete

dumping to dev #da/0x20001, offset 2097200
[...]
(kgdb) bt
#0  0xc0175a26 in dumpsys ()
#1  0xc01757f7 in boot ()
#2  0xc0175c50 in poweroff_wait ()
#3  0xc02415e0 in trap_fatal ()
#4  0xc0241271 in trap_pfault ()
#5  0xc0240e0f in trap ()
#6  0xc023d461 in pmap_qenter ()
#7  0xc0185d56 in pipe_build_write_buffer ()
#8  0xc0185f28 in pipe_direct_write ()
#9  0xc01862ca in pipe_write ()
#10 0xc0184723 in dofilewrite ()
#11 0xc018461a in write ()
#12 0xc02418b1 in syscall2 ()
#13 0xc022eefb in Xint0x80_syscall ()
#14 0x804e900 in ?? ()
#15 0x804a696 in ?? ()
#16 0x804813e in ?? ()
(kgdb) info all
eax            0x0      0
ecx            0x0      0
edx            0x0      0
ebx            0x0      0
esp            0xff685d00       0xff685d00
ebp            0xff685d0c       0xff685d0c
esi            0x0      0
edi            0x0      0
eip            0xc0175a26       0xc0175a26
eflags         0x0      0
cs             0x0      0
ss             0x0      0
ds             0x0      0
es             0x0      0
fs             0x0      0
gs             0x2f     47

>How-To-Repeat:
	Sustain heavy disk I/O on a Proliant ML350 running a kernel
	with SMP/HTT enabled.
>Fix:
	Workaround: turn off SMP.
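	As a rough sketch of what turning off SMP might look like for
	the RIOT kernel config listed under Environment: the three
	multiprocessor-related options are commented out. Removing
	APIC_IO and HTT together with SMP is an assumption here (in
	4.x they are only used with SMP kernels); the submitter did not
	say which options were actually changed, and this fragment is
	untested.

```
# Uniprocessor variant of the RIOT config (illustrative only).
# Assumption: all three options below are disabled together.
#options		SMP
#options		APIC_IO
#options		HTT
```

	The kernel would then need to be rebuilt and installed before
	the change takes effect.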
>Release-Note:
>Audit-Trail:
>Unformatted:

