Partial deadlock in 4.9p1

Peter Jeremy peter.jeremy at alcatel.com.au
Thu Dec 18 00:35:01 PST 2003


I've just had a 4.9p1 system partially deadlock for about 30 minutes
and then spontaneously(?) recover.  The "ps" results point at vinum.
I've never seen this before and don't recall anything like this being
mentioned here before.

At the time, I had two find's running in a large NFS mounted
filesystem and was doing a "cvs -d /usr/ncvs -R checkout -P ports"
from a normal SCSI disk (/usr/ncvs) into a 12-way RAID-5 filesystem
($PWD == /mnt).  After a while (10-15 minutes), I noticed that the messages
from cvs had stopped, though a top and systat were still running and
the top output indicated that both find's were still running.

Apart from the 'cvs', the system seemed to be running normally but
trying "sync" in another terminal also wedged, though "df" worked
as did "ls" of a different subdirectory in /mnt.

ddb/ps showed that "cvs", "sync", "bufdaemon" and "syncer" were all
blocked on "vrlock" with the same wchan (c1482800).

The system sources are also under /mnt and when I went to src/sys and
started a search for "vrlock", the system recovered - the sync
returned and cvs started running again.  Having looked at the sources,
the whcan points to a struct rangelock - but gdb reports that that
particular rangelock is now being recycled continuously so I'm not
sure of the actual I/O operation that caused the problem.  Whilst
bufdaemon is often waiting in vrlock, once I stop doing disk I/O, the
dirtybuf count will go to zero and bufdaemon will go to psleep
instead.  No messages were reported on either the console or via
syslog.

Vinum details are:

aalp03# vinum list
12 drives:
D disk00                State: up       Device /dev/da1a        Avail: 3/4094 MB (0%)
D disk01                State: up       Device /dev/da2a        Avail: 3/4094 MB (0%)
D disk02                State: up       Device /dev/da3a        Avail: 3/4094 MB (0%)
D disk03                State: up       Device /dev/da4a        Avail: 3/4094 MB (0%)
D disk04                State: up       Device /dev/da5a        Avail: 3/4094 MB (0%)
D disk05                State: up       Device /dev/da6a        Avail: 3/4094 MB (0%)
D disk08                State: up       Device /dev/da7a        Avail: 3/4094 MB (0%)
D disk09                State: up       Device /dev/da8a        Avail: 3/4094 MB (0%)
D disk10                State: up       Device /dev/da9a        Avail: 3/4094 MB (0%)
D disk11                State: up       Device /dev/da10a       Avail: 3/4094 MB (0%)
D disk12                State: up       Device /dev/da11a       Avail: 3/4094 MB (0%)
D disk13                State: up       Device /dev/da12a       Avail: 3/4094 MB (0%)

1 volumes:
V raid5                 State: up       Plexes:       1 Size:         43 GB

1 plexes:
P raid5.p0           R5 State: up       Subdisks:    12 Size:         43 GB

12 subdisks:
S raid5.p0.s0           State: up       PO:        0  B Size:       4091 MB
S raid5.p0.s1           State: up       PO:     4032 kB Size:       4091 MB
S raid5.p0.s2           State: up       PO:     8064 kB Size:       4091 MB
S raid5.p0.s3           State: up       PO:       11 MB Size:       4091 MB
S raid5.p0.s4           State: up       PO:       15 MB Size:       4091 MB
S raid5.p0.s5           State: up       PO:       19 MB Size:       4091 MB
S raid5.p0.s6           State: up       PO:       23 MB Size:       4091 MB
S raid5.p0.s7           State: up       PO:       27 MB Size:       4091 MB
S raid5.p0.s8           State: up       PO:       31 MB Size:       4091 MB
S raid5.p0.s9           State: up       PO:       35 MB Size:       4091 MB
S raid5.p0.s10          State: up       PO:       39 MB Size:       4091 MB
S raid5.p0.s11          State: up       PO:       43 MB Size:       4091 MB
aalp03# tail /var/log/vinum_history 
23 Dec 2002 06:59:45.792709 *** Created devices ***
23 Dec 2002 06:59:48.833111 list 
11 Dec 2003 09:10:03.754890 *** vinum started ***
11 Dec 2003 09:10:03.779419 list 
17 Dec 2003 08:39:33.834732 *** vinum started ***
17 Dec 2003 08:39:33.862485 list 
17 Dec 2003 09:17:18.888825 *** vinum started ***
17 Dec 2003 09:17:18.938531 list 
17 Dec 2003 09:17:22.942145 *** vinum started ***
17 Dec 2003 09:17:22.945871 list 
aalp03# tail /var/log/messages
...
Dec 17 08:00:13 aalp03 /kernel: ipfw: Accounting cleared.
Dec 17 08:17:51 aalp03 /kernel: Connection attempt to TCP xx.yy.zz.253:113 from xx.yy.zz.9:1658 flags:0x02
Dec 17 08:20:13 aalp03 /kernel: Connection attempt to TCP xx.yy.zz.253:113 from xx.yy.zz.9:1757 flags:0x02
Dec 17 08:28:47 aalp03 /kernel: arplookup 192.168.161.24 failed: host is not on local network
Dec 17 08:29:51 aalp03 /kernel: arplookup 192.168.161.24 failed: host is not on local network
aalp03# cat /var/run/dmesg.boot
Copyright (c) 1992-2003 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
FreeBSD 4.9-RELEASE-p1 #4: Thu Dec 11 11:27:41 EST 2003
    root at aalp03.alcatel.com.au:/mnt/obj/mnt/rpc/src/sys/wansim
Timecounter "i8254"  frequency 1193158 Hz
Timecounter "TSC"  frequency 598169769 Hz
CPU: Pentium III/Pentium III Xeon/Celeron (598.17-MHz 686-class CPU)
  Origin = "GenuineIntel"  Id = 0x673  Stepping = 3
  Features=0x383f9ff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SSE>
real memory  = 134201344 (131056K bytes)
avail memory = 127127552 (124148K bytes)
Preloaded elf kernel "kernel" at 0xc035f000.
VESA: v2.0, 1024k memory, flags:0x0, mode table:0xc02ff602 (1000022)
VESA: ATI MACH64
Pentium Pro MTRR support enabled
npx0: <math processor> on motherboard
npx0: INT 16 interface
pcib0: <Intel 82443BX host to PCI bridge (AGP disabled)> on motherboard
pci0: <PCI bus> on pcib0
sym0: <875> port 0x2000-0x20ff mem 0xc6aff000-0xc6afffff,0xc6afaf00-0xc6afafff irq 10 at device 6.0 on pci0
sym0: No NVRAM, ID 7, Fast-20, SE, parity checking
sym1: <875> port 0x2400-0x24ff mem 0xc6afe000-0xc6afefff,0xc6afae00-0xc6afaeff irq 11 at device 6.1 on pci0
sym1: No NVRAM, ID 7, Fast-20, SE, parity checking
tl0: <Compaq Netelligent 10/100 Proliant> port 0x3040-0x304f mem 0xc6afacf0-0xc6afacff irq 15 at device 7.0 on pci0
tl0: Ethernet address: 00:50:8b:dc:8a:75
miibus0: <MII bus> on tl0
lxtphy0: <LXT970 10/100 media interface> on miibus0
lxtphy0:  100baseFX, 100baseFX-FDX, 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
tlphy0: <ThunderLAN 10baseT media interface> on miibus0
tlphy0:  10base2/BNC, 10base5/AUI
pci0: <ATI Mach64-GV graphics accelerator> at 8.0
pci0: <unknown card> (vendor=0x0e11, dev=0xa0f0) at 9.0
pci0: <unknown card> (vendor=0x114f, dev=0x0004) at 13.0 irq 5
isp0: <Qlogic ISP 1020/1040 PCI SCSI Adapter> port 0x2c00-0x2cff mem 0xc6afc000-0xc6afcfff irq 15 at device 15.0 on pci0
fxp0: <Intel 82559 Pro/100 Ethernet> port 0x3000-0x303f mem 0xc6b00000-0xc6bfffff,0xc6afb000-0xc6afbfff irq 15 at device 16.0 on pci0
fxp0: Ethernet address 00:d0:b7:20:bd:ab
inphy0: <i82555 10/100 media interface> on miibus1
inphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
isab0: <Intel 82371AB PCI to ISA bridge> at device 20.0 on pci0
isa0: <ISA bus> on isab0
atapci0: <Intel PIIX4 ATA33 controller> port 0xf100-0xf10f at device 20.1 on pci0
ata0: at 0x1f0 irq 14 on atapci0
ata1: at 0x170 irq 15 on atapci0
pci0: <Intel 82371AB/EB (PIIX4) USB controller> at 20.2 irq 0
chip1: <Intel 82371AB Power management controller> at device 20.3 on pci0
orm0: <Option ROMs> at iomem 0xc0000-0xc7fff,0xe8000-0xedfff,0xee000-0xeffff on isa0
pmtimer0 on isa0
sc0: <System console> on isa0
sc0: VGA <12 virtual consoles, flags=0x200>
fdc0: <NEC 72065B or clone> at port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on isa0
fdc0: FIFO enabled, 8 bytes threshold
fd0: <1440-KB 3.5" drive> on fdc0 drive 0
atkbdc0: <Keyboard controller (i8042)> at port 0x60,0x64 on isa0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
psm0: <PS/2 Mouse> irq 12 on atkbdc0
psm0: model Generic PS/2 mouse, device ID 0
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
ppc0: cannot reserve I/O port range
sio0 at port 0x3f8-0x3ff irq 4 on isa0
sio0: type 16550A
sio1 at port 0x2f8-0x2ff irq 3 on isa0
sio1: type 16550A
DUMMYNET initialized (011031)
IP packet filtering initialized, divert enabled, rule-based forwarding enabled, default to accept, logging limited to 100 packets/entry by default
ata0-slave: ATAPI identify retries exceeded
acd0: CDROM <CD-224E> at ata0-master PIO4
Waiting 2 seconds for SCSI devices to settle
sa0 at sym1 bus 0 target 5 lun 0
sa0: <DEC TZ89     (C) DEC 2150> Removable Sequential Access SCSI-2 device 
sa0: 20.000MB/s transfers (10.000MHz, offset 15, 16bit)
pass14 at isp0 bus 0 target 14 lun 0
pass14: <SYMBIOS D1000 2> Fixed Processor SCSI-2 device 
pass14: 3.300MB/s transfers
pass15 at isp0 bus 0 target 15 lun 0
pass15: <SYMBIOS D1000 2> Fixed Processor SCSI-2 device 
pass15: 3.300MB/s transfers
da0 at sym0 bus 0 target 0 lun 0
da0: <COMPAQ BB00911CA0 3B07> Fixed Direct Access SCSI-2 device 
da0: 40.000MB/s transfers (20.000MHz, offset 15, 16bit), Tagged Queueing Enabled
da0: 8678MB (17773524 512 byte sectors: 255H 63S/T 1106C)
da1 at isp0 bus 0 target 0 lun 0
da1: <SEAGATE ST34501WCSUN4.2G 0558> Fixed Direct Access SCSI-2 device 
da1: 40.000MB/s transfers (20.000MHz, offset 8, 16bit), Tagged Queueing Enabled
da1: 4094MB (8385121 512 byte sectors: 255H 63S/T 521C)
da2 at isp0 bus 0 target 1 lun 0
da2: <SEAGATE ST34501WCSUN4.2G 0558> Fixed Direct Access SCSI-2 device 
da2: 40.000MB/s transfers (20.000MHz, offset 8, 16bit), Tagged Queueing Enabled
da2: 4094MB (8385121 512 byte sectors: 255H 63S/T 521C)
da3 at isp0 bus 0 target 2 lun 0
da3: <SEAGATE ST34501WCSUN4.2G 0558> Fixed Direct Access SCSI-2 device 
da3: 40.000MB/s transfers (20.000MHz, offset 8, 16bit), Tagged Queueing Enabled
da3: 4094MB (8385121 512 byte sectors: 255H 63S/T 521C)
da5 at isp0 bus 0 target 4 lun 0
da5: <SEAGATE ST34501WCSUN4.2G 0558> Fixed Direct Access SCSI-2 device 
da5: 40.000MB/s transfers (20.000MHz, offset 8, 16bit), Tagged Queueing Enabled
da5: 4094MB (8385121 512 byte sectors: 255H 63S/T 521C)
da6 at isp0 bus 0 target 5 lun 0
da6: <SEAGATE ST34501WCSUN4.2G 0558> Fixed Direct Access SCSI-2 device 
da6: 40.000MB/s transfers (20.000MHz, offset 8, 16bit), Tagged Queueing Enabled
da6: 4094MB (8385121 512 byte sectors: 255H 63S/T 521C)
da10 at isp0 bus 0 target 11 lun 0
da10: <SEAGATE ST34501WCSUN4.2G 0558> Fixed Direct Access SCSI-2 device 
da10: 40.000MB/s transfers (20.000MHz, offset 8, 16bit), Tagged Queueing Enabled
da10: 4094MB (8385121 512 byte sectors: 255H 63S/T 521C)
da11 at isp0 bus 0 target 12 lun 0
da11: <SEAGATE ST34501WCSUN4.2G 0558> Fixed Direct Access SCSI-2 device 
da11: 40.000MB/s transfers (20.000MHz, offset 8, 16bit), Tagged Queueing Enabled
da11: 4094MB (8385121 512 byte sectors: 255H 63S/T 521C)
da9 at isp0 bus 0 target 10 lun 0
da9: <SEAGATE ST34501WCSUN4.2G 0558> Fixed Direct Access SCSI-2 device 
da9: 40.000MB/s transfers (20.000MHz, offset 8, 16bit), Tagged Queueing Enabled
da9: 4094MB (8385121 512 byte sectors: 255H 63S/T 521C)
da8 at isp0 bus 0 target 9 lun 0
da8: <SEAGATE ST34501WCSUN4.2G 0558> Fixed Direct Access SCSI-2 device 
da8: 40.000MB/s transfers (20.000MHz, offset 8, 16bit), Tagged Queueing Enabled
da8: 4094MB (8385121 512 byte sectors: 255H 63S/T 521C)
da7 at isp0 bus 0 target 8 lun 0
da7: <SEAGATE ST34501WCSUN4.2G 0558> Fixed Direct Access SCSI-2 device 
da7: 40.000MB/s transfers (20.000MHz, offset 8, 16bit), Tagged Queueing Enabled
da7: 4094MB (8385121 512 byte sectors: 255H 63S/T 521C)
da4 at isp0 bus 0 target 3 lun 0
da4: <SEAGATE ST34501WCSUN4.2G 0558> Fixed Direct Access SCSI-2 device 
da4: 40.000MB/s transfers (20.000MHz, offset 8, 16bit), Tagged Queueing Enabled
da4: 4094MB (8385121 512 byte sectors: 255H 63S/T 521C)
da12 at isp0 bus 0 target 13 lun 0
da12: <SEAGATE ST34501WCSUN4.2G 0558> Fixed Direct Access SCSI-2 device 
da12: 40.000MB/s transfers (20.000MHz, offset 8, 16bit), Tagged Queueing Enabled
da12: 4094MB (8385121 512 byte sectors: 255H 63S/T 521C)
Mounting root from ufs:/dev/da0s1a

-- 
Peter Jeremy


More information about the freebsd-stable mailing list