Processes blocked on ufs or getblk

Andre Guibert de Bruet andy at siliconlandmark.com
Sun Jan 25 22:26:34 PST 2004


On Thu, 15 Jan 2004, Andre Guibert de Bruet wrote:

> On Thu, 15 Jan 2004, Lachlan O'Dea wrote:
>
> > -----BEGIN PGP SIGNED MESSAGE-----
> >
> > I found some discussion about this in December, but I don't think
> > anyone has been able to get to the bottom of it yet. The symptom is
> > that processes become permanently blocked in a state of ufs or getblk.
> > I can reproduce it with find at will:
> >
> > % ps axl | grep ufs
> >      0 13225 13215   1  -4  0  1300  804 ufs    D     ??    0:00.96 find
> > /var -xdev -type f ( -perm -u+x -or -perm -g+x -or -perm -o+
> >      0 28778 28765   0  -4  0  1300  804 ufs    D     ??    0:00.97 find
> > /var -xdev -type f ( -perm -u+x -or -perm -g+x -or -perm -o+
> >      0 33017 32933   2  -4  0  1304  788 ufs    D     p2-   0:10.69 find
> > / -name samba
> >
> > It has also happened several times in single user mode to makewhatis
> > running at the end of installworld.
> >
> > System details: 5.2-RC FreeBSD 5.2-RC #1: Fri Jan  9 04:45:51 EST 2004.
> > Dell PowerEdge 2500. All filesystems are on a single raid 5 volume
> > using the aac driver. The box has two CPUs, but I'm currently running
> > with kern.smp.disabled=1.
> >
> > % mount
> > /dev/aacd0s1a on / (ufs, local)
> > devfs on /dev (devfs, local)
> > /dev/aacd0s1e on /usr (ufs, local, with quotas, soft-updates)
> > /dev/aacd0s1d on /var (ufs, local, soft-updates)
> > procfs on /proc (procfs, local)
> > linprocfs on /usr/compat/linux/proc (linprocfs, local)
> >
> > I also have ACLs enabled on /usr, if that's at all relevant.
> >
> > The kernel has DDB and DEBUG_LOCKS. Please let me know if there's
> > anything I can do to help debug this.
> >
> > I don't know if this is related, but another problem is that when
> > shutting down, it always gives up on a bunch of buffers. I think I've
> > seen over 100, but usually it's 4-10 buffers.
>
> I'm seeing the same thing on my desktop machine. It usually occurs while
> scanning large directories and/or dealing with large collections of files
> rather quickly. I came across this bug while using gqview to go through my
> image collection and a second time while re-checking out my ports tree
> from local cvs. The programs appear to grab an exclusive lock and anything
> that tries to read or write to the directory (or get a directory listing)
> gets stuck in ufs state.
>
> My kernel config is rather simple, GENERIC without a lot of cruft except
> amr, ata, scsi, usb and pcm. I'll try to get the output of a ddb ps and a
> show lockedvnods.

I'm reviving this thread as I have more information that might help track
this problem down. The offending process in this case is gqview, but it
could have been 'find /' or any other process running when the system is
under high load (such as during the dailies).

From the emails that I've gotten, it appears that this bug affects users
who are using either ccd or hardware raid (amr driver in my case). I've
attached the output of a ddb ps and a 'show lockedvnods'.

Every time the getblk hang rears its ugly head, I've seen
"amr0: bad slot x completed" (where x is an integer between 0 and 4)
printed on the serial console.

This makes me think that there's a failure mode or special state that
isn't being checked in the amr driver. Perusing the code shows that the
bad slot message is printed when a completed slot has a NULL busy command.
I'm no storage driver expert and my VFS knowledge is somewhat limited.
Anyone out there want to have a look at this? I'm willing to try out any
patches on this system.
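
For anyone who hasn't looked at this kind of code, the check described
above can be sketched roughly as follows. This is a toy model and not the
amr source: toy_softc, toy_command, toy_complete_slot and TOY_MAX_SLOTS are
all made-up names, and the slot count is arbitrary. It only shows how a
completion for a slot with no in-flight command ends up in the "bad slot"
branch.

```c
#include <stddef.h>
#include <stdio.h>

#define TOY_MAX_SLOTS 8	/* hypothetical; not the real driver's limit */

/* Hypothetical in-flight command. */
struct toy_command {
	int done;	/* set once its completion has been processed */
};

/* Hypothetical per-controller state. */
struct toy_softc {
	/* One entry per slot; NULL means nothing is in flight there. */
	struct toy_command *busycmd[TOY_MAX_SLOTS];
	int bad_completions;	/* completions that matched no command */
};

/*
 * Process a completion that the controller reported for `slot`.
 * Returns 0 if a busy command was found and finished, -1 otherwise.
 */
int
toy_complete_slot(struct toy_softc *sc, int slot)
{
	struct toy_command *cmd;

	if (slot < 0 || slot >= TOY_MAX_SLOTS) {
		sc->bad_completions++;
		return (-1);
	}

	cmd = sc->busycmd[slot];
	if (cmd == NULL) {
		/* Nothing was in flight here: the "bad slot" case. */
		fprintf(stderr, "toy0: bad slot %d completed\n", slot);
		sc->bad_completions++;
		return (-1);
	}

	/* Normal path: release the slot and mark the command finished. */
	sc->busycmd[slot] = NULL;
	cmd->done = 1;
	return (0);
}
```

In this model a duplicate completion, or one reported against the wrong
slot, lands in the NULL branch and is only logged. Whether something
similar leaves the original request unfinished in the real driver is an
open question, not something the sketch demonstrates.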

I'm currently running:
FreeBSD bling.home 5.2-CURRENT FreeBSD 5.2-CURRENT #1: Thu Jan 22 11:38:46 EST 2004     andy at bling.home:/usr/src/sys/i386/compile/BLING  i386

Full Kernel config file is up at:
http://bling.properkernel.com/BLING

I'll have a boot -v up shortly at:
http://bling.properkernel.com/boot-v.txt

Regards,

> Andre Guibert de Bruet | Enterprise Software Consultant >
> Silicon Landmark, LLC. | http://siliconlandmark.com/    >
-------------- next part --------------
db> ps
  pid   proc     uarea   uid  ppid  pgrp  flag   stat  wmesg    wchan  cmd
 1140 77082a50 b4d0d000    0     1  1140 0004002 [SLP]nanslp 0x60799bbc] reboot
 1043 6aa08dc0 b4c3c000  501     1  1042 0004000 [SLP]getblk 0x992ba724] gqview
   58 68ea9528 b08be000    0     0     0 0000204 [SLP]- 0x607c80ac] nfsiod 3
   57 68ea96e0 b08bf000    0     0     0 0000204 [SLP]- 0x607c80a8] nfsiod 2
   56 68ea9898 b08c0000    0     0     0 0000204 [SLP]- 0x607c80a4] nfsiod 1
   55 68ea9a50 b08c1000    0     0     0 0000204 [SLP]- 0x607c80a0] nfsiod 0
   54 68ea9c08 b08c2000    0     0     0 0000204 [SLP]vlruwt 0x68ea9c08] vnlru
   53 68ea9dc0 b08c3000    0     0     0 0000204 [SLP]syncer 0x60799580] syncer
   52 690f0000 b2902000    0     0     0 0000204 [SLP]psleep 0x607c142c] bufdaemon
   51 690f01b8 b2903000    0     0     0 000020c [SLP]pgzero 0x607ce828] pagezero
   50 690f0370 b2904000    0     0     0 0000204 [SLP]psleep 0x607ce880] vmdaemon
   49 690f0528 b2905000    0     0     0 0000204 [SLP]psleep 0x607ce86c] pagedaemon
    9 690f06e0 b2906000    0     0     0 0000204 [SLP]- 0xb2930d0c] schedcpu
   48 690f0898 b294f000    0     0     0 0000204 [IWAIT] swi0: tty:sio
   47 68e55a50 b088b000    0     0     0 0000204 [SLP]usbtsk 0x60791c04] usbtask
   46 68e55c08 b088c000    0     0     0 0000204 [SLP]usbevt 0x68fcd210] usb0
    8 68e55dc0 b088d000    0     0     0 0000204 [SLP]actask 0x608cb36c] acpi_task2
    7 68ea7000 b088e000    0     0     0 0000204 [SLP]actask 0x608cb36c] acpi_task1
    6 68ea71b8 b088f000    0     0     0 0000204 [SLP]actask 0x608cb36c] acpi_task0
--More--
   45 68ea7370 b0890000    0     0     0 0000204 [IWAIT] swi7: task queue
   44 68ea7528 b0891000    0     0     0 0000204 [IWAIT] swi7: acpitaskq
   43 68ea76e0 b0892000    0     0     0 0000204 [IWAIT] swi3: cambio
   42 68ea7898 b0893000    0     0     0 0000204 new [IWAIT] swi2: camnet
   41 68ea7a50 b0894000    0     0     0 0000204 new [IWAIT] swi5:+
    5 68ea7c08 b08b9000    0     0     0 0000204 [SLP]tqthr 0x6079afe8] taskqueue
   40 68ea7dc0 b08ba000    0     0     0 0000204 [IWAIT] swi6:+
   39 68ea9000 b08bb000    0     0     0 0000204 [SLP]- 0x6078e9a0] random
    4 68e4c528 b085b000    0     0     0 0000204 [SLP]- 0x60794220] g_down
    3 68e4c6e0 b085c000    0     0     0 0000204 [SLP]- 0x6079421c] g_up
    2 68e4c898 b085d000    0     0     0 0000204 [SLP]- 0x60794214] g_event
   38 68e4ca50 b085e000    0     0     0 0000204 new [IWAIT] swi4: vm
   37 68e4cc08 b085f000    0     0     0 000020c [LOCK  Giant 69109cc0] swi8: tty:sio clock
   36 68e4cdc0 b0860000    0     0     0 0000204 [IWAIT] swi1: net
   35 68e55000 b0861000    0     0     0 0000204 new [IWAIT] irq0: clk
   34 68e551b8 b0886000    0     0     0 0000204 new [IWAIT] irq23:
   33 68e55370 b0887000    0     0     0 0000204 new [IWAIT] irq22:
   32 68e55528 b0888000    0     0     0 0000204 [IWAIT] irq21: amr0
   31 68e556e0 b0889000    0     0     0 0000204 new [IWAIT] irq20:
   30 68e55898 b088a000    0     0     0 0000204 [IWAIT] irq19: fwohci1+
--More--
   29 64f661b8 aee2a000    0     0     0 0000204 [IWAIT] irq18: rl0
   28 64f66370 aee2b000    0     0     0 0000204 [IWAIT] irq17: atapci1 pcm0
   27 64f66528 aee2c000    0     0     0 0000204 [IWAIT] irq16: fwohci0
   26 64f666e0 aee2d000    0     0     0 0000204 [IWAIT] irq15: ata1
   25 64f66898 aee52000    0     0     0 0000204 [IWAIT] irq14: ata0
   24 64f66a50 aee53000    0     0     0 0000204 new [IWAIT] irq13:
   23 64f66c08 aee54000    0     0     0 0000204 new [IWAIT] irq12:
   22 64f66dc0 aee55000    0     0     0 0000204 new [IWAIT] irq11:
   21 68e4c000 b0858000    0     0     0 0000204 new [IWAIT] irq10:
   20 68e4c1b8 b0859000    0     0     0 0000204 new [IWAIT] irq9: acpi0
   19 68e4c370 b085a000    0     0     0 0000204 new [IWAIT] irq8: rtc
   18 64f5d000 aedd8000    0     0     0 0000204 new [IWAIT] irq7: ppc0
   17 64f5d1b8 aee21000    0     0     0 0000204 new [IWAIT] irq6:
   16 64f5d370 aee22000    0     0     0 0000204 new [IWAIT] irq5:
   15 64f5d528 aee23000    0     0     0 0000204 new [IWAIT] irq4: sio0
   14 64f5d6e0 aee24000    0     0     0 0000204 new [IWAIT] irq3: sio1
   13 64f5d898 aee25000    0     0     0 0000204 [CPU 0] irq1: atkbd0
   12 64f5da50 aee26000    0     0     0 000020c [Can run] idle: cpu0
   11 64f5dc08 aee27000    0     0     0 000020c [CPU 1] idle: cpu1
    1 64f5ddc0 aee28000    0     0     1 0004200 [SLP]wait 0x64f5ddc0] init
--More--
   10 64f66000 aee29000    0     0     0 0000204 [CV]ktrace 0x607977a4] ktrace
    0 60794320 60c1f000    0     0     0 0000200 [SLP]sched 0x60794320] swapper
db>  show lockedvnods
Locked vnodes
0x6d4a1e38: tag ufs, type VREG, usecount 2, writecount 0, refcount 21, flags (VV_OBJBUF), lock type ufs: EXCL (count 1) by thread 0x6aa09e70 (pid 1043)
	ino 22988755, on dev amrd0a (4, 30)
db> 

