kern/151941: FreeBSD RELENG_6 server freezes during create of a snapshot on a disk with mpt

Andreas Longwitz longwitz at incore.de
Thu Nov 4 16:00:20 UTC 2010


>Number:         151941
>Category:       kern
>Synopsis:       FreeBSD RELENG_6 server freezes during create of a snapshot on a disk with mpt
>Confidential:   no
>Severity:       critical
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Thu Nov 04 16:00:19 UTC 2010
>Closed-Date:
>Last-Modified:
>Originator:     Andreas Longwitz
>Release:        RELENG_6
>Organization:
Data Service Stockelsdorf
>Environment:
FreeBSD dssbkp1.incore 6.4-STABLE FreeBSD 6.4-STABLE #2: Wed Nov  3 18:18:31 CET 2010     root at dssbkp1.incore:/usr/src/sys/i386/compile/SERVER  i386
>Description:
An up-to-date FreeBSD RELENG_6 system freezes when running the command
      mount -u -o snapshot /prod/.snap/fscktest prod
where prod is a 1 TB partition, /dev/da0p1, on a SCSI disk attached to
an mpt controller. The machine is semi-dead: all user processes are
sleeping and all CPUs are idle; only ping and ddb still work. Giant is
the only lock shown by ps in ddb. The trace of the mount process that
causes the problem looks like this:

Tracing command mount pid 7871 tid 100190 td 0xd1070480
sched_switch(d1070480,0,1) at sched_switch+0x14b
mi_switch(1,0,d1070480,f3408348,c03e6d10,...) at mi_switch+0x1ba
sleepq_switch(c0631cc4) at sleepq_switch+0x87
sleepq_wait(c0631cc4,0,d1070480,4,0,...) at sleepq_wait+0x5c
msleep(c0631cc4,c0631ce0,50,c05b02c2,0) at msleep+0x269
getnewbuf(0,0,4000,4000) at getnewbuf+0x6ce
getblk(d08de440,4cb7f440,0,4000,0,...) at getblk+0x360
breadn(d08de440,4cb7f440,0,4000,0,...) at breadn+0x31
bread(d08de440,4cb7f440,0,4000,0,f34084a8) at bread+0x20
ffs_alloccg(d134cad4,d5c,132dfce8,0,4000) at ffs_alloccg+0x13d
ffs_hashalloc(d134cad4,d5c,132dfce8,0,4000,...) at ffs_hashalloc+0x28
ffs_alloc(d134cad4,281ec6f,0,132dfce8,0,4000,d051d400,f34085e8,d134cad4,281ec6f,0,463,e8cae000) at ffs_alloc+0x20d
ffs_balloc_ufs2(d12dd880,7b1bc000,a0,4000,d051d400,0,f34087d8) at ffs_balloc_ufs2+0x16fc
ffs_snapshot(d099a2bc,d130e8a0,d130e8a0,d0982600,d08de440,...) at ffs_snapshot+0x89b
ffs_mount(d099a2bc,d1070480,10201000,0,d0520a80,...) at ffs_mount+0x991
vfs_domount(d1070480,d1125750,d0f8a250,11010000,d1125330) at vfs_domount+0x728
vfs_donmount(d1070480,11010000,f3408c04) at vfs_donmount+0x415
kernel_mount(d06599c0,11010000,804e040,0,fffffffe,...) at kernel_mount+0x38
ffs_cmount(d06599c0,bf7fdec0,11010000,d1070480,c05f84e0,...) at ffs_cmount+0x5d
mount(d1070480,f3408d04) at mount+0x18e
syscall(3b,3b,3b,804af21,bf7fe974,...) at syscall+0x2bf
Xint0x80_syscall() at Xint0x80_syscall+0x1f
--- syscall (21, FreeBSD ELF32, mount), eip = 0x880bfbcb, esp = 0xbf7fde9c, ebp = 0xbf7fdf38 ---
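
Read bottom-up, the trace shows the mount syscall entering ffs_snapshot and ending up asleep in msleep, called from getnewbuf. A minimal sketch of how that call chain could be pulled out of a ddb trace follows. The helper name call_chain and the regular expression are assumptions made for illustration and are not part of ddb; SAMPLE is only an excerpt of the frames above.

```python
import re

# Matches ddb frames of the form "name(args) at name+0xoffset".
# The pattern is an assumption based on the frames shown in this report.
FRAME_RE = re.compile(r"^(\w+)\(.*\)\s+at\s+(\w+)\+0x([0-9a-f]+)\s*$")


def call_chain(trace_text):
    """Return the function names in a ddb trace, outermost caller first."""
    names = []
    for line in trace_text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            names.append(match.group(2))
    # ddb prints the innermost frame first, so reverse to get caller order.
    names.reverse()
    return names


# Excerpt of the trace from this report (not the full trace).
SAMPLE = """\
msleep(c0631cc4,c0631ce0,50,c05b02c2,0) at msleep+0x269
getnewbuf(0,0,4000,4000) at getnewbuf+0x6ce
getblk(d08de440,4cb7f440,0,4000,0,...) at getblk+0x360
ffs_snapshot(d099a2bc,d130e8a0,d130e8a0,d0982600,d08de440,...) at ffs_snapshot+0x89b
ffs_mount(d099a2bc,d1070480,10201000,0,d0520a80,...) at ffs_mount+0x991
"""

if __name__ == "__main__":
    print(" -> ".join(call_chain(SAMPLE)))
```

Lines that are not frames, such as the closing "--- syscall ..." marker, are skipped by the pattern.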

The problem occurs on both i386 and amd64 servers. Creating snapshots on 300 GB disks connected to an amr controller works without any problems.
>How-To-Repeat:
see above
>Fix:
The problem is caused by the update of ffs_balloc.c from revision 1.50.2.2 to 1.50.2.3 (SVN rev 196973 on 2009-09-08 14:19:14; MFC r180758).
If I revert this change in the kernel, the problem disappears.

>Release-Note:
>Audit-Trail:
>Unformatted:

