Page fault
Robert Watson
rwatson at freebsd.org
Tue Nov 4 11:20:19 PST 2003
On Tue, 4 Nov 2003, Nils Andreas Hakansson wrote:
> I've disabled softupdates because of
> a panic("softdep_move_dependencies: need merge code");
I can't comment on this part; you might want to e-mail Kirk directly.
> Could someone take a look at this?
>
> pst: timeout mfa=0x0032d5d0 cmd=0x02
> pst: timeout mfa=0x00336390 cmd=0x02
> pst: timeout mfa=0x0034cdd0 cmd=0x02
> <cut>
> pst: timeout mfa=0x003b7ab0 cmd=0x02
> pst: timeout mfa=0x00396db0 cmd=0x02
> pst: timeout mfa=0x003a3530 cmd=0x02
> pst: timeout mfa=0x00376890 cmd=0x02
These messages mean your storage device is having trouble, but I'm not
familiar enough with pst to say how or why. I can't tell whether the
requests themselves are bad, or whether the controller, chain, or device is
unable to service them.
> ufs_access(): Error retrieving ACL on object (5).
> <cut>
> ufs_access(): Error retrieving ACL on object (5).
> ufs_access(): Error retrieving ACL on object (5).
> ufs_access(): Error retrieving ACL on object (5).
> ufs_access(): Error retrieving ACL on object (5).
> ufs_access(): Error retrieving ACL on object (5).
> ufs_access(): Error retrieving ACL on object (5).
> ufs_access(): Error retrieving ACL on object (5).
This is the UFS ACL code failing closed: it is unable to read the ACLs from
disk because of EIO (error 5, an I/O failure). This is the correct response
in that scenario.
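To make "failing closed" concrete, here is a minimal userland C sketch of the pattern. It is not FreeBSD's ufs_access(): the struct acl_sketch type, the read_acl() and check_access() helpers, the fail_with parameter used to simulate a failed read, and the choice of EACCES as the denial value are all hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical, simplified ACL; real UFS ACLs carry many entries. */
struct acl_sketch {
    unsigned int allow_mask;   /* permission bits the ACL grants the caller */
};

/*
 * Hypothetical stand-in for reading an ACL from disk. Returns 0 on success
 * or an errno value; a nonzero fail_with simulates a failed read (e.g. EIO).
 */
static int read_acl(struct acl_sketch *acl, int fail_with)
{
    assert(acl != NULL);
    if (fail_with != 0)
        return fail_with;
    acl->allow_mask = 0x4;     /* pretend the ACL grants read access only */
    return 0;
}

/*
 * Fail-closed access check: if the ACL cannot be read, log the error and
 * deny access instead of guessing. EACCES is an assumed denial value.
 */
static int check_access(unsigned int wanted, int fail_with)
{
    struct acl_sketch acl;
    int error = read_acl(&acl, fail_with);

    if (error != 0) {
        fprintf(stderr,
            "check_access(): Error retrieving ACL on object (%d).\n", error);
        return EACCES;
    }
    return (acl.allow_mask & wanted) == wanted ? 0 : EACCES;
}
```

Calling check_access(0x4, EIO) prints a message shaped like the ones quoted above, with (5) because EIO is 5 on FreeBSD, and denies the request rather than falling back to some other permission source.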
> Fatal trap 12: page fault while in kernel mode
> cpuid = 0; lapic.id = 00000000
> fault virtual address = 0xae18c0de
> fault code = supervisor read, page not present
> instruction pointer = 0x8:0xc066a566
> stack pointer = 0x10:0xea3a78cc
> frame pointer = 0x10:0xea3a7900
> code segment = base 0x0, limit 0xfffff, type 0x1b
> = DPL 0, pres 1, def32 1, gran 1
> processor eflags = interrupt enabled, resume, IOPL = 0
> current process = 76932 (smbd)
> kernel: type 12 trap, code=0
> Stopped at generic_bcopy+0x1a: repe movsl (%esi),%es:(%edi)
> db> trace
> generic_bcopy(cf6b0000,1a8,2,c06bd12c,0) at generic_bcopy+0x1a
> ffs_getextattr(ea3a7960,ea3a795c,c05159ad,d0346200,184) at ffs_getextattr+0xe0
This appears to be a bug in UFS2's handling of corrupted extended attribute
(EA) data on disk. We have some changes in the TrustedBSD development trees
to improve resilience to on-disk corruption, but haven't merged them yet.
Just to confirm, could you run "gdb -k" on a copy of your kernel with
debugging symbols to see where *ffs_getextattr+0xe0 is? For me, it resolves
to ffs_vnops.c:1616, which is a variable assignment. There's a bcopy not far
above there, which seems the likelier culprit.
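As an illustration of the kind of resilience to on-disk corruption mentioned above, here is a minimal C sketch that validates every on-disk length before copying an attribute value. It assumes a simplified, hypothetical record layout (struct ea_rec_hdr with rec_len and data_len) rather than the real UFS2 EA format, and ea_copy_value() is an illustrative helper, not a FreeBSD function or the TrustedBSD changes themselves.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical on-disk record header; NOT the exact UFS2 EA layout. */
struct ea_rec_hdr {
    uint32_t rec_len;     /* total record length, header included */
    uint32_t data_len;    /* length of the value following the header */
};

/*
 * Copy the value of the record at offset `off` in an EA area of `area_len`
 * bytes into `out` (capacity `out_len`). Lengths read from "disk" are
 * checked before memcpy(), so a corrupted record yields EINVAL instead of
 * a copy that runs past the end of the area.
 */
static int ea_copy_value(const unsigned char *area, size_t area_len,
                         size_t off, unsigned char *out, size_t out_len,
                         size_t *copied)
{
    struct ea_rec_hdr hdr;

    assert(area != NULL && out != NULL && copied != NULL);
    if (off > area_len || area_len - off < sizeof(hdr))
        return EINVAL;                      /* header would overrun the area */
    memcpy(&hdr, area + off, sizeof(hdr));  /* avoids unaligned access */
    if (hdr.rec_len < sizeof(hdr) || hdr.rec_len > area_len - off)
        return EINVAL;                      /* record length is corrupt */
    if (hdr.data_len > hdr.rec_len - sizeof(hdr))
        return EINVAL;                      /* value overruns its record */
    if (hdr.data_len > out_len)
        return ERANGE;                      /* caller's buffer is too small */
    memcpy(out, area + off + sizeof(hdr), hdr.data_len);
    *copied = hdr.data_len;
    return 0;
}
```

The design choice is to treat any length that does not fit inside the area, or inside its own record, as corruption and return an error, so a damaged record is reported rather than being allowed to drive an out-of-bounds read.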
> vn_extattr_get(cb1a8c8c,8,2,c06bd12c,ea3a79d0) at vn_extattr_get+0xaa
> ufs_getacl(ea3a7a14,ea3a7a40,c061560b,ea3a7a14,c06df280) at ufs_getacl+0x99
> ufs_vnoperate(ea3a7a14,c06df280,2,a6,c853cd10) at ufs_vnoperate+0x18
> ufs_access(ea3a7a6c,ea3a7b28,c057dcc9,ea3a7a6c,c0716cc8) at ufs_access+0xca
> ufs_vnoperate(ea3a7a6c,c0716cc8,c0716cc8,c853cd10,cb1a8c8c) at ufs_vnoperate+0x18
> vn_open_cred(ea3a7bdc,ea3a7cdc,1a4,d0bb7800,22) at vn_open_cred+0x359
> vn_open(ea3a7bdc,ea3a7cdc,1a4,22,c3ee0fb4) at vn_open+0x30
> kern_open(c853cd10,bfbff130,0,1,1a4) at kern_open+0x143
> open(c853cd10,ea3a7d14,c06c44d0,3ed,3) at open+0x30
> syscall(bfbf002f,82b002f,bfbf002f,bfbffd70,82b3724) at syscall+0x28f
> Xint0x80_syscall() at Xint0x80_syscall+0x1d
> --- syscall (5, FreeBSD ELF32, open), eip = 0x662b5233, esp = 0xbfbff07c, ebp = 0xbfbff098 ---
> db> show locks
> exclusive sleep mutex Giant r = 0 (0xc07115c0) locked @ /usr/src/sys/vm/vm_fault.c:223
Holding Giant here is good. So, to summarize:
- This could be the result of a disk read failure.
- The UFS code appears to be intolerant of that failure.
- The ACL code failed closed properly, although perhaps not very usefully.
Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
robert at fledge.watson.org Network Associates Laboratories