Damaged directory on ZFS
Pawel Jakub Dawidek
pjd at FreeBSD.org
Sun Oct 23 14:22:06 UTC 2011
On Mon, Oct 17, 2011 at 05:17:31PM -0700, Harold Paulson wrote:
> Hello,
>
> I've had a server that boots from ZFS panicking for a couple days. I have worked around the problem for now, but I hope someone can give me some insight into what's going on, and how I can solve it properly.
>
> The server is running 8.2-STABLE (zfs v28) with 8G of ram and 4 SATA disks in a raid10 type arrangement:
>
> # uname -a
> FreeBSD jane.sierraweb.com 8.2-STABLE-201105 FreeBSD 8.2-STABLE-201105 #0: Tue May 17 05:18:48 UTC 2011 root at mason.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC amd64
>
> And zpool status:
>
>         NAME           STATE     READ WRITE CKSUM
>         tank           ONLINE       0     0     0
>           mirror       ONLINE       0     0     0
>             gpt/disk0  ONLINE       0     0     0
>             gpt/disk1  ONLINE       0     0     0
>           mirror       ONLINE       0     0     0
>             gpt/disk2  ONLINE       0     0     0
>             gpt/disk3  ONLINE       0     0     0
>
> It started panicking under load a couple days ago. We replaced RAM and motherboard, but problems persisted. I don't know if a hardware issue originally caused the problem or what. When it panics, I get the usual panic message, but I don't get a core file, and it never reboots itself.
>
> http://pastebin.com/F1J2AjSF
>
> While I was trying to figure out the source of the problem, I noticed various stuck processes that peg a CPU and can't be killed, such as:
>
> PID JID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND
> 48735 0 root 1 46 0 11972K 924K CPU3 3 415:14 100.00% find
>
> They are not marked zombie, but I can't kill them, and restarting the jail they are in won't even get rid of them. truss just hangs with no output on them. On different occasions, I noticed pop3d processes for the same user getting stuck in this way. On a hunch I ran a "find" through the files in the user's Maildir and got a panic. I disabled this account and now the server is stable again. At least until locate.updatedb walks through that directory, I suppose. Evidently, there is some kind of hole in the file system below that directory tree causing the panic.
>
> I can move that directory out of the way, and carry on, but is there anything I can do to really *repair* the problem?
Could you run these commands:
objdump -D /boot/kernel/zfs.ko.symbols | egrep '^[0-9a-f]{8,16} <fzap_cursor_retrieve>' | awk '{printf("0x%s\n", $1)}' | xargs -J ADDR printf "%u + %u\n" ADDR 0x111 | bc | xargs printf "0x%x\n" | xargs addr2line -e /boot/kernel/zfs.ko.symbols
They should convert fzap_cursor_retrieve+0x111 into file:line. Send it
here once you obtain it.
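
If the one-liner is hard to follow, its middle step is plain arithmetic: take the start address of the function as objdump prints it, add the offset from the backtrace, and print the sum in hex for addr2line. A minimal sketch of that step follows. The function name sym_plus_offset and the start address 0x1000 are made up for illustration, and it assumes the address fits in the shell's signed integer range:

```shell
#!/bin/sh
# Sketch only: add a hex offset to a symbol's start address.
# sym_plus_offset is a hypothetical helper name, not an existing tool.
#   $1 = symbol start address in hex, without 0x (the form objdump prints)
#   $2 = offset in hex, with 0x (the form the panic backtrace prints)
sym_plus_offset() {
    # POSIX shell arithmetic accepts 0x-prefixed hex constants.
    printf '0x%x\n' "$(( 0x$1 + $2 ))"
}

# Made-up start address 0x1000 plus the offset from the backtrace:
sym_plus_offset 1000 0x111    # prints 0x1111
```

The printed address is what the last stage of the pipeline hands to addr2line -e /boot/kernel/zfs.ko.symbols.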
Thanks.
--
Pawel Jakub Dawidek http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://yomoli.com