ZFS hangs very often
Attila Nagy
bra at fsn.hu
Mon Mar 31 05:06:59 PDT 2008
Hello,
On my desktop machine I use a ZFS pool for everything but the swap and
the root fs (so /usr, /tmp, and everything else is on ZFS, swap and / is
on gmirror of two-two partitions).
The first setup was a FreeBSD/i386 7-STABLE, and the pool consisted of
two partitions from two SATA disks, which were encrypted with GELI
individually.
After using the system for some weeks (without any higher number of IO
activity, just working on the machine), the first hang came: I couldn't
move the mouse under X, but remote sessions were alive and also the
clock app still counted the time fine. I couldn't log into the machine
via ssh, the port was open, but I haven't got the banner.
I've done a portupgrade at that time.
After this (and several other) hangs, I decided to remove GELI from the
equation without success. Then one partition (disk) instead of two. And
now, I am running amd64 instead of i386 and the problem still persists.
I've attached my notebook to the machine and here is what I have during
the hang (currently, I am in the process of upgrading some ports and now
a configure tries to run, but the machine has stopped):
KDB: enter: manual escape to debugger
[thread pid 23 tid 100022 ]
Stopped at kdb_enter+0x31: leave
db> bt
Tracing pid 23 tid 100022 td 0xffffff000127c350
kdb_enter() at kdb_enter+0x31
scgetc() at scgetc+0x461
sckbdevent() at sckbdevent+0xa4
kbdmux_intr() at kbdmux_intr+0x43
kbdmux_kbd_intr() at kbdmux_kbd_intr+0x20
taskqueue_run() at taskqueue_run+0x9f
ithread_loop() at ithread_loop+0x180
fork_exit() at fork_exit+0x11f
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xffffffffb4f04d30, rbp = 0 ---
db> ps
pid ppid pgrp uid state wmesg wchan cmd
77873 77871 76757 0 S+ piperd 0xffffff0010036ba0 as
77872 77871 76757 0 S+ zfs:(&zi 0xffffff0020ba0298 cc1plus
77871 77870 76757 0 S+ wait 0xffffff0001446000 c++
77870 77321 76757 0 S+ wait 0xffffff000eba7468 sh
77321 76882 76757 0 S+ wait 0xffffff00396448d0 sh
76882 76881 76757 0 S+ wait 0xffffff000eba5468 sh
76881 76757 76757 0 S+ wait 0xffffff00014b28d0 sh
76757 76755 76757 0 Ss+ wait 0xffffff001b369000 make
76755 62725 62725 0 S+ select 0xffffffff80a89d50 script
62725 817 62725 0 S+ wait 0xffffff0001f43000 initial thread
86757 86674 86757 1001 SL+ pfault 0xffffffff80a9a79c ssh
86674 86672 86674 1001 Ss+ wait 0xffffff0001f428d0 bash
86672 86670 86670 1001 S select 0xffffffff80a89d50 sshd
86670 721 86670 0 Ss sbwait 0xffffff0001d2015c sshd
62788 802 62788 0 S+ ttyin 0xffffff00013bac10 csh
46310 801 46310 0 ?+ csh
817 800 817 0 S+ pause 0xffffff0001f420c0 csh
807 1 807 0 Ss+ ttyin 0xffffff000139e810 getty
806 1 806 0 Ss+ ttyin 0xffffff00013bb410 getty
805 1 805 0 Ss+ ttyin 0xffffff00013ba810 getty
804 1 804 0 Ss+ ttyin 0xffffff00013ba010 getty
803 1 803 0 Ss+ ttyin 0xffffff00013b9410 getty
802 1 802 0 Ss+ wait 0xffffff00014b08d0 login
801 1 801 0 Ss+ wait 0xffffff0001567468 login
800 1 800 0 Ss+ wait 0xffffff00014b0468 login
737 1 737 0 ?s cron
731 1 731 25 Ss pause 0xffffff00015650c0 sendmail
727 1 727 0 ?s sendmail
721 1 721 0 Ss select 0xffffffff80a89d50 sshd
688 687 687 123 ? ntpd
687 1 687 0 Ss select 0xffffffff80a89d50 ntpd
668 1 668 0 Ss select 0xffffffff80a89d50 powerd
559 1 559 0 ?s syslogd
488 1 488 0 Ss select 0xffffffff80a89d50 devd
440 1 440 0 Ss select 0xffffffff80a89d50 moused
248 1 248 0 Ss pause 0xffffff00015640c0 adjkerntz
175 0 0 0 SL zfs:(&tq 0xffffff0001583080 [zil_clean]
174 0 0 0 SL zfs:(&tq 0xffffff00015831c0 [zil_clean]
173 0 0 0 SL zfs:(&tq 0xffffff0001583300 [zil_clean]
172 0 0 0 SL zfs:(&tq 0xffffff0001583440 [zil_clean]
171 0 0 0 SL zfs:(&tq 0xffffff0001583580 [zil_clean]
170 0 0 0 SL zfs:(&tq 0xffffff00015836c0 [zil_clean]
168 0 0 0 SL zfs:(&tx 0xffffff000146c590
[txg_thread_enter]
167 0 0 0 SL zfs:(&zi 0xffffff000c717d58
[txg_thread_enter]
166 0 0 0 SL zfs:(&tx 0xffffff00014e0a40
[txg_thread_enter]
165 0 0 0 SL vgeom:io 0xffffff000145c410
[vdev:worker ad0s1d]
164 0 0 0 SL zfs:(&tq 0xffffff000158e300
[spa_zio_intr_5]
163 0 0 0 SL zfs:(&tq 0xffffff000158e300
[spa_zio_intr_5]
162 0 0 0 SL zfs:(&tq 0xffffff000158e1c0
[spa_zio_issue_5]
161 0 0 0 SL zfs:(&tq 0xffffff000158e1c0
[spa_zio_issue_5]
160 0 0 0 SL zfs:(&tq 0xffffff0001227d00
[spa_zio_intr_4]
159 0 0 0 SL zfs:(&tq 0xffffff0001227d00
[spa_zio_intr_4]
158 0 0 0 SL zfs:(&tq 0xffffff0001227bc0
[spa_zio_issue_4]
157 0 0 0 SL zfs:(&tq 0xffffff0001227bc0
[spa_zio_issue_4]
156 0 0 0 SL zfs:(&tq 0xffffff0001227a80
[spa_zio_intr_3]
155 0 0 0 SL zfs:(&tq 0xffffff0001227a80
[spa_zio_intr_3]
154 0 0 0 SL zfs:(&tq 0xffffff0001227940
[spa_zio_issue_3]
153 0 0 0 SL zfs:(&tq 0xffffff0001227940
[spa_zio_issue_3]
152 0 0 0 SL zfs:&vq- 0xffffff00015d0c88
[spa_zio_intr_2]
151 0 0 0 SL vmwait 0xffffffff80a9a79c
[spa_zio_intr_2]
150 0 0 0 SL vmwait 0xffffffff80a9a79c
[spa_zio_issue_2]
149 0 0 0 SL vmwait 0xffffffff80a9a79c
[spa_zio_issue_2]
148 0 0 0 SL zfs:(&tq 0xffffff0001227580
[spa_zio_intr_1]
147 0 0 0 SL zfs:(&tq 0xffffff0001227580
[spa_zio_intr_1]
146 0 0 0 SL zfs:(&tq 0xffffff0001227440
[spa_zio_issue_1]
145 0 0 0 SL zfs:(&tq 0xffffff0001227440
[spa_zio_issue_1]
144 0 0 0 SL zfs:(&tq 0xffffff00012271c0
[spa_zio_intr_0]
143 0 0 0 SL zfs:(&tq 0xffffff00012271c0
[spa_zio_intr_0]
142 0 0 0 SL zfs:(&tq 0xffffff0001227300
[spa_zio_issue_0]
141 0 0 0 SL zfs:(&tq 0xffffff0001227300
[spa_zio_issue_0]
87 0 0 0 SL vmwait 0xffffffff80a9a79c [g_eli[1]
mirror/swa]
86 0 0 0 SL vmwait 0xffffffff80a9a79c [g_eli[0]
mirror/swa]
53 0 0 0 SL sdflush 0xffffffff80a99d88 [softdepflush]
52 0 0 0 SL vlruwt 0xffffff0001448000 [vnlru]
51 0 0 0 SL zfs:&vq- 0xffffff00015d0c88 [syncer]
50 0 0 0 SL psleep 0xffffffff80a8a55c [bufdaemon]
49 0 0 0 SL pgzero 0xffffffff80a9b804 [pagezero]
48 0 0 0 SL psleep 0xffffffff80a9ab48 [vmdaemon]
47 0 0 0 SL wswbuf0 0xffffffff80a9a004 [pagedaemon]
46 0 0 0 SL m:w1 0xffffff0001401200 [g_mirror swap]
45 0 0 0 SL m:w1 0xffffff00013c3800 [g_mirror root]
44 0 0 0 SL zfs:(&ar 0xffffffff80c746b0
[arc_reclaim_thread]
43 0 0 0 SL waiting_ 0xffffffff80a8dc88 [sctp_iterator]
42 0 0 0 WL [swi0: sio]
41 0 0 0 WL [irq1: atkbd0]
40 0 0 0 WL [irq15: ata1]
39 0 0 0 WL [irq14: ata0]
38 0 0 0 SL usbevt 0xffffff000133a420 [usb5]
37 0 0 0 SL usbevt 0xffffffff81065420 [usb4]
36 0 0 0 SL usbevt 0xffffffff81063420 [usb3]
35 0 0 0 SL usbevt 0xffffff000130c420 [usb2]
34 0 0 0 WL [irq22: ehci0]
33 0 0 0 SL usbevt 0xffffffff81061420 [usb1]
32 0 0 0 WL [irq21:
pcm0 uhci1+]
31 0 0 0 SL usbtsk 0xffffffff80a71028 [usbtask-dr]
30 0 0 0 SL usbtsk 0xffffffff80a71000 [usbtask-hc]
29 0 0 0 SL usbevt 0xffffffff8105f420 [usb0]
28 0 0 0 WL [irq20:
uhci0 uhci+]
27 0 0 0 SL - 0xffffff00012ef880 [em0 taskq]
26 0 0 0 WL [irq9: acpi0]
25 0 0 0 SL - 0xffffff0001294580 [kqueue taskq]
24 0 0 0 WL [swi6: task
queue]
23 0 0 0 RL CPU 1 [swi6:
Giant taskq]
22 0 0 0 SL - 0xffffff000122c500 [thread taskq]
21 0 0 0 WL [swi5: +]
20 0 0 0 SL - 0xffffff000122ca80 [acpi_task_2]
19 0 0 0 SL - 0xffffff000122ca80 [acpi_task_1]
18 0 0 0 SL - 0xffffff000122ca80 [acpi_task_0]
17 0 0 0 WL [swi2: cambio]
9 0 0 0 SL ccb_scan 0xffffffff80a3fda0 [xpt_thrd]
16 0 0 0 SL - 0xffffffff80a74ea8 [yarrow]
8 0 0 0 SL crypto_r 0xffffffff80d293b0 [crypto
returns]
7 0 0 0 SL crypto_w 0xffffffff80d29350 [crypto]
6 0 0 0 SL zfs:(&tq 0xffffff0001227080 [system_taskq]
5 0 0 0 SL zfs:(&tq 0xffffff0001227080 [system_taskq]
4 0 0 0 SL - 0xffffffff80a71838 [g_down]
3 0 0 0 SL - 0xffffffff80a71830 [g_up]
2 0 0 0 SL - 0xffffffff80a71820 [g_event]
15 0 0 0 WL [swi1: net]
14 0 0 0 WL [swi3: vm]
13 0 0 0 LL *Giant 0xffffff00015a2be0 [swi4:
clock sio]
12 0 0 0 RL CPU 0 [idle: cpu0]
11 0 0 0 RL [idle: cpu1]
1 0 1 0 SLs wait 0xffffff000112f8d0 [init]
10 0 0 0 SL audit_wo 0xffffffff80a99260 [audit]
0 0 0 0 SLs vmwait 0xffffffff80a9a79c [swapper]
db> trace 77872
Tracing pid 77872 tid 100157 td 0xffffff000eb9c350
sched_switch() at sched_switch+0x1fe
mi_switch() at mi_switch+0x189
sleepq_wait() at sleepq_wait+0x3b
_cv_wait() at _cv_wait+0xfe
zio_wait() at zio_wait+0x5f
dmu_buf_hold_array_by_dnode() at dmu_buf_hold_array_by_dnode+0x1f6
dmu_buf_hold_array() at dmu_buf_hold_array+0x62
dmu_read_uio() at dmu_read_uio+0x3f
zfs_freebsd_read() at zfs_freebsd_read+0x535
vn_read() at vn_read+0x1ca
dofileread() at dofileread+0xa1
kern_readv() at kern_readv+0x4c
read() at read+0x54
syscall() at syscall+0x254
Xfast_syscall() at Xfast_syscall+0xab
--- syscall (3, FreeBSD ELF64, read), rip = 0x8bd74c, rsp =
0x7fffffffe2c8, rbp = 0 ---
db> trace 51
Tracing pid 51 tid 100050 td 0xffffff00014246a0
sched_switch() at sched_switch+0x1fe
mi_switch() at mi_switch+0x189
sleepq_wait() at sleepq_wait+0x3b
_sx_xlock_hard() at _sx_xlock_hard+0x1ee
_sx_xlock() at _sx_xlock+0x4e
vdev_queue_io() at vdev_queue_io+0x74
vdev_geom_io_start() at vdev_geom_io_start+0x4a
vdev_mirror_io_start() at vdev_mirror_io_start+0x1b0
zil_lwb_write_start() at zil_lwb_write_start+0x2f1
zil_commit_writer() at zil_commit_writer+0x1c4
zil_commit() at zil_commit+0xb8
zfs_sync() at zfs_sync+0x9a
sync_fsync() at sync_fsync+0x1ac
sched_sync() at sched_sync+0x63f
fork_exit() at fork_exit+0x11f
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xffffffffb502fd30, rbp = 0 ---
Any ideas about this?
More information about the freebsd-fs
mailing list