ZFS HBAs + LSI chip sets (Was: ZFS hang (system #2))
Dennis Glatting
freebsd at pki2.com
Mon Oct 22 23:19:20 UTC 2012
On Mon, 2012-10-22 at 17:15 +0200, Attila Nagy wrote:
> Hi,
>
> On 10/21/2012 02:10 AM, Dennis Glatting wrote:
> > I chose the LSI2008 chip set because the code was donated by LSI, so
> > they have demonstrated an interest in supporting their products under
> > FreeBSD, and that chip set is found in a lot of places, notably
> > Supermicro boards. Additionally, there were reports of success on the
> > lists for several boards. That said, I have received private email from
> > others expressing frustration with ZFS and the "hang" problems, which I
> > believe also involve LSI chips.
> >
> I have a Sun X4540, which shows similar symptoms. It has six
> on-board LSI 1068E SAS controllers with 1.27.02.00-IT firmware (the latest
> from Sun/Oracle) and 48 SATA disks.
> It runs stable/9 at r240134.
>
> The machine is currently resilvering its 48-disk pool (heavy I/O),
> and the I/O stops periodically.
> I've set up watchdogd with "ls /data" as its test command (the pool is
> mounted there). It doesn't restart the machine when the I/O freezes,
> because the command always succeeds (served from cache, I guess).
> But anything that needs to touch the disks gets stuck in D state.
>
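A cached "ls" cannot see this kind of stall, because it never has to reach the disks. One way to make a liveness check fail is to force a real read and give it a deadline. The Python below is a minimal sketch under stated assumptions, not a tested watchdogd integration: the device path /dev/da0, the 30-second budget and the helper names probe and disk_read_cmd are hypothetical. It uses subprocess.Popen rather than subprocess.run() because run() waits for the child after killing it on timeout, and a reader stuck in D state may never be reaped, which would hang the check itself.

```python
import subprocess
from typing import List, Sequence


def disk_read_cmd(device: str = "/dev/da0") -> List[str]:
    """Build a dd command that reads one 512-byte sector straight from a device.

    The default device is a hypothetical example; pick a member of the pool.
    """
    return ["dd", f"if={device}", "of=/dev/null", "bs=512", "count=1"]


def probe(cmd: Sequence[str], timeout: float) -> bool:
    """Return True if cmd exits with status 0 within timeout seconds."""
    proc = subprocess.Popen(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    try:
        return proc.wait(timeout=timeout) == 0
    except subprocess.TimeoutExpired:
        # Send SIGKILL but do not wait for the child: a process sleeping
        # uninterruptibly on disk I/O may never exit, and waiting would hang us too.
        proc.kill()
        return False


# Hypothetical usage: treat a missed deadline as "the pool's disks stopped answering".
# ok = probe(disk_read_cmd("/dev/da0"), timeout=30.0)
```

Each timed-out probe leaves a stuck dd behind, so a real check should escalate (for example, let the watchdog fire) rather than retry indefinitely.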
> zpool status shows:
> scan: resilver in progress since Sun Oct 21 15:40:50 2012
> 3.16T scanned out of 13.8T at 26.4M/s, 117h45m to go
> 133G resilvered, 22.82% done
> The estimated time keeps growing, and gstat shows no I/O.
>
I've had this problem too.
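The growing estimate is what you would expect if "zpool status" divides the bytes scanned by the whole elapsed time of the scan pass: while the scan is stalled, elapsed time keeps rising but the scanned total does not, so the rate falls and the time to go rises. The sketch below assumes that model (it matches my reading of the libzfs code of this period, but treat it as an assumption), assumes binary units (1T = 2^40 bytes, 1M = 2^20 bytes), and plugs in the rounded figures from the status output above; scan_eta is a hypothetical helper.

```python
from typing import Tuple

TiB = 2 ** 40  # assumed binary units for zpool's "T" and "M"
MiB = 2 ** 20


def scan_eta(examined: float, total: float, elapsed_s: float) -> Tuple[float, float]:
    """Return (rate in bytes/s, seconds left), assuming a whole-pass average rate."""
    rate = examined / max(elapsed_s, 1.0)  # average since the pass started
    return rate, (total - examined) / rate


# Rounded figures from the zpool status output above.
examined, total = 3.16 * TiB, 13.8 * TiB
elapsed = examined / (26.4 * MiB)  # elapsed time implied by the displayed rate
_, before = scan_eta(examined, total, elapsed)
_, after = scan_eta(examined, total, elapsed + 3600)  # one more hour, no progress
print(f"time to go: {before / 3600:.1f} h, after a 1 h stall: {after / 3600:.1f} h")
```

With these inputs the model gives roughly 117 hours, close to the 117h45m shown (the inputs are rounded), and under this model each hour of stall adds about (13.8 - 3.16) / 3.16 ≈ 3.4 hours to the estimate.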
> If I issue an ls -R /data, it gets stuck:
> root 36217 0.0 0.0 14380 1800 3 D+ 4:45PM 0:00.00 ls -R /data/
> # procstat -k 36217
> PID TID COMM TDNAME KSTACK
> 36217 101469 ls - mi_switch sleepq_wait
> _cv_wait zio_wait dbuf_read dbuf_findbp dbuf_hold_impl dbuf_hold
> dmu_buf_hold zap_lockdir zap_cursor_retrieve zfs_freebsd_readdir
> kern_getdirentries sys_getdirentries amd64_syscall Xfast_syscall
>
> Also, a dd on any of the disks waits forever, without reading a single byte:
> root 36570 0.0 0.0 9876 1356 4 DL+ 4:46PM 0:00.00 dd if=/dev/da0 of=/dev/null
> # procstat -k 36570
> PID TID COMM TDNAME KSTACK
> 36570 101489 dd - mi_switch sleepq_wait
> _sleep bwait physio devfs_read_f dofileread kern_readv sys_read
> amd64_syscall Xfast_syscall
>
>
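Both stacks end in a sleep below the caller: ls waits in zio_wait for a ZFS I/O, while dd waits in bwait under physio for a plain buffer on the raw device, so even I/O that bypasses ZFS entirely does not complete. A small Python sketch that tags threads in "procstat -k" output this way. It assumes one line per thread as procstat prints it (the mail above wrapped them) and that COMM and TDNAME contain no spaces; the WAIT_FRAMES table and the classify helper are hypothetical.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical table: frame name -> what the thread is waiting for.
WAIT_FRAMES: Dict[str, str] = {
    "zio_wait": "ZFS I/O",
    "bwait": "raw device buffer",
}


def classify(procstat_text: str) -> List[Tuple[int, int, str, str]]:
    """Return (pid, tid, comm, reason) for threads blocked in a known wait frame."""
    hits = []
    for line in procstat_text.splitlines():
        fields = line.split()
        # Skip the "PID TID COMM TDNAME KSTACK" header and anything malformed.
        if len(fields) < 5 or not (fields[0].isdigit() and fields[1].isdigit()):
            continue
        pid, tid, comm = int(fields[0]), int(fields[1]), fields[2]
        # procstat -kk appends "+0x..." offsets; strip them so both forms match.
        frames = [re.sub(r"\+0x[0-9a-f]+$", "", f) for f in fields[4:]]
        for frame, reason in WAIT_FRAMES.items():
            if frame in frames:
                hits.append((pid, tid, comm, reason))
                break
    return hits
```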
> Camcontrol works:
>
> # camcontrol devlist
> <ATA SEAGATE ST35002N SU0F> at scbus0 target 0 lun 0 (pass0,da0)
> <ATA SEAGATE ST35002N SU0F> at scbus0 target 1 lun 0 (pass1,da1)
> <ATA SEAGATE ST35002N SU0F> at scbus0 target 2 lun 0 (pass2,da2)
> <ATA HITACHI HDS7250S AJ0A> at scbus0 target 3 lun 0 (pass3,da3)
> <ATA SEAGATE ST35002N SU0F> at scbus0 target 4 lun 0 (pass4,da4)
> <ATA HITACHI HUA7250S AC5A> at scbus0 target 5 lun 0 (pass5,da5)
> <ATA SEAGATE ST35002N SU0F> at scbus0 target 6 lun 0 (pass6,da6)
> <ATA ST3500320NS SN04> at scbus0 target 7 lun 0 (pass7,da7)
> <ATA HITACHI HDS7250S AJ0A> at scbus1 target 0 lun 0 (pass8,da8)
> <ATA SEAGATE ST35002N SU0F> at scbus1 target 1 lun 0 (pass9,da9)
> <ATA SEAGATE ST35002N SU0F> at scbus1 target 2 lun 0 (pass10,da10)
> <ATA SEAGATE ST35002N SU0F> at scbus1 target 3 lun 0 (pass11,da11)
> <ATA SEAGATE ST35002N SU0F> at scbus1 target 4 lun 0 (pass12,da12)
> <ATA SEAGATE ST35002N SU0F> at scbus1 target 5 lun 0 (pass13,da13)
> <ATA SEAGATE ST35002N SU0F> at scbus1 target 6 lun 0 (pass14,da14)
> <ATA SEAGATE ST35002N SU0F> at scbus1 target 7 lun 0 (pass15,da15)
> <ATA SEAGATE ST35002N SU0F> at scbus2 target 0 lun 0 (pass16,da16)
> <ATA SEAGATE ST35002N SU0F> at scbus2 target 1 lun 0 (pass17,da17)
> <ATA HITACHI HUA7250S AC5A> at scbus2 target 2 lun 0 (pass18,da18)
> <ATA SEAGATE ST35002N SU0F> at scbus2 target 3 lun 0 (pass19,da19)
> <ATA SEAGATE ST35002N SU0F> at scbus2 target 4 lun 0 (pass20,da20)
> <ATA SEAGATE ST35002N SU0F> at scbus2 target 5 lun 0 (pass21,da21)
> <ATA SEAGATE ST35002N SU0F> at scbus2 target 6 lun 0 (pass22,da22)
> <ATA SEAGATE ST35002N SU0F> at scbus2 target 7 lun 0 (pass23,da23)
> <ATA SEAGATE ST35002N SU0F> at scbus3 target 0 lun 0 (pass24,da24)
> <ATA SEAGATE ST35002N SU0F> at scbus3 target 1 lun 0 (pass25,da25)
> <ATA SEAGATE ST35002N SU0F> at scbus3 target 2 lun 0 (pass26,da26)
> <ATA SEAGATE ST35002N SU0F> at scbus3 target 3 lun 0 (pass27,da27)
> <ATA SEAGATE ST35002N SU0F> at scbus3 target 4 lun 0 (pass28,da28)
> <ATA SEAGATE ST35002N SU0F> at scbus3 target 5 lun 0 (pass29,da29)
> <ATA SEAGATE ST35002N SU0F> at scbus3 target 6 lun 0 (pass30,da30)
> <ATA SEAGATE ST35002N SU0F> at scbus3 target 7 lun 0 (pass31,da31)
> <ATA SEAGATE ST35002N SU0F> at scbus4 target 0 lun 0 (pass32,da32)
> <ATA SEAGATE ST35002N SU0F> at scbus4 target 1 lun 0 (pass33,da33)
> <ATA SEAGATE ST35002N SU0F> at scbus4 target 2 lun 0 (pass34,da34)
> <ATA SEAGATE ST35002N SU0F> at scbus4 target 3 lun 0 (pass35,da35)
> <ATA SEAGATE ST35002N SU0F> at scbus4 target 4 lun 0 (pass36,da36)
> <ATA SEAGATE ST35002N SU0F> at scbus4 target 5 lun 0 (pass37,da37)
> <ATA SEAGATE ST35002N SU0F> at scbus4 target 6 lun 0 (pass38,da38)
> <ATA SEAGATE ST35002N SU0F> at scbus4 target 7 lun 0 (pass39,da39)
> <ATA SEAGATE ST35002N SU0F> at scbus5 target 0 lun 0 (pass40,da40)
> <ATA SEAGATE ST35002N SU0F> at scbus5 target 1 lun 0 (pass41,da41)
> <ATA SEAGATE ST35002N SU0F> at scbus5 target 2 lun 0 (pass42,da42)
> <ATA SEAGATE ST35002N SU0F> at scbus5 target 3 lun 0 (pass43,da43)
> <ATA SEAGATE ST35002N SU0F> at scbus5 target 4 lun 0 (pass44,da44)
> <ATA SEAGATE ST35002N SU0F> at scbus5 target 5 lun 0 (pass45,da45)
> <ATA SEAGATE ST35002N SU0F> at scbus5 target 6 lun 0 (pass46,da46)
> <ATA SEAGATE ST35002N SU0F> at scbus5 target 7 lun 0 (pass47,da47)
>
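With six controllers behind the pool, it helps to know which da devices sit on which CAM bus before probing disks one controller at a time. A sketch that groups "camcontrol devlist" lines by bus; the regular expression is written against the output format shown above, and disks_by_bus is a hypothetical name. Note that the bus number is not the controller name: "camcontrol tags da0" below shows da0 on mpt0, but the mapping for the other buses is an assumption until checked (I believe "camcontrol devlist -v" prints it).

```python
import re
from collections import defaultdict
from typing import Dict, List

# Matches e.g. "<ATA SEAGATE ST35002N SU0F> at scbus0 target 0 lun 0 (pass0,da0)"
DEVLINE = re.compile(
    r"^<(?P<ident>[^>]*)>\s+at\s+scbus(?P<bus>\d+)\s+target\s+(?P<target>\d+)"
    r"\s+lun\s+(?P<lun>\d+)\s+\((?P<periphs>[^)]*)\)"
)


def disks_by_bus(devlist_text: str) -> Dict[int, List[str]]:
    """Map CAM bus number -> da peripherals on that bus, in listed order."""
    buses: Dict[int, List[str]] = defaultdict(list)
    for line in devlist_text.splitlines():
        m = DEVLINE.match(line.strip())
        if not m:
            continue  # not a device line
        for periph in m.group("periphs").split(","):
            if periph.startswith("da"):
                buses[int(m.group("bus"))].append(periph)
    return dict(buses)
```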
> # camcontrol tags da0
> (pass0:mpt0:0:0:0): device openings: 255
>
> This also works (I guess it doesn't touch the disks):
> # zfs list
> NAME USED AVAIL REFER MOUNTPOINT
> logpool 13.1T 7.17T 507K /data
> logpool/jail 7.08G 7.17T 7.08G /data/jail
> logpool/logs 13.1T 7.17T 3.40T /data/jail/logvm/logs
> logpool/logs/OTHER 9.24T 7.17T 2.36T /data/jail/logvm/logs/OTHER
>
> But this doesn't:
> root 36686 0.0 0.0 33384 2512 5 D+ 4:49PM 0:00.00 zfs list -t snapshot
> # procstat -k 36686
> PID TID COMM TDNAME KSTACK
> 36686 101593 zfs - mi_switch sleepq_wait
> _cv_wait zio_wait dbuf_read dmu_buf_hold zap_lockdir zap_cursor_retrieve
> dmu_snapshot_list_next zfs_ioc_snapshot_list_next zfsdev_ioctl
> devfs_ioctl_f kern_ioctl sys_ioctl amd64_syscall Xfast_syscall
>
> Entering the debugger:
> KDB: enter: sysctl debug.kdb.enter
> [ thread pid 36959 tid 101484 ]
> Stopped at kdb_enter+0x3b: movq $0,0x95ab72(%rip)
> db> ps
> pid ppid pgrp uid state wmesg wchan cmd
> 36959 1769 36959 0 R+ CPU 0 sysctl
> 36691 919 919 0 S sbwait 0xfffffe009d752144 perl
> 36686 36677 36686 0 D+ zio->io_ 0xfffffe001ccb7d70 zfs
> 36677 36208 36677 0 Ss+ pause 0xfffffe009d0030a0 csh
> 36570 36567 36570 0 DL+ physrd 0xffffff87005a2980 dd
> 36567 36208 36567 0 Ss+ pause 0xfffffe00115c4540 csh
> 36217 36209 36217 0 D+ zio->io_ 0xfffffe001c2b2320 ls
> 36209 36208 36209 0 Ss+ pause 0xfffffe022c8aa0a0 csh
> 36208 36207 36208 0 Ss select 0xfffffe0665c92e40 screen
> 36207 1782 36207 0 S+ pause 0xfffffe009d0010a0 screen
> 32921 883 873 0 DL cbwait 0xfffffe000f7f7848 camcontrol
> 1782 1780 1782 0 Ss+ pause 0xfffffe009d4559e0 csh
> 1780 897 1780 0 Ss select 0xfffffe001d546740 sshd
> 1776 1774 1776 0 Ss+ ttyin 0xfffffe001c02a4a8 csh
> 1774 897 1774 0 Ss select 0xfffffe001cb4d0c0 sshd
> 1769 1767 1769 0 Ss+ pause 0xfffffe001191a540 csh
> 1767 897 1767 0 Ss select 0xfffffe000fd72bc0 sshd
> 1079 1 1079 0 Ss+ ttyin 0xfffffe000c82c4a8 getty
> 1078 1 1078 0 Ss+ ttyin 0xfffffe000c82c8a8 getty
> 1077 1 1077 0 Ss+ ttyin 0xfffffe000c82cca8 getty
> 1076 1 1076 0 Ss+ ttyin 0xfffffe000c82d0a8 getty
> 1075 1 1075 0 Ss+ ttyin 0xfffffe000c82d4a8 getty
> 1074 1 1074 0 Ss+ ttyin 0xfffffe000c82d8a8 getty
> 1073 1 1073 0 Ss+ ttyin 0xfffffe000c82dca8 getty
> 1072 1 1072 0 Ss+ ttyin 0xfffffe000c82f0a8 getty
> 919 1 919 0 Ss select 0xfffffe000f5ac940 perl
> 907 1 907 0 Ss nanslp 0xffffffff81244f08 cron
> 903 1 903 25 Ss pause 0xfffffe001125e0a0 sendmail
> 900 1 900 0 Ss select 0xfffffe001d549340 sendmail
> 897 1 897 0 Ss select 0xfffffe001d546cc0 sshd
> 892 884 873 0 S piperd 0xfffffe001e940888 fghack
> 884 878 873 0 S wait 0xfffffe000fdee000 sh
> 883 879 873 0 S piperd 0xfffffe022c08b000 perl
> 879 875 873 0 S select 0xfffffe001ca6a8c0 supervise
> 878 875 873 0 S select 0xfffffe000fd73d40 supervise
> 876 1 873 0 S piperd 0xfffffe001e9c5b60 readproctitle
> 875 1 873 0 S nanslp 0xffffffff81244f08 svscan
> 870 868 867 123 S select 0xfffffe000fd934c0 ntpd
> 868 867 867 123 S select 0xfffffe001ca68e40 ntpd
> 867 1 867 0 Ss select 0xfffffe000fddd740 ntpd
> 796 0 0 0 DL mdwait 0xfffffe000f52a000 [md2]
> 774 1 774 53 Ss (threaded) named
> 101524 S kqread 0xfffffe00115dd100 named
> 101523 S uwait 0xfffffe000fde5200 named
> 101522 S uwait 0xfffffe00110ce680 named
> 101521 S uwait 0xfffffe000fda0300 named
> 101520 S uwait 0xfffffe000fddd380 named
> 101519 S uwait 0xfffffe001198ca00 named
> 101518 S uwait 0xfffffe000fd58880 named
> 101517 S uwait 0xfffffe000fd7ab80 named
> 101516 S uwait 0xfffffe000f80e480 named
> 101515 S uwait 0xfffffe000f80f400 named
> 101501 S sigwait 0xfffffe00110dd000 named
> 751 750 751 0 Ss select 0xfffffe001d549440 syslog-ng
> 750 1 749 0 S wait 0xfffffe000c8144a0 syslog-ng
> 612 608 608 64 S bpf 0xfffffe001ca94800 pflogd
> 608 1 608 0 Ss sbwait 0xfffffe001eb4ae8c pflogd
> 605 0 0 0 DL pftm 0xffffffff817547a0 [pfpurge]
> 78 0 0 0 DL (threaded) [zfskern]
> 101459 D spa->spa 0xfffffe0011462680 [txg_thread_enter]
> 101458 D tx->tx_q 0xfffffe001b199230 [txg_thread_enter]
> 100122 D l2arc_fe 0xffffffff8173ebc0 [l2arc_feed_thread]
> 100121 D arc_recl 0xffffffff8172ed20 [arc_reclaim_thread]
> 59 0 0 0 DL mdwait 0xfffffe000f521000 [md1]
> 47 0 0 0 DL mdwait 0xfffffe000f523800 [md0]
> 24 0 0 0 DL sdflush 0xffffffff812a6158 [softdepflush]
> 23 0 0 0 DL syncer 0xffffffff812928c0 [syncer]
> 22 0 0 0 DL vlruwt 0xfffffe000c80d000 [vnlru]
> 21 0 0 0 DL psleep 0xffffffff81292348 [bufdaemon]
> 20 0 0 0 DL pgzero 0xffffffff812b019c [pagezero]
> 19 0 0 0 DL psleep 0xffffffff812af368 [vmdaemon]
> 18 0 0 0 DL psleep 0xffffffff812af32c [pagedaemon]
> 17 0 0 0 DL ccb_scan 0xffffffff811ff260 [xpt_thrd]
> 16 0 0 0 DL idle 0xffffff8001df3000 [mpt_recovery5]
> 9 0 0 0 DL idle 0xffffff8001dde000 [mpt_recovery4]
> 8 0 0 0 DL idle 0xffffff8001dc9000 [mpt_recovery3]
> 7 0 0 0 DL idle 0xffffff8001daa000 [mpt_recovery2]
> 6 0 0 0 DL idle 0xffffff8001d95000 [mpt_recovery1]
> 5 0 0 0 DL idle 0xffffff8001d80000 [mpt_recovery0]
> 15 0 0 0 DL (threaded) [usb]
> 100048 D - 0xffffff8001d73e18 [usbus1]
> 100047 D - 0xffffff8001d73dc0 [usbus1]
> 100046 D - 0xffffff8001d73d68 [usbus1]
> 100045 D - 0xffffff8001d73d10 [usbus1]
> 100043 D - 0xffffff8001d6b460 [usbus0]
> 100042 D - 0xffffff8001d6b408 [usbus0]
> 100041 D - 0xffffff8001d6b3b0 [usbus0]
> 100040 D - 0xffffff8001d6b358 [usbus0]
> 4 0 0 0 DL ctl_work 0xffffff8000a41000 [ctl_thrd]
> 14 0 0 0 DL - 0xffffffff81243ba4 [yarrow]
> 3 0 0 0 DL crypto_r 0xffffffff812a4ae0 [crypto returns]
> 2 0 0 0 DL crypto_w 0xffffffff812a4aa0 [crypto]
> 13 0 0 0 DL (threaded) [geom]
> 100023 D - 0xffffffff8123d030 [g_down]
> 100022 D - 0xffffffff8123d028 [g_up]
> 100021 D - 0xffffffff8123d018 [g_event]
> 12 0 0 0 RL (threaded) [intr]
> 100065 I [swi0: uart]
> 100063 I [irq293: mpt5]
> 100061 I [irq292: mpt4]
> 100059 I [irq291: mpt3]
> 100055 I [irq274: mpt2]
> 100053 I [irq273: mpt1]
> 100051 I [irq272: mpt0]
> 100044 I [irq22: ehci0]
> 100039 I [irq21: ohci0]
> 100034 I [swi2: cambio]
> 100031 I [swi6: task queue]
> 100030 I [swi6: Giant taskq]
> 100028 I [swi5: +]
> 100020 I [swi1: netisr 0]
> 100019 I [swi4: clock]
> 100018 I [swi4: clock]
> 100017 I [swi4: clock]
> 100016 I [swi4: clock]
> 100015 I [swi4: clock]
> 100014 I [swi4: clock]
> 100013 I [swi4: clock]
> 100012 RunQ [swi4: clock]
> 100011 I [swi3: vm]
> 11 0 0 0 RL (threaded) [idle]
> 100010 Run CPU 7 [idle: cpu7]
> 100009 Run CPU 6 [idle: cpu6]
> 100008 Run CPU 5 [idle: cpu5]
> 100007 Run CPU 4 [idle: cpu4]
> 100006 Run CPU 3 [idle: cpu3]
> 100005 Run CPU 2 [idle: cpu2]
> 100004 Run CPU 1 [idle: cpu1]
> 100003 CanRun [idle: cpu0]
> 1 0 1 0 SLs wait 0xfffffe000c068940 [init]
> 10 0 0 0 DL audit_wo 0xffffffff812a50d0 [audit]
> 0 0 0 0 DLs (threaded) [kernel]
> 101463 D - 0xfffffe000fddab00 [zil_clean]
> 101462 D - 0xfffffe000fd6a800 [zil_clean]
> 101461 D - 0xfffffe000fdf6180 [zil_clean]
> 101460 D - 0xfffffe001d546600 [zil_clean]
> 101457 D - 0xfffffe000f359e00 [zfs_vn_rele_taskq]
> 101456 D - 0xfffffe001198d080 [zio_ioctl_intr]
> 101455 D - 0xfffffe001cb4fa80 [zio_ioctl_issue]
> 101454 D - 0xfffffe000ffbf380 [zio_claim_intr]
> 101453 D - 0xfffffe00110cf580 [zio_claim_issue]
> 101452 D - 0xfffffe00110cf880 [zio_free_intr]
> 101451 D - 0xfffffe000ffc1b80 [zio_free_issue_99]
> 101450 D - 0xfffffe000ffc1b80 [zio_free_issue_98]
> 101449 D - 0xfffffe000ffc1b80 [zio_free_issue_97]
> 101448 D - 0xfffffe000ffc1b80 [zio_free_issue_96]
> 101447 D - 0xfffffe000ffc1b80 [zio_free_issue_95]
> 101446 D - 0xfffffe000ffc1b80 [zio_free_issue_94]
> 101445 D - 0xfffffe000ffc1b80 [zio_free_issue_93]
> 101444 D - 0xfffffe000ffc1b80 [zio_free_issue_92]
> 101443 D - 0xfffffe000ffc1b80 [zio_free_issue_91]
> 101442 D - 0xfffffe000ffc1b80 [zio_free_issue_90]
> 101441 D - 0xfffffe000ffc1b80 [zio_free_issue_89]
> 101440 D - 0xfffffe000ffc1b80 [zio_free_issue_88]
> 101439 D - 0xfffffe000ffc1b80 [zio_free_issue_87]
> 101438 D - 0xfffffe000ffc1b80 [zio_free_issue_86]
> 101437 D - 0xfffffe000ffc1b80 [zio_free_issue_85]
> 101436 D - 0xfffffe000ffc1b80 [zio_free_issue_84]
> 101435 D - 0xfffffe000ffc1b80 [zio_free_issue_83]
> 101434 D - 0xfffffe000ffc1b80 [zio_free_issue_82]
> 101433 D - 0xfffffe000ffc1b80 [zio_free_issue_81]
> 101432 D - 0xfffffe000ffc1b80 [zio_free_issue_80]
> 101431 D - 0xfffffe000ffc1b80 [zio_free_issue_79]
> 101430 D - 0xfffffe000ffc1b80 [zio_free_issue_78]
> 101429 D - 0xfffffe000ffc1b80 [zio_free_issue_77]
> 101428 D - 0xfffffe000ffc1b80 [zio_free_issue_76]
> 101427 D - 0xfffffe000ffc1b80 [zio_free_issue_75]
> 101426 D - 0xfffffe000ffc1b80 [zio_free_issue_74]
> 101425 D - 0xfffffe000ffc1b80 [zio_free_issue_73]
> 101424 D - 0xfffffe000ffc1b80 [zio_free_issue_72]
> 101423 D - 0xfffffe000ffc1b80 [zio_free_issue_71]
> 101422 D - 0xfffffe000ffc1b80 [zio_free_issue_70]
> 101421 D - 0xfffffe000ffc1b80 [zio_free_issue_69]
> 101420 D - 0xfffffe000ffc1b80 [zio_free_issue_68]
> 101419 D - 0xfffffe000ffc1b80 [zio_free_issue_67]
> 101418 D - 0xfffffe000ffc1b80 [zio_free_issue_66]
> 101417 D - 0xfffffe000ffc1b80 [zio_free_issue_65]
> 101416 D - 0xfffffe000ffc1b80 [zio_free_issue_64]
> 101415 D - 0xfffffe000ffc1b80 [zio_free_issue_63]
> 101414 D - 0xfffffe000ffc1b80 [zio_free_issue_62]
> 101413 D - 0xfffffe000ffc1b80 [zio_free_issue_61]
> 101412 D - 0xfffffe000ffc1b80 [zio_free_issue_60]
> 101411 D - 0xfffffe000ffc1b80 [zio_free_issue_59]
> 101410 D - 0xfffffe000ffc1b80 [zio_free_issue_58]
> 101409 D - 0xfffffe000ffc1b80 [zio_free_issue_57]
> 101408 D - 0xfffffe000ffc1b80 [zio_free_issue_56]
> 101407 D - 0xfffffe000ffc1b80 [zio_free_issue_55]
> 101406 D - 0xfffffe000ffc1b80 [zio_free_issue_54]
> 101405 D - 0xfffffe000ffc1b80 [zio_free_issue_53]
> 101404 D - 0xfffffe000ffc1b80 [zio_free_issue_52]
> 101403 D - 0xfffffe000ffc1b80 [zio_free_issue_51]
> 101402 D - 0xfffffe000ffc1b80 [zio_free_issue_50]
> 101401 D - 0xfffffe000ffc1b80 [zio_free_issue_49]
> 101400 D - 0xfffffe000ffc1b80 [zio_free_issue_48]
> 101399 D - 0xfffffe000ffc1b80 [zio_free_issue_47]
> 101398 D - 0xfffffe000ffc1b80 [zio_free_issue_46]
> 101397 D - 0xfffffe000ffc1b80 [zio_free_issue_45]
> 101396 D - 0xfffffe000ffc1b80 [zio_free_issue_44]
> 101395 D - 0xfffffe000ffc1b80 [zio_free_issue_43]
> 101394 D - 0xfffffe000ffc1b80 [zio_free_issue_42]
> 101393 D - 0xfffffe000ffc1b80 [zio_free_issue_41]
> 101392 D - 0xfffffe000ffc1b80 [zio_free_issue_40]
> 101391 D - 0xfffffe000ffc1b80 [zio_free_issue_39]
> 101390 D - 0xfffffe000ffc1b80 [zio_free_issue_38]
> 101389 D - 0xfffffe000ffc1b80 [zio_free_issue_37]
> 101388 D - 0xfffffe000ffc1b80 [zio_free_issue_36]
> 101387 D - 0xfffffe000ffc1b80 [zio_free_issue_35]
> 101386 D - 0xfffffe000ffc1b80 [zio_free_issue_34]
> 101385 D - 0xfffffe000ffc1b80 [zio_free_issue_33]
> 101384 D - 0xfffffe000ffc1b80 [zio_free_issue_32]
> 101383 D - 0xfffffe000ffc1b80 [zio_free_issue_31]
> 100569 D - 0xfffffe000ffc1b80 [zio_free_issue_30]
> 100567 D - 0xfffffe000ffc1b80 [zio_free_issue_29]
> 100565 D - 0xfffffe000ffc1b80 [zio_free_issue_28]
> 100560 D - 0xfffffe000ffc1b80 [zio_free_issue_27]
> 100554 D - 0xfffffe000ffc1b80 [zio_free_issue_26]
> 100553 D - 0xfffffe000ffc1b80 [zio_free_issue_25]
> 100547 D - 0xfffffe000ffc1b80 [zio_free_issue_24]
> 100545 D - 0xfffffe000ffc1b80 [zio_free_issue_23]
> 100542 D - 0xfffffe000ffc1b80 [zio_free_issue_22]
> 100539 D - 0xfffffe000ffc1b80 [zio_free_issue_21]
> 100536 D - 0xfffffe000ffc1b80 [zio_free_issue_20]
> 100530 D - 0xfffffe000ffc1b80 [zio_free_issue_19]
> 100487 D - 0xfffffe000ffc1b80 [zio_free_issue_18]
> 100415 D - 0xfffffe000ffc1b80 [zio_free_issue_17]
> 100413 D - 0xfffffe000ffc1b80 [zio_free_issue_16]
> 100407 D - 0xfffffe000ffc1b80 [zio_free_issue_15]
> 100403 D - 0xfffffe000ffc1b80 [zio_free_issue_14]
> 100400 D - 0xfffffe000ffc1b80 [zio_free_issue_13]
> 100393 D - 0xfffffe000ffc1b80 [zio_free_issue_12]
> 100391 D - 0xfffffe000ffc1b80 [zio_free_issue_11]
> 100387 D - 0xfffffe000ffc1b80 [zio_free_issue_10]
> 100386 D - 0xfffffe000ffc1b80 [zio_free_issue_9]
> 100385 D - 0xfffffe000ffc1b80 [zio_free_issue_8]
> 100384 D - 0xfffffe000ffc1b80 [zio_free_issue_7]
> 100383 D - 0xfffffe000ffc1b80 [zio_free_issue_6]
> 100379 D - 0xfffffe000ffc1b80 [zio_free_issue_5]
> 100372 D - 0xfffffe000ffc1b80 [zio_free_issue_4]
> 100367 D - 0xfffffe000ffc1b80 [zio_free_issue_3]
> 100366 D - 0xfffffe000ffc1b80 [zio_free_issue_2]
> 100361 D - 0xfffffe000ffc1b80 [zio_free_issue_1]
> 100360 D - 0xfffffe000ffc1b80 [zio_free_issue_0]
> 100359 D - 0xfffffe001ca67280 [zio_write_intr_high]
> 100358 D - 0xfffffe001ca67280 [zio_write_intr_high]
> 100357 D - 0xfffffe001ca67280 [zio_write_intr_high]
> 100354 D - 0xfffffe001ca67280 [zio_write_intr_high]
> 100353 D - 0xfffffe001ca67280 [zio_write_intr_high]
> 100349 D - 0xfffffe000fd72700 [zio_write_intr_7]
> 100348 D - 0xfffffe000fd72700 [zio_write_intr_6]
> 100345 D - 0xfffffe000fd72700 [zio_write_intr_5]
> 100343 D - 0xfffffe000fd72700 [zio_write_intr_4]
> 100342 D - 0xfffffe000fd72700 [zio_write_intr_3]
> 100341 D - 0xfffffe000fd72700 [zio_write_intr_2]
> 100340 D - 0xfffffe000fd72700 [zio_write_intr_1]
> 100339 D - 0xfffffe000fd72700 [zio_write_intr_0]
> 100337 D - 0xfffffe001196ce00 [zio_write_issue_hig]
> 100336 D - 0xfffffe001196ce00 [zio_write_issue_hig]
> 100334 D - 0xfffffe001196ce00 [zio_write_issue_hig]
> 100330 D - 0xfffffe001196ce00 [zio_write_issue_hig]
> 100327 D - 0xfffffe001196ce00 [zio_write_issue_hig]
> 100324 D - 0xfffffe00110cfb00 [zio_write_issue_7]
> 100322 D - 0xfffffe00110cfb00 [zio_write_issue_6]
> 100321 D - 0xfffffe00110cfb00 [zio_write_issue_5]
> 100316 D - 0xfffffe00110cfb00 [zio_write_issue_4]
> 100314 D - 0xfffffe00110cfb00 [zio_write_issue_3]
> 100312 D - 0xfffffe00110cfb00 [zio_write_issue_2]
> 100311 D - 0xfffffe00110cfb00 [zio_write_issue_1]
> 100307 D - 0xfffffe00110cfb00 [zio_write_issue_0]
> 100306 D - 0xfffffe000ffbfc80 [zio_read_intr_7]
> 100305 D - 0xfffffe000ffbfc80 [zio_read_intr_6]
> 100303 D - 0xfffffe000ffbfc80 [zio_read_intr_5]
> 100300 D - 0xfffffe000ffbfc80 [zio_read_intr_4]
> 100298 D - 0xfffffe000ffbfc80 [zio_read_intr_3]
> 100297 D - 0xfffffe000ffbfc80 [zio_read_intr_2]
> 100293 D - 0xfffffe000ffbfc80 [zio_read_intr_1]
> 100292 D - 0xfffffe000ffbfc80 [zio_read_intr_0]
> 100291 D - 0xfffffe00110cf000 [zio_read_issue_7]
> 100289 D - 0xfffffe00110cf000 [zio_read_issue_6]
> 100288 D - 0xfffffe00110cf000 [zio_read_issue_5]
> 100286 D - 0xfffffe00110cf000 [zio_read_issue_4]
> 100282 D - 0xfffffe00110cf000 [zio_read_issue_3]
> 100281 D - 0xfffffe00110cf000 [zio_read_issue_2]
> 100280 D - 0xfffffe00110cf000 [zio_read_issue_1]
> 100278 D - 0xfffffe00110cf000 [zio_read_issue_0]
> 100275 D - 0xfffffe001113b500 [zio_null_intr]
> 100273 D - 0xfffffe001196c800 [zio_null_issue]
> 100120 D - 0xfffffe0011370300 [system_taskq_7]
> 100119 D - 0xfffffe0011370300 [system_taskq_6]
> 100118 D - 0xfffffe0011370300 [system_taskq_5]
> 100117 D - 0xfffffe0011370300 [system_taskq_4]
> 100116 D - 0xfffffe0011370300 [system_taskq_3]
> 100115 D - 0xfffffe0011370300 [system_taskq_2]
> 100114 D - 0xfffffe0011370300 [system_taskq_1]
> 100113 D - 0xfffffe0011370300 [system_taskq_0]
> 100066 D - 0xfffffe000f239a80 [mca taskq]
> 100058 D - 0xfffffe000c69b900 [nfe3 taskq]
> 100057 D - 0xfffffe000c698480 [nfe2 taskq]
> 100050 D - 0xfffffe000c620400 [nfe1 taskq]
> 100049 D - 0xfffffe000c61b500 [nfe0 taskq]
> 100037 D - 0xfffffe000c24bb00 [acpi_task_2]
> 100036 D - 0xfffffe000c24bb00 [acpi_task_1]
> 100035 D - 0xfffffe000c24bb00 [acpi_task_0]
> 100033 D - 0xfffffe000c24be00 [kqueue taskq]
> 100032 D - 0xfffffe000c24c000 [ffs_trim taskq]
> 100029 D - 0xfffffe000c20c780 [thread taskq]
> 100024 D - 0xfffffe000c07fb80 [firmware taskq]
> 100000 D sched 0xffffffff8123d280 [swapper]
> 895 892 873 0 Z perl
>
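Most of that listing is idle kernel threads. The userland processes in disk wait are the interesting part: zfs and ls sleep on "zio->io_", dd on "physrd", and a camcontrol (pid 32921) on "cbwait", while the mpt_recovery threads are idle. A sketch for pulling those rows out of a saved ps dump; it assumes process rows have the eight columns pid, ppid, pgrp, uid, state, wmesg, wchan, cmd as above, it skips per-thread sub-rows, and the blocked_userland helper is hypothetical.

```python
from typing import List, Optional, Tuple


def blocked_userland(ps_text: str) -> List[Tuple[int, Optional[str], str]]:
    """Return (pid, wmesg, cmd) for non-kernel processes whose state starts with D."""
    rows = []
    for line in ps_text.splitlines():
        fields = line.split()
        # Process rows start with four numeric columns: pid ppid pgrp uid.
        if len(fields) < 6 or not all(f.isdigit() for f in fields[:4]):
            continue
        pid, state, rest = int(fields[0]), fields[4], fields[5:]
        if rest[0] == "(threaded)":
            wmesg, cmd = None, " ".join(rest[1:])
        elif len(rest) >= 3:
            wmesg, cmd = rest[0], " ".join(rest[2:])  # rest[1] is the wchan address
        else:
            wmesg, cmd = None, " ".join(rest)  # e.g. zombies: no wmesg/wchan
        # Kernel threads show their name in brackets, e.g. "[zfskern]".
        if state.startswith("D") and not cmd.startswith("["):
            rows.append((pid, wmesg, cmd))
    return rows
```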
> Setting this:
> # sysctl dev.mpt.0.debug=255
> and running dd again against a disk on that controller prints this to the
> console:
> SCSI IO Request @ 0xffffff80003046f0
> Chain Offset 0x00
> MsgFlags 0x00
> MsgContext 0x000201c5
> Bus: 0
> TargetID 0
> SenseBufferLength 32
> LUN: 0x0
> Control 0x02000200 READ ORDEREDQ
> DataLength 0x00000200
> SenseBufAddr 0x0c678be0
> CDB[0:6] 08 00 00 00 01 00
> SE64 0xffffff87ffd33a30: Addr=0x000000070cc08400 FlagsLength=0xd3000200
> 64_BIT_ADDRESSING LAST_ELEMENT END_OF_BUFFER END_OF_LIST
> mpt0: Send Request 453 (c678a00):
> mpt0: 00000000 00002006 000201c5 00000000 00000000 02000200 00000008 00000001
> mpt0: 00000000 00000000 00000200 0c678be0 d3000200 0cc08400 00000007 ffffffff
> mpt0: ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff
> mpt0: ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff
> mpt0: enter mpt_intr
> mpt0: Context Reply: 0x000201c5
> mpt0: exit mpt_intr
>
> And dd freezes.
>
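The request in that dump can be decoded by hand, which makes it easier to compare with what the disk should have received. "CDB[0:6] 08 00 00 00 01 00" is a SCSI READ(6): opcode 0x08, a 21-bit LBA in bytes 1 to 3 (here 0) and a transfer length in byte 4 (here one block), which matches "DataLength 0x00000200" (512 bytes) and is consistent with dd's first read at its default 512-byte block size. Also, the "Context Reply" printed inside mpt_intr (0x000201c5) equals the request's MsgContext, so the interrupt handler did see a reply carrying this request's context. The sketch below decodes the CDB, the Control word and the SGE FlagsLength. The MPI bit values are as I recall them from LSI's MPI headers and are assumptions, although they reproduce the flag names the driver printed; the function names are hypothetical.

```python
from typing import List, Tuple

# Assumed MPI constants (recalled, not copied from the headers).
SCSIIO_DIR = {0x00000000: "NODATA", 0x01000000: "WRITE", 0x02000000: "READ"}
SCSIIO_QUEUE = {0x000: "SIMPLEQ", 0x100: "HEADOFQ", 0x200: "ORDEREDQ",
                0x400: "ACAQ", 0x500: "UNTAGGED"}
# SGE flag bits; the element-type field (0x30, SIMPLE = 0x10) is not decoded here.
SGE_FLAGS = {0x80: "LAST_ELEMENT", 0x40: "END_OF_BUFFER",
             0x02: "64_BIT_ADDRESSING", 0x01: "END_OF_LIST"}


def decode_read6(cdb: bytes) -> Tuple[int, int]:
    """Return (lba, blocks) for a SCSI READ(6) CDB."""
    if len(cdb) != 6 or cdb[0] != 0x08:
        raise ValueError("not a READ(6) CDB")
    lba = ((cdb[1] & 0x1F) << 16) | (cdb[2] << 8) | cdb[3]  # 21-bit LBA
    blocks = cdb[4] or 256  # a transfer length of 0 means 256 blocks
    return lba, blocks


def decode_control(control: int) -> Tuple[str, str]:
    """Split an MPI SCSI IO Control word into (data direction, queue tag type)."""
    return SCSIIO_DIR[control & 0x03000000], SCSIIO_QUEUE[control & 0x00000700]


def decode_sge(flags_length: int) -> Tuple[List[str], int]:
    """Split an SGE FlagsLength word: flags in the top byte, length in the low 24 bits."""
    flags, length = flags_length >> 24, flags_length & 0x00FFFFFF
    return [name for bit, name in SGE_FLAGS.items() if flags & bit], length


# Values copied from the mpt0 debug output above.
lba, blocks = decode_read6(bytes.fromhex("080000000100"))
direction, queueing = decode_control(0x02000200)
sge_flags, sge_len = decode_sge(0xD3000200)
```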
> Output of alltrace for a couple of the stuck processes:
> Tracing command dd pid 36971 tid 101570 td 0xfffffe001efce000
> sched_switch() at sched_switch+0x115
> mi_switch() at mi_switch+0x186
> sleepq_wait() at sleepq_wait+0x42
> _sleep() at _sleep+0x379
> bwait() at bwait+0x64
> physio() at physio+0x1c8
> devfs_read_f() at devfs_read_f+0x90
> dofileread() at dofileread+0xa1
> kern_readv() at kern_readv+0x6c
> sys_read() at sys_read+0x64
> amd64_syscall() at amd64_syscall+0x540
> Xfast_syscall() at Xfast_syscall+0xf7
> --- syscall (3, FreeBSD ELF64, sys_read), rip = 0x800916c8c, rsp = 0x7fffffffd658, rbp = 0x7fffffffd6b0 ---
>
> Tracing command zfs pid 36686 tid 101593 td 0xfffffe001ecb3900
> sched_switch() at sched_switch+0x115
> mi_switch() at mi_switch+0x186
> sleepq_wait() at sleepq_wait+0x42
> _cv_wait() at _cv_wait+0x112
> zio_wait() at zio_wait+0x61
> dbuf_read() at dbuf_read+0x5e5
> dmu_buf_hold() at dmu_buf_hold+0xe0
> zap_lockdir() at zap_lockdir+0x58
> zap_cursor_retrieve() at zap_cursor_retrieve+0x19b
> dmu_snapshot_list_next() at dmu_snapshot_list_next+0xaf
> zfs_ioc_snapshot_list_next() at zfs_ioc_snapshot_list_next+0x101
> zfsdev_ioctl() at zfsdev_ioctl+0xe6
> devfs_ioctl_f() at devfs_ioctl_f+0x7b
> kern_ioctl() at kern_ioctl+0x106
> sys_ioctl() at sys_ioctl+0xfd
> amd64_syscall() at amd64_syscall+0x540
> Xfast_syscall() at Xfast_syscall+0xf7
> --- syscall (54, FreeBSD ELF64, sys_ioctl), rip = 0x801be2c2c, rsp = 0x7fffffff8938, rbp = 0x4000 ---
>
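The ddb traces carry the same call chains as the earlier "procstat -k" output, with offsets added. Stripping the generic sleep frames leaves the real waiter at the top: bwait for dd and zio_wait for zfs. A sketch that does this for traces pasted one frame per line; the set of generic frames is my assumption based on these two traces, and parse_trace and first_waiter are hypothetical names.

```python
import re
from typing import List, Optional, Tuple

# "func() at func+0xNN"; the backreference requires the same name on both sides.
FRAME = re.compile(r"^(\w+)\(\) at \1\+0x([0-9a-f]+)\s*$")
# Generic scheduler/sleep frames assumed to appear in every blocked thread.
SLEEP_FRAMES = {"sched_switch", "mi_switch", "sleepq_wait", "_sleep", "_cv_wait"}


def parse_trace(text: str) -> List[Tuple[str, int]]:
    """Return (function, offset) pairs, innermost frame first."""
    frames = []
    for line in text.splitlines():
        m = FRAME.match(line.strip())
        if m:  # skips "Tracing command ..." and "--- syscall ..." lines
            frames.append((m.group(1), int(m.group(2), 16)))
    return frames


def first_waiter(frames: List[Tuple[str, int]]) -> Optional[str]:
    """Return the innermost frame that is not generic sleep machinery."""
    for name, _ in frames:
        if name not in SLEEP_FRAMES:
            return name
    return None
```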
More information about the freebsd-fs
mailing list