zio->io_cv deadlock

Kevin Day <toasty@dragondata.com>
Tue Mar 17 16:04:59 PDT 2009


I've got a test environment where I can now make this deadlock happen within 3-4 hours of use. This is from -CURRENT as of yesterday. The test load is an rsync -ravH running over ssh (visible in the ps output below).


This server isn't using ZFS on root, so when it hangs I'm not as badly off. Here's the output of ps auxwwlv:

USER    PID %CPU %MEM   VSZ   RSS  TT  STAT STARTED      TIME COMMAND            UID  PPID CPU PRI NI MWCHAN  SL  RE PAGEIN   LIM TSIZ
root    477  0.0  0.0  2180   656  ??  Is    9:34AM   0:00.00 /sbin/devd           0     1   0  44  0 select 127 127      0     -  336
root    593  0.0  0.0  5780  1444  ??  Is    9:35AM   0:00.05 /usr/sbin/syslog     0     1   0  44  0 select  21 127      0     -   36
root    811  0.0  0.0 24872  4172  ??  Is    9:35AM   0:00.00 /usr/sbin/sshd       0     1   0  44  0 select 127 127      0     -  220
root    837  0.0  0.0  6836  1532  ??  Is    9:35AM   0:00.07 /usr/sbin/cron -     0     1   0  44  0 nanslp  27 127      0     -   36
root    974  0.0  0.0  8232  2484  ??  Ss    9:38AM   1:19.01 screen               0     1   0  44  0 select   0 127      6     -  292
root   1612  0.0  0.0 38852  7856  ??  Ss    2:49PM   0:00.16 sshd: root@pts/0     0   811   0  44  0 select   0 127      0     -  220
root   1617  0.0  0.0 10188  2792   0  Is    2:49PM   0:00.01 -csh (csh)           0  1612   0  47  0 pause  127 127      0     -  304
root   1622  0.0  0.0  8232  2132   0  S+    2:49PM   0:00.03 screen -x            0  1617   0  44  0 pause    1 127      0     -  292
root    975  0.0  0.0 10188  2704   1  Is    9:38AM   0:00.01 /bin/csh             0   974   0  49  0 pause  127 127      0     -  304
root    980  0.0  1.1 794196 766248   1  D+    9:38AM  66:50.20 rsync -ravH root     0   975   0  69  0 zio->i 127 127      1     -  344
root    982  0.0  0.2 181844 142444   1  I+    9:38AM  69:32.28 rsync -ravH root     0   980   0  44  0 select  57 127      0     -  344
root    983  0.0  0.0 10188  2788   2  Ss    9:38AM   0:00.01 /bin/csh             0   974   0  44  0 pause    0 127      0     -  304
root      1  0.0  0.0  2176   596  ??  ILs   9:34AM   0:00.01 /sbin/init --        0     0   0  44  0 wait   127 127      8     -  604
root    827  0.0  0.0 10796  3800  ??  Ss    9:35AM   0:00.50 sendmail: accept     0     1   0  44  0 select   3 127      1     -  628
smmsp   831  0.0  0.0 10796  3844  ??  Is    9:35AM   0:00.01 sendmail: Queue     25     1   0  44  0 pause  127 127      0     -  628
root    887  0.0  0.0  5776  1224  v0  Is+   9:35AM   0:00.01 /usr/libexec/get     0     1   0  76  0 ttyin  127 127      0     -   20
root    888  0.0  0.0  5776  1224  v1  Is+   9:35AM   0:00.00 /usr/libexec/get     0     1   0  76  0 ttyin  127 127      0     -   20
root    889  0.0  0.0  5776  1224  v2  Is+   9:35AM   0:00.01 /usr/libexec/get     0     1   0  76  0 ttyin  127 127      0     -   20
root    890  0.0  0.0  5776  1224  v3  Is+   9:35AM   0:00.00 /usr/libexec/get     0     1   0  76  0 ttyin  127 127      2     -   20
root    891  0.0  0.0  5776  1224  v4  Is+   9:35AM   0:00.01 /usr/libexec/get     0     1   0  76  0 ttyin  127 127      0     -   20
root    892  0.0  0.0  5776  1224  v5  Is+   9:35AM   0:00.01 /usr/libexec/get     0     1   0  76  0 ttyin  127 127      0     -   20
root    893  0.0  0.0  5776  1224  v6  Is+   9:35AM   0:00.01 /usr/libexec/get     0     1   0  76  0 ttyin  127 127      0     -   20
root    894  0.0  0.0  5776  1224  v7  Is+   9:35AM   0:00.01 /usr/libexec/get     0     1   0  76  0 ttyin  127 127      0     -   20
root    981  0.0  0.0 27668 11220   1  S+    9:38AM 172:25.63 ssh -l root 216.     0   980   0  44  0 select   1 127      0     -  120
root   2112  0.0  0.0  6892  1416   2  R+    6:48PM   0:00.00 ps auxwwlv           0   983   0  44  0 -      127   0      0     -   28
root      0  0.0  0.0     0   128  ??  DLs   9:34AM  46:56.19 [kernel]             0     0   0 -68  0 -      127 127      0     -    0
root      2  0.0  0.0     0    16  ??  DL    9:34AM   0:01.23 [g_event]            0     0   0  -8  0 -        0 127      0     -    0
root      3  0.0  0.0     0    16  ??  DL    9:34AM   0:26.74 [g_up]               0     0   0  -8  0 -        0 127      0     -    0
root      4  0.0  0.0     0    16  ??  DL    9:34AM   5:51.49 [g_down]             0     0   0  -8  0 -        0 127      0     -    0
root      5  0.0  0.0     0    16  ??  DL    9:34AM   0:00.00 [xpt_thrd]           0     0   0 -16  0 ccb_sc 127 127      0     -    0
root      6  0.0  0.0     0    16  ??  DL    9:34AM   0:00.14 [fdc0]               0     0   0 -16  0 -        0 127      0     -    0
root      7  0.0  0.0     0    24  ??  DL    9:34AM   0:00.00 [sctp_iterator]      0     0   0 -16  0 waitin 127 127      0     -    0
root      8  0.0  0.0     0    16  ??  DL    9:34AM   0:00.03 [pagedaemon]         0     0   0 -16  0 psleep   4 127      0     -    0
root      9  0.0  0.0     0    16  ??  DL    9:34AM   0:00.00 [vmdaemon]           0     0   0 -16  0 psleep 127 127      0     -    0
root     10  0.0  0.0     0    16  ??  DL    9:34AM   0:00.00 [audit]              0     0   0 -16  0 audit_ 127 127      0     -    0
root     11 800.0  0.0     0   128  ??  RL    9:34AM 3958:43.82 [idle]               0     0   0 171  0 -      127 127      0     -    0
root     12  0.0  0.0     0   400  ??  WL    9:34AM   6:03.01 [intr]               0     0   0 -60  0 -      127 127      0     -    0
root     13  0.0  0.0     0    16  ??  DL    9:34AM   0:44.77 [yarrow]             0     0   0  44  0 -        0 127      0     -    0
root     14  0.0  0.0     0    16  ??  DL    9:34AM   0:00.00 [usbus0]             0     0   0 -64  0 wmsg   127 127      0     -    0
root     15  0.0  0.0     0    16  ??  DL    9:34AM   0:00.92 [usbus0]             0     0   0 -68  0 wmsg     2 127      0     -    0
root     16  0.0  0.0     0    16  ??  DL    9:34AM   0:00.56 [usbus0]             0     0   0 -68  0 wmsg     2 127      0     -    0
root     17  0.0  0.0     0    16  ??  DL    9:34AM   0:00.55 [usbus0]             0     0   0 -64  0 wmsg     2 127      0     -    0
root     18  0.0  0.0     0    16  ??  DL    9:34AM   0:00.00 [usbus1]             0     0   0 -64  0 wmsg   127 127      0     -    0
root     19  0.0  0.0     0    16  ??  DL    9:34AM   0:00.92 [usbus1]             0     0   0 -68  0 wmsg     2 127      0     -    0
root     20  0.0  0.0     0    16  ??  DL    9:34AM   0:00.52 [usbus1]             0     0   0 -68  0 wmsg     2 127      0     -    0
root     21  0.0  0.0     0    16  ??  DL    9:34AM   0:00.60 [usbus1]             0     0   0 -64  0 wmsg     2 127      0     -    0
root     22  0.0  0.0     0    16  ??  DL    9:34AM   0:00.00 [pagezero]           0     0   0  76  0 pgzero 127 127      0     -    0
root     23  0.0  0.0     0    16  ??  DL    9:34AM   0:00.14 [bufdaemon]          0     0   0 -16  0 psleep   0 127      0     -    0
root     24  0.0  0.0     0    16  ??  DL    9:34AM   0:24.78 [syncer]             0     0   0  44  0 zfsvfs 127 127      0     -    0
root     25  0.0  0.0     0    16  ??  DL    9:34AM   0:02.17 [vnlru]              0     0   0  44  0 vlruwt   0 127      0     -    0
root     26  0.0  0.0     0    16  ??  DL    9:34AM   0:00.21 [softdepflush]       0     0   0 -16  0 sdflus   0 127      0     -    0
root     88  0.0  0.0     0    16  ??  DL    9:34AM   0:00.00 [system_taskq]       0     0   0 -16  0 tq->tq 127 127      0     -    0
root     89  0.0  0.0     0    16  ??  DL    9:34AM   0:00.00 [system_taskq]       0     0   0 -16  0 tq->tq 127 127      0     -    0
root     90  0.0  0.0     0    16  ??  DL    9:34AM   0:00.00 [system_taskq]       0     0   0 -16  0 tq->tq 127 127      0     -    0
root     91  0.0  0.0     0    16  ??  DL    9:34AM   0:00.00 [system_taskq]       0     0   0 -16  0 tq->tq 127 127      0     -    0
root     92  0.0  0.0     0    16  ??  DL    9:34AM   0:00.00 [system_taskq]       0     0   0 -16  0 tq->tq 127 127      0     -    0
root     93  0.0  0.0     0    16  ??  DL    9:34AM   0:00.00 [system_taskq]       0     0   0 -16  0 tq->tq 127 127      0     -    0
root     94  0.0  0.0     0    16  ??  DL    9:34AM   0:00.00 [system_taskq]       0     0   0 -16  0 tq->tq 127 127      0     -    0
root     95  0.0  0.0     0    16  ??  DL    9:34AM   0:00.00 [system_taskq]       0     0   0 -16  0 tq->tq 127 127      0     -    0
root    104  0.0  0.0     0    16  ??  DL    9:34AM   5:11.71 [arc_reclaim_thr     0     0   0  44  0 arc_re   0 127      0     -    0
root    105  0.0  0.0     0    16  ??  DL    9:34AM   0:00.15 [l2arc_feed_thre     0     0   0 -16  0 l2arc_   0 127      0     -    0
root    936  0.0  0.0     0    16  ??  DL    9:37AM   0:00.55 [spa_zio]            0     0   0 -16  0 tq->tq 127 127      0     -    0
root    937  0.0  0.0     0    16  ??  DL    9:37AM   0:00.17 [spa_zio]            0     0   0 -16  0 tq->tq 127 127      0     -    0
root    938  0.0  0.0     0    16  ??  DL    9:37AM   0:00.00 [spa_zio]            0     0   0 -16  0 tq->tq 127 127      0     -    0
root    939  0.0  0.0     0    16  ??  DL    9:37AM   0:08.89 [spa_zio]            0     0   0  44  0 tq->tq 127 127      0     -    0
root    940  0.0  0.0     0    16  ??  DL    9:37AM   0:31.71 [spa_zio]            0     0   0  44  0 tq->tq 127 127      0     -    0
root    941  0.0  0.0     0    16  ??  DL    9:37AM   0:08.91 [spa_zio]            0     0   0  44  0 tq->tq 127 127      0     -    0
root    942  0.0  0.0     0    16  ??  DL    9:37AM   0:08.76 [spa_zio]            0     0   0  44  0 tq->tq 127 127      0     -    0
root    943  0.0  0.0     0    16  ??  DL    9:37AM   0:08.79 [spa_zio]            0     0   0  44  0 tq->tq 127 127      0     -    0
root    944  0.0  0.0     0    16  ??  DL    9:37AM   0:09.21 [spa_zio]            0     0   0  44  0 tq->tq 127 127      0     -    0
root    945  0.0  0.0     0    16  ??  DL    9:37AM   0:08.83 [spa_zio]            0     0   0  44  0 tq->tq 127 127      0     -    0
root    946  0.0  0.0     0    16  ??  DL    9:37AM   0:08.88 [spa_zio]            0     0   0  44  0 tq->tq 127 127      0     -    0
root    947  0.0  0.0     0    16  ??  DL    9:37AM   1:34.28 [spa_zio]            0     0   0  44  0 tq->tq 127 127      0     -    0
root    948  0.0  0.0     0    16  ??  DL    9:37AM   1:34.28 [spa_zio]            0     0   0  44  0 tq->tq 127 127      0     -    0
root    949  0.0  0.0     0    16  ??  DL    9:37AM   1:34.25 [spa_zio]            0     0   0  44  0 tq->tq 127 127      0     -    0
root    950  0.0  0.0     0    16  ??  DL    9:37AM   1:34.30 [spa_zio]            0     0   0  44  0 tq->tq 127 127      0     -    0
root    951  0.0  0.0     0    16  ??  DL    9:37AM   1:34.74 [spa_zio]            0     0   0  44  0 tq->tq 127 127      0     -    0
root    952  0.0  0.0     0    16  ??  DL    9:37AM   1:34.38 [spa_zio]            0     0   0  44  0 tq->tq 127 127      0     -    0
root    953  0.0  0.0     0    16  ??  DL    9:37AM   1:34.36 [spa_zio]            0     0   0  44  0 tq->tq 127 127      0     -    0
root    954  0.0  0.0     0    16  ??  DL    9:37AM   1:34.14 [spa_zio]            0     0   0  44  0 tq->tq 127 127      0     -    0
root    955  0.0  0.0     0    16  ??  DL    9:37AM   2:46.35 [spa_zio]            0     0   0  44  0 zfsvfs 127 127      0     -    0
root    956  0.0  0.0     0    16  ??  DL    9:37AM   0:00.00 [spa_zio]            0     0   0 -16  0 tq->tq 127 127      0     -    0
root    957  0.0  0.0     0    16  ??  DL    9:37AM   0:00.00 [spa_zio]            0     0   0 -16  0 tq->tq 127 127      0     -    0
root    958  0.0  0.0     0    16  ??  DL    9:37AM   0:00.00 [spa_zio]            0     0   0 -16  0 tq->tq 127 127      0     -    0
root    959  0.0  0.0     0    16  ??  DL    9:37AM   0:00.00 [spa_zio]            0     0   0 -16  0 tq->tq 127 127      0     -    0
root    960  0.0  0.0     0    16  ??  DL    9:37AM   0:00.00 [spa_zio]            0     0   0 -16  0 tq->tq 127 127      0     -    0
root    961  0.0  0.0     0    16  ??  DL    9:37AM   0:00.11 [spa_zio]            0     0   0 -16  0 tq->tq 127 127      0     -    0
root    962  0.0  0.0     0    16  ??  DL    9:37AM   0:24.24 [vdev:worker da1     0     0   0  44  0 vgeom: 127 127      0     -    0
root    963  0.0  0.0     0    16  ??  DL    9:37AM   0:00.21 [txg_thread_ente     0     0   0 -16  0 tx->tx 127 127      0     -    0
root    964  0.0  0.0     0    28  ??  DL    9:37AM   1:19.07 [txg_thread_ente     0     0   0  44  0 tx->tx 127 127      0     -    0
root    965  0.0  0.0     0    16  ??  DL    9:37AM   0:12.34 [zil_clean]          0     0   0 -16  0 tq->tq 127 127      0     -    0


Note that the syncer and one spa_zio thread are stuck in zfsvfs, and my rsync process (PID 980) is frozen in zio->io_cv (the MWCHAN column truncates it to "zio->i" above).
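
For reference, the stuck entries are easy to pick out of that dump by filtering on the wait-channel strings, with something like:

cs03# ps auxwwlv | egrep 'zio->|zfsvfs'

(just a grep over the MWCHAN values shown above; the exact pattern is only an example, nothing ZFS-specific).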

zpool makes everything look okay:

cs03# zpool iostat -v
                capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
z            549G  13.0T     14    133   709K  12.9M
   da1        549G  13.0T     14    133   709K  12.9M
----------  -----  -----  -----  -----  -----  -----

cs03# zpool status -v
   pool: z
  state: ONLINE
  scrub: none requested
config:

         NAME        STATE     READ WRITE CKSUM
         z           ONLINE       0     0     0
           da1       ONLINE       0     0     0

errors: No known data errors


ZFS-related sysctls:

vfs.zfs.arc_meta_limit: 26214400
vfs.zfs.arc_meta_used: 70401408
vfs.zfs.mdcomp_disable: 0
vfs.zfs.arc_min: 140928537
vfs.zfs.arc_max: 104857600
vfs.zfs.zfetch.array_rd_sz: 1048576
vfs.zfs.zfetch.block_cap: 256
vfs.zfs.zfetch.min_sec_reap: 2
vfs.zfs.zfetch.max_streams: 8
vfs.zfs.prefetch_disable: 0
vfs.zfs.recover: 0
vfs.zfs.txg.synctime: 5
vfs.zfs.txg.timeout: 30
vfs.zfs.scrub_limit: 10
vfs.zfs.vdev.cache.bshift: 16
vfs.zfs.vdev.cache.size: 10485760
vfs.zfs.vdev.cache.max: 16384
vfs.zfs.vdev.aggregation_limit: 131072
vfs.zfs.vdev.ramp_rate: 2
vfs.zfs.vdev.time_shift: 6
vfs.zfs.vdev.min_pending: 4
vfs.zfs.vdev.max_pending: 35
vfs.zfs.cache_flush_disable: 0
vfs.zfs.zil_disable: 0
vfs.zfs.version.zpl: 3
vfs.zfs.version.vdev_boot: 1
vfs.zfs.version.spa: 13
vfs.zfs.version.dmu_backup_stream: 1
vfs.zfs.version.dmu_backup_header: 2
vfs.zfs.version.acl: 1
vfs.zfs.debug: 0
vfs.zfs.super_owner: 0
kstat.zfs.misc.arcstats.hits: 57819155
kstat.zfs.misc.arcstats.misses: 5590858
kstat.zfs.misc.arcstats.demand_data_hits: 63981
kstat.zfs.misc.arcstats.demand_data_misses: 635
kstat.zfs.misc.arcstats.demand_metadata_hits: 47525277
kstat.zfs.misc.arcstats.demand_metadata_misses: 4114419
kstat.zfs.misc.arcstats.prefetch_data_hits: 19665
kstat.zfs.misc.arcstats.prefetch_data_misses: 809
kstat.zfs.misc.arcstats.prefetch_metadata_hits: 10210232
kstat.zfs.misc.arcstats.prefetch_metadata_misses: 1474995
kstat.zfs.misc.arcstats.mru_hits: 11018143
kstat.zfs.misc.arcstats.mru_ghost_hits: 937056
kstat.zfs.misc.arcstats.mfu_hits: 37133185
kstat.zfs.misc.arcstats.mfu_ghost_hits: 3428689
kstat.zfs.misc.arcstats.deleted: 7789751
kstat.zfs.misc.arcstats.recycle_miss: 6302888
kstat.zfs.misc.arcstats.mutex_miss: 9406
kstat.zfs.misc.arcstats.evict_skip: 418277598
kstat.zfs.misc.arcstats.hash_elements: 9872
kstat.zfs.misc.arcstats.hash_elements_max: 124517
kstat.zfs.misc.arcstats.hash_collisions: 112246
kstat.zfs.misc.arcstats.hash_chains: 96
kstat.zfs.misc.arcstats.hash_chain_max: 3
kstat.zfs.misc.arcstats.p: 92080153
kstat.zfs.misc.arcstats.c: 140928537
kstat.zfs.misc.arcstats.c_min: 140928537
kstat.zfs.misc.arcstats.c_max: 104857600
kstat.zfs.misc.arcstats.size: 141645696
kstat.zfs.misc.arcstats.hdr_size: 2617264
kstat.zfs.misc.arcstats.l2_hits: 0
kstat.zfs.misc.arcstats.l2_misses: 0
kstat.zfs.misc.arcstats.l2_feeds: 0
kstat.zfs.misc.arcstats.l2_rw_clash: 0
kstat.zfs.misc.arcstats.l2_writes_sent: 0
kstat.zfs.misc.arcstats.l2_writes_done: 0
kstat.zfs.misc.arcstats.l2_writes_error: 0
kstat.zfs.misc.arcstats.l2_writes_hdr_miss: 0
kstat.zfs.misc.arcstats.l2_evict_lock_retry: 0
kstat.zfs.misc.arcstats.l2_evict_reading: 0
kstat.zfs.misc.arcstats.l2_free_on_write: 0
kstat.zfs.misc.arcstats.l2_abort_lowmem: 0
kstat.zfs.misc.arcstats.l2_cksum_bad: 0
kstat.zfs.misc.arcstats.l2_io_error: 0
kstat.zfs.misc.arcstats.l2_size: 0
kstat.zfs.misc.arcstats.l2_hdr_size: 0
kstat.zfs.misc.arcstats.memory_throttle_count: 0
kstat.zfs.misc.vdev_cache_stats.delegations: 671323
kstat.zfs.misc.vdev_cache_stats.hits: 4416731
kstat.zfs.misc.vdev_cache_stats.misses: 349266


There does seem to be something stuck in the syncer:

vfs.worklist_len: 11

(that value never goes down or changes at all), but it doesn't tell me much.
Next time this happens, is there anything else I should look at?
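
My current plan is to grab kernel stacks for the stuck threads while it's wedged, e.g.:

cs03# procstat -kk 980
cs03# procstat -kk 24

(980 being the stuck rsync and 24 the syncer), assuming procstat itself doesn't hang at that point. Beyond that I'm guessing ddb's alltrace and show lockedvnods are the obvious things to try, if ddb stays alive this time.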

-- Kevin


On Feb 8, 2009, at 10:59 PM, Kevin Day wrote:

>
> I'm playing with a -CURRENT install from a couple of weeks ago.
> Everything seems okay for a few days, but eventually every process
> ends up stuck in zio->io_cv. If I go to the console, it's responsive
> until I try logging in; then login gets stuck in zio->io_cv as well.
> Ctrl-Alt-Esc drops me into ddb, but ddb hangs instantly.
>
> Nothing on the console or syslog before it hangs.
>
> Anyone seen anything similar?
>
> -- Kevin
>
>
> Possibly relevant info:
>
> 8 core Opteron
> 64GB RAM
>
> da1 at twa0 bus 0 target 0 lun 1
> da1: <AMCC 9650SE-12M DISK 3.08> Fixed Direct Access SCSI-5 device
> da1: 100.000MB/s transfers
> da1: 4678158MB (9580867585 512 byte sectors: 255H 63S/T 596381C)
>
> server5# zpool list
> NAME   SIZE   USED  AVAIL    CAP  HEALTH  ALTROOT
> z     4.44T  1.19T  3.25T    26%  ONLINE  -
>
> server5# zpool status -v
>  pool: z
> state: ONLINE
> scrub: none requested
> config:
>
> 	NAME        STATE     READ WRITE CKSUM
> 	z           ONLINE       0     0     0
> 	  da1       ONLINE       0     0     0
>
> errors: No known data errors
>
> server5# cat /boot/loader.conf
> vm.kmem_size_max="2048M"
> vm.kmem_size="2048M"
> vfs.zfs.arc_max="100M"
> zfs_load="YES"
> vfs.root.mountfrom="zfs:z"
>
>
> (tried lowering arc_max, didn't seem to help)
