Irregular disk I/O and poor performance (possibly after reading a lot of data from the pool)

Steven Hartland killing at multiplay.co.uk
Mon Dec 1 17:28:29 UTC 2014


What disks?
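
A list of models/firmware would help, e.g. from something like the following (smartctl assumes sysutils/smartmontools is installed; adjust the device name as needed):

    camcontrol devlist
    smartctl -i /dev/da0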

On 01/12/2014 13:21, Dmitriy Makarov wrote:
> We have a big ZFS pool (16 TiB) with 36 disks grouped into 18 mirror vdevs.
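> (For context, the layout is equivalent to a pool created roughly like this -- the exact pairing of da devices into mirrors below is illustrative:
>
>   zpool create disk1 \
>       mirror da0 da1   mirror da2 da3   mirror da4 da5   mirror da6 da7 \
>       mirror da8 da9   mirror da10 da11 mirror da12 da13 mirror da14 da15 \
>       mirror da16 da17 mirror da18 da19 mirror da20 da21 mirror da22 da23 \
>       mirror da24 da25 mirror da26 da27 mirror da28 da29 mirror da30 da31 \
>       mirror da32 da33 mirror da34 da35
>
> i.e. 18 two-way mirror vdevs striped together.)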
>
> This weekend we were doing maintenance on the data in the pool.
> For two days straight, 16 processes were busy reading files (to calculate checksums and the like).
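> (Roughly, the job was a parallel read sweep of this shape -- the path, concurrency and hashing tool here are only illustrative, not our exact script:
>
>   find /path/to/data -type f -print0 | xargs -0 -n 64 -P 16 sha256 > /dev/null
>
> i.e. 16 workers reading and checksumming every file.)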
>
> Starting Monday morning, a few hours after the maintenance had finished,
> we started to observe abnormal ZFS behaviour, accompanied by
> very poor pool performance (many processes were blocked in zio->i).
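> (That "zio->i" is the wait channel as shown in top's STATE column; if it helps we can also dump kernel stacks of the blocked processes with something like
>
>   procstat -kk -a | grep zio
>
> and post the output.)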
>
> But the strangest thing is how I/O is distributed between the mirror vdevs.
> Normally, our 'iostat -x 1' output looks like this:
>
> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
> md0        0.0   5.9     0.0     0.0    0   0.0   0
> da0       28.7 178.2   799.6  6748.3    1   3.8  58
> da1       23.8 180.2   617.9  6748.3    1   3.4  56
> da2       44.6 168.3   681.3  6733.9    1   5.2  72
> da3       38.6 164.4   650.6  6240.3    1   4.9  65
> da4       29.7 176.3   471.3  5935.3    0   4.1  58
> da5       27.7 180.2   546.1  6391.3    1   3.9  57
> da6       27.7 238.6   555.0  6714.6    0   3.7  68
> da7       28.7 239.6   656.0  6714.6    0   3.3  58
> da8       26.7 318.8   738.7  8304.4    0   2.5  54
> da9       27.7 315.9   725.3  7769.7    0   3.0  77
> da10      23.8 268.3   510.0  7663.7    0   2.6  56
> da11      32.7 276.3   905.5  7697.9    0   3.4  70
> da12      24.8 293.1   559.0  6222.0    2   2.3  53
> da13      27.7 285.2   279.7  6058.1    1   2.9  62
> da14      29.7 226.8   374.3  5733.3    0   3.2  57
> da15      32.7 220.8   532.2  5538.7    1   3.3  65
> da16      30.7 165.4   638.2  4537.6    1   3.8  51
> da17      39.6 173.3   819.9  4884.2    1   3.2  46
> da18      28.7 221.8   765.4  5659.1    1   2.6  42
> da19      30.7 214.9   464.4  5417.4    0   4.6  78
> da20      32.7 177.2   725.3  4732.7    1   4.0  63
> da21      29.7 177.2   448.6  4722.8    0   5.3  66
> da22      19.8 153.5   398.6  4168.3    0   2.5  35
> da23      16.8 151.5   291.1  4243.6    1   2.9  39
> da24      26.7 186.2   547.1  5018.4    1   4.4  68
> da25      30.7 190.1   709.0  5096.6    1   5.0  71
> da26      28.7 222.8   690.7  5251.1    0   3.0  55
> da27      21.8 213.9   572.3  5248.6    0   2.8  49
> da28      34.7 177.2  1096.2  5027.8    1   4.9  65
> da29      36.6 175.3  1172.9  5012.0    2   4.9  63
> da30      22.8 197.1   462.9  5906.6    0   2.8  51
> da31      25.7 204.0   445.6  6138.3    0   3.4  62
> da32      31.7 170.3   557.0  5600.6    1   4.6  58
> da33      33.7 161.4   698.1  5509.5    1   4.8  60
> da34      28.7 269.3   473.8  6661.6    1   5.2  77
> da35      27.7 268.3   424.3  6440.8    0   5.6  75
>
>
> kw/s is always distributed pretty evenly across the disks.
> Now it mostly looks like this:
>
> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
> md0        0.0  18.8     0.0     0.0    0   0.0   0
> da0       35.7   0.0  1070.9     0.0    0  13.3  37
> da1       38.7   0.0  1227.0     0.0    0  12.7  40
> da2       25.8   0.0   920.2     0.0    0  12.0  26
> da3       26.8   0.0   778.0     0.0    0  10.9  23
> da4       22.8   0.0   792.4     0.0    0  14.4  25
> da5       26.8   0.0  1050.5     0.0    0  13.4  27
> da6       32.7   0.0  1359.3     0.0    0  17.0  41
> da7       23.8 229.9   870.7 17318.1    0   3.0  55
> da8       58.5   0.0  1813.7     0.0    1  12.9  56
> da9       63.4   0.0  1615.0     0.0    0  12.4  61
> da10      48.6   0.0  1448.0     0.0    0  16.7  55
> da11      49.6   0.0  1148.2     0.0    1  16.7  60
> da12      47.6   0.0  1508.4     0.0    0  14.8  46
> da13      47.6   0.0  1417.7     0.0    0  17.9  55
> da14      44.6   0.0  1997.5     0.0    1  15.6  49
> da15      48.6   0.0  2061.4     0.0    1  14.2  47
> da16      44.6   0.0  1587.7     0.0    1  16.9  51
> da17      45.6   0.0  1326.1     0.0    2  15.7  55
> da18      50.5   0.0  1433.6     0.0    2  16.7  57
> da19      57.5   0.0  2415.8     0.0    3  20.4  70
> da20      52.5 222.0  2097.1 10613.0    5  12.8 100
> da21      52.5 256.7  1967.8 11498.5    3  10.6 100
> da22      37.7 433.1  1342.4 12880.1    4   5.5  99
> da23      42.6 359.8  2304.3 13073.8    5   7.2 101
> da24      33.7   0.0  1256.7     0.0    1  15.4  40
> da25      26.8   0.0   853.8     0.0    2  15.1  32
> da26      23.8   0.0   343.9     0.0    1  12.4  28
> da27      26.8   0.0   400.4     0.0    0  12.4  31
> da28      15.9   0.0   575.3     0.0    1  11.4  17
> da29      20.8   0.0   750.7     0.0    0  14.4  24
> da30      37.7   0.0   952.4     0.0    0  12.6  37
> da31      29.7   0.0   777.0     0.0    0  13.6  37
> da32      54.5 121.9  1824.6  6514.4    7  27.7 100
> da33      56.5 116.9  2017.3  6213.6    6  29.7  99
> da34      42.6   0.0  1303.3     0.0    1  14.9  43
> da35      45.6   0.0  1400.9     0.0    2  14.8  45
>
> Some devices show 0.0 kw/s for long periods of time,
> then a different set of devices does, and so on.
> Here are some more results:
>
> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
> md0        0.0  37.9     0.0     0.0    0   0.0   0
> da0       58.9 173.7  1983.5  4585.3    3  11.2  87
> da1       49.9 162.7  1656.2  4548.4    3  14.0  95
> da2       40.9 187.6  1476.5  3466.6    1   4.8  58
> da3       42.9 188.6  1646.7  3466.6    0   5.3  64
> da4       54.9  33.9  2222.6  1778.4    1  13.3  63
> da5       53.9  37.9  2429.6  1778.4    2  12.9  68
> da6       42.9  33.9  1445.1   444.6    0  10.3  45
> da7       40.9  28.9  2045.9   444.6    0  12.3  43
> da8       53.9   0.0   959.6     0.0    1  22.7  62
> da9       29.9   0.0   665.2     0.0    1  52.1  64
> da10      52.9  83.8  1845.3  2084.8    2   8.2  64
> da11      44.9 103.8  1654.2  4895.2    1   8.8  71
> da12      50.9  60.9  1273.0  2078.3    1  10.3  69
> da13      39.9  57.9   940.1  2078.3    0  15.4  75
> da14      45.9  72.9   977.0  3178.6    0   8.5  63
> da15      48.9  72.9  1000.5  3178.6    0   9.6  72
> da16      42.9  74.9  1187.6  2118.8    1   6.7  51
> da17      48.9  82.8  1651.7  3013.0    0   5.7  52
> da18      67.9  78.8  2735.5  2456.1    0  11.5  75
> da19      52.9  79.8  2436.6  2456.1    0  13.1  82
> da20      48.9  91.8  2623.8  1682.6    1   7.2  60
> da21      52.9  92.8  1893.2  1682.6    0   7.1  61
> da22      67.9  20.0  2518.0   701.1    0  13.5  79
> da23      68.9  23.0  3331.8   701.1    1  13.6  77
> da24      45.9  17.0  2148.7   369.8    1  11.6  47
> da25      36.9  18.0  1747.5   369.8    1  12.6  46
> da26      46.9   1.0  1873.3     0.5    0  21.3  55
> da27      38.9   1.0  1395.7     0.5    0  34.6  58
> da28      34.9   9.0  1523.5    53.9    0  14.1  39
> da29      26.9  10.0  1124.8    53.9    1  13.8  28
> da30      44.9   0.0  1887.2     0.0    0  18.8  50
> da31      47.9   0.0  2273.0     0.0    0  20.2  49
> da32      65.9  90.8  2221.6  1730.5    3   9.7  77
> da33      79.8  90.8  3304.9  1730.5    1   9.9  88
> da34      75.8 134.7  3638.7  3938.1    2  10.2  90
> da35      49.9 209.6  1792.4  5756.0    2   8.1  85
>
>
> device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
> md0        0.0  19.0     0.0     0.0    0   0.0   0
> da0       38.0 194.8  1416.1  1175.8    1  10.6 100
> da1       40.0 190.8  1424.6  1072.9    2  10.4 100
> da2       37.0   0.0  1562.4     0.0    0  14.9  40
> da3       31.0   0.0  1169.8     0.0    0  14.0  33
> da4       44.0   0.0  2632.4     0.0    0  18.0  45
> da5       41.0   0.0  1944.6     0.0    0  19.0  45
> da6       38.0   0.0  1786.2     0.0    1  18.4  44
> da7       45.0   0.0  2275.7     0.0    0  16.0  48
> da8       80.9   0.0  4151.3     0.0    2  24.1  85
> da9       83.9   0.0  3256.2     0.0    3  21.2  83
> da10      61.9   0.0  3657.3     0.0    1  18.9  65
> da11      53.9   0.0  2532.5     0.0    1  18.7  56
> da12      54.9   0.0  2650.8     0.0    0  18.9  60
> da13      48.0   0.0  1975.5     0.0    0  19.6  53
> da14      43.0   0.0  1802.7     0.0    2  14.1  43
> da15      49.0   0.0  2455.5     0.0    0  14.0  48
> da16      45.0   0.0  1521.5     0.0    1  16.0  50
> da17      45.0   0.0  1650.8     0.0    4  13.7  47
> da18      48.0   0.0  1618.9     0.0    1  15.0  54
> da19      47.0   0.0  1982.0     0.0    0  16.5  55
> da20      52.9   0.0  2186.3     0.0    0  19.8  65
> da21      61.9   0.0  3020.5     0.0    0  16.3  61
> da22      70.9   0.0  3309.7     0.0    1  15.5  67
> da23      67.9   0.0  2742.3     0.0    2  16.5  73
> da24      38.0   0.0  1426.1     0.0    1  15.5  40
> da25      41.0   0.0  1905.6     0.0    1  14.0  39
> da26      43.0   0.0  2371.1     0.0    0  14.2  40
> da27      46.0   0.0  2178.3     0.0    0  15.2  45
> da28      44.0   0.0  2092.9     0.0    0  12.4  43
> da29      41.0   0.0  1442.1     0.0    1  13.4  37
> da30      42.0  37.0  1171.3   645.9    1  17.5  62
> da31      27.0  67.9   713.8   290.7    0  16.7  64
> da32      47.0   0.0  1043.5     0.0    0  13.3  43
> da33      50.0   0.0  1741.3     0.0    1  15.7  57
> da34      42.0   0.0  1119.9     0.0    0  18.2  55
> da35      45.0   0.0  1071.4     0.0    0  15.7  55
>
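> (For completeness, the same I/O can also be watched grouped by mirror vdev with
>
>   zpool iostat -v disk1 1
>
> in case that is easier to correlate than the raw per-disk iostat above.)
>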
>
> The first thing we tried was a reboot.
> It took the system more than 5 minutes to import the pool (normally it takes a fraction of a second).
> Needless to say, the reboot did not help a bit.
>
> What can we do about this problem?
>
>
> System info:
> FreeBSD 11.0-CURRENT #5 r260625
>
> zpool get all disk1
> NAME   PROPERTY                       VALUE                          SOURCE
> disk1  size                           16,3T                          -
> disk1  capacity                       59%                            -
> disk1  altroot                        -                              default
> disk1  health                         ONLINE                         -
> disk1  guid                           4909337477172007488            default
> disk1  version                        -                              default
> disk1  bootfs                         -                              default
> disk1  delegation                     on                             default
> disk1  autoreplace                    off                            default
> disk1  cachefile                      -                              default
> disk1  failmode                       wait                           default
> disk1  listsnapshots                  off                            default
> disk1  autoexpand                     off                            default
> disk1  dedupditto                     0                              default
> disk1  dedupratio                     1.00x                          -
> disk1  free                           6,56T                          -
> disk1  allocated                      9,76T                          -
> disk1  readonly                       off                            -
> disk1  comment                        -                              default
> disk1  expandsize                     0                              -
> disk1  freeing                        0                              default
> disk1  feature@async_destroy          enabled                        local
> disk1  feature@empty_bpobj            active                         local
> disk1  feature@lz4_compress           active                         local
> disk1  feature@multi_vdev_crash_dump  enabled                        local
> disk1  feature@spacemap_histogram     active                         local
> disk1  feature@enabled_txg            active                         local
> disk1  feature@hole_birth             active                         local
> disk1  feature@extensible_dataset     enabled                        local
> disk1  feature@bookmarks              enabled                        local
>
>
>
> zfs get all disk1
> NAME   PROPERTY              VALUE                                          SOURCE
> disk1  type                  filesystem                                     -
> disk1  creation              Wed Sep 18 11:47 2013                          -
> disk1  used                  9,75T                                          -
> disk1  available             6,30T                                          -
> disk1  referenced            9,74T                                          -
> disk1  compressratio         1.63x                                          -
> disk1  mounted               yes                                            -
> disk1  quota                 none                                           default
> disk1  reservation           none                                           default
> disk1  recordsize            128K                                           default
> disk1  mountpoint            /.........                                     local
> disk1  sharenfs              off                                            default
> disk1  checksum              on                                             default
> disk1  compression           lz4                                            local
> disk1  atime                 off                                            local
> disk1  devices               on                                             default
> disk1  exec                  off                                            local
> disk1  setuid                off                                            local
> disk1  readonly              off                                            default
> disk1  jailed                off                                            default
> disk1  snapdir               hidden                                         default
> disk1  aclmode               discard                                        default
> disk1  aclinherit            restricted                                     default
> disk1  canmount              on                                             default
> disk1  xattr                 off                                            temporary
> disk1  copies                1                                              default
> disk1  version               5                                              -
> disk1  utf8only              off                                            -
> disk1  normalization         none                                           -
> disk1  casesensitivity       sensitive                                      -
> disk1  vscan                 off                                            default
> disk1  nbmand                off                                            default
> disk1  sharesmb              off                                            default
> disk1  refquota              none                                           default
> disk1  refreservation        none                                           default
> disk1  primarycache          all                                            default
> disk1  secondarycache        none                                           local
> disk1  usedbysnapshots       0                                              -
> disk1  usedbydataset         9,74T                                          -
> disk1  usedbychildren        9,71G                                          -
> disk1  usedbyrefreservation  0                                              -
> disk1  logbias               latency                                        default
> disk1  dedup                 off                                            default
> disk1  mlslabel                                                             -
> disk1  sync                  standard                                       local
> disk1  refcompressratio      1.63x                                          -
> disk1  written               9,74T                                          -
> disk1  logicalused           15,8T                                          -
> disk1  logicalreferenced     15,8T                                          -
>
>
> This is very severe for us; thanks in advance for any help.
> _______________________________________________
> freebsd-fs at freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe at freebsd.org"


