ZFS scrub CPU bound?

Daniel Kalchev daniel at digsys.bg
Mon Aug 20 12:07:54 UTC 2012



On 20.08.12 13:44, Ivan Voras wrote:
> On 09/08/2012 17:39, Daniel Kalchev wrote:
>
>> CPU:  0.4% user,  0.0% nice, 51.4% system,  0.8% interrupt, 47.3% idle
>> Mem: 2171M Active, 1541M Inact, 5407M Wired, 25M Cache, 416K Buf, 22G Free
>> Swap: 8192M Total, 8192M Free
>>
>>    PID USERNAME  THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
>>     11 root       32 155 ki31     0K   512K CPU31  31 190.2H 1818.46% idle
>>      0 root      268  -8    0     0K  4288K -       0 141:25 1261.33% kernel
>>      4 root        4  -8    -     0K    80K CPU30  30   9:13 95.17% zfskern
>>     13 root        3  -8    -     0K    48K -       4   5:37 42.97% geom
>>     12 root       66 -84    -     0K  1056K WAIT    0   6:19 30.86% intr
>> [...]
>> It seems that zfskern will top out at 100%. This is a 32-core system, and
>> as you see, scrub at 600MB/sec is able to eat 16 cores (from 2x 2.2 GHz
> If you hit "H", top will show threads separately, so you will see which
> threads exactly do the work (you might need to reduce your font so that
> all of them fit on the screen :) ).

I have large screen(s) :)
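
(For reference, this is plain FreeBSD top with system processes shown and 
the per-thread display toggled on; a non-interactive equivalent would 
presumably be something like the following, assuming only the standard -S 
and -H flags:

    top -SH -s 2

-S includes kernel system processes such as "kernel" and "zfskern", and -H 
breaks them out per thread, which is what makes the individual 
zio_read_intr_* and txg_thread_enter threads visible below.)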

Here is the top of top:

last pid:  4779;  load averages: 11.98,  4.15,  1.70    up 0+01:46:30  14:40:49
898 processes: 49 running, 785 sleeping, 64 waiting
CPU:  1.2% user,  0.0% nice, 52.1% system,  0.6% interrupt, 46.1% idle
Mem: 1942M Active, 1140M Inact, 3257M Wired, 4240K Cache, 400K Buf, 25G Free
Swap: 8192M Total, 8192M Free

   PID USERNAME PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
     4 root      -8    -     0K    80K CPU4    4   1:35 100.00% zfskern{txg_thread_enter}
    11 root     155 ki31     0K   512K RUN    13 104:57 57.08% idle{idle: cpu13}
    11 root     155 ki31     0K   512K RUN    14 104:54 55.76% idle{idle: cpu14}
    11 root     155 ki31     0K   512K CPU8    8 104:55 54.59% idle{idle: cpu8}
    11 root     155 ki31     0K   512K RUN     9 105:01 54.20% idle{idle: cpu9}
    11 root     155 ki31     0K   512K CPU30  30 104:46 53.66% idle{idle: cpu30}
    11 root     155 ki31     0K   512K RUN    19 104:52 53.27% idle{idle: cpu19}
    11 root     155 ki31     0K   512K CPU15  15 104:45 52.98% idle{idle: cpu15}
    11 root     155 ki31     0K   512K CPU17  17 104:52 52.69% idle{idle: cpu17}
    11 root     155 ki31     0K   512K RUN    20 104:51 52.69% idle{idle: cpu20}
    11 root     155 ki31     0K   512K CPU12  12 104:54 52.29% idle{idle: cpu12}
    11 root     155 ki31     0K   512K CPU25  25 104:48 52.29% idle{idle: cpu25}
    11 root     155 ki31     0K   512K CPU24  24 104:51 52.10% idle{idle: cpu24}
    11 root     155 ki31     0K   512K CPU26  26 104:46 50.98% idle{idle: cpu26}
    11 root     155 ki31     0K   512K CPU11  11 104:53 50.68% idle{idle: cpu11}
    11 root     155 ki31     0K   512K RUN    31 104:48 50.49% idle{idle: cpu31}
    11 root     155 ki31     0K   512K CPU10  10 104:57 50.10% idle{idle: cpu10}
    11 root     155 ki31     0K   512K CPU1    1 104:53 50.00% idle{idle: cpu1}
    11 root     155 ki31     0K   512K RUN    29 104:48 49.76% idle{idle: cpu29}
     0 root     -16    0     0K  4288K -      11   0:42 49.66% kernel{zio_read_intr_20}
    11 root     155 ki31     0K   512K CPU3    3 104:57 49.56% idle{idle: cpu3}
     0 root     -16    0     0K  4288K -       6   0:42 49.56% kernel{zio_read_intr_15}
     0 root     -16    0     0K  4288K -      21   0:42 49.27% kernel{zio_read_intr_3}
    11 root     155 ki31     0K   512K CPU28  28 104:24 49.17% idle{idle: cpu28}
     0 root     -16    0     0K  4288K -      24   0:42 49.17% kernel{zio_read_intr_29}
     0 root     -16    0     0K  4288K CPU17  17   0:42 49.07% kernel{zio_read_intr_1}
    11 root     155 ki31     0K   512K RUN    27 104:46 48.97% idle{idle: cpu27}
     0 root     -16    0     0K  4288K CPU12  12   0:42 48.97% kernel{zio_read_intr_2}
     0 root     -16    0     0K  4288K -      25   0:42 48.78% kernel{zio_read_intr_7}
     0 root     -16    0     0K  4288K CPU14  14   0:42 48.78% kernel{zio_read_intr_11}
     0 root     -16    0     0K  4288K CPU19  19   0:42 48.78% kernel{zio_read_intr_28}
     0 root     -16    0     0K  4288K -      12   0:42 48.78% kernel{zio_read_intr_6}
     0 root     -16    0     0K  4288K -      29   0:42 48.78% kernel{zio_read_intr_18}
    11 root     155 ki31     0K   512K RUN     0 105:10 48.68% idle{idle: cpu0}
    11 root     155 ki31     0K   512K RUN     6 104:51 48.58% idle{idle: cpu6}
     0 root     -16    0     0K  4288K -      16   0:42 48.58% kernel{zio_read_intr_19}
     0 root     -16    0     0K  4288K CPU21  10   0:42 48.58% kernel{zio_read_intr_0}
     0 root     -16    0     0K  4288K CPU20  20   0:42 48.58% kernel{zio_read_intr_8}
     0 root     -16    0     0K  4288K CPU13  13   0:41 48.49% kernel{zio_read_intr_12}
     0 root     -16    0     0K  4288K -       5   0:42 48.39% kernel{zio_read_intr_5}
     0 root     -16    0     0K  4288K -      22   0:42 48.39% kernel{zio_read_intr_17}
     0 root     -16    0     0K  4288K CPU31  31   0:42 48.29% kernel{zio_read_intr_16}
     0 root     -16    0     0K  4288K -      17   0:42 48.29% kernel{zio_read_intr_4}
     0 root     -16    0     0K  4288K CPU0    0   0:42 48.29% kernel{zio_read_intr_13}
    11 root     155 ki31     0K   512K CPU22  22 104:41 48.19% idle{idle: cpu22}
     0 root     -16    0     0K  4288K -      25   0:42 48.19% kernel{zio_read_intr_31}
     0 root     -16    0     0K  4288K -      26   0:42 48.19% kernel{zio_read_intr_25}
     0 root     -16    0     0K  4288K CPU15  29   0:42 48.19% kernel{zio_read_intr_22}
     0 root     -16    0     0K  4288K -      10   0:42 48.10% kernel{zio_read_intr_14}
     0 root     -16    0     0K  4288K -       8   0:42 48.10% kernel{zio_read_intr_21}
     0 root     -16    0     0K  4288K -      15   0:42 48.10% kernel{zio_read_intr_26}
     0 root     -16    0     0K  4288K -      16   0:42 48.00% kernel{zio_read_intr_9}
     0 root     -16    0     0K  4288K -      18   0:42 48.00% kernel{zio_read_intr_30}
     0 root     -16    0     0K  4288K CPU9    9   0:41 48.00% kernel{zio_read_intr_27}
    11 root     155 ki31     0K   512K RUN    21 104:48 47.66% idle{idle: cpu21}
     0 root     -16    0     0K  4288K -      29   0:42 47.66% kernel{zio_read_intr_23}
     0 root     -16    0     0K  4288K -      29   0:42 47.66% kernel{zio_read_intr_10}
     0 root     -16    0     0K  4288K CPU23  23   0:42 47.56% kernel{zio_read_intr_24}
    11 root     155 ki31     0K   512K CPU18  18 104:52 47.36% idle{idle: cpu18}
    11 root     155 ki31     0K   512K CPU5    5 104:53 47.27% idle{idle: cpu5}
    11 root     155 ki31     0K   512K CPU16  16 104:50 47.17% idle{idle: cpu16}
    11 root     155 ki31     0K   512K RUN    23 104:44 47.17% idle{idle: cpu23}
    11 root     155 ki31     0K   512K CPU7    7 104:56 45.46% idle{idle: cpu7}
    11 root     155 ki31     0K   512K RUN     4 104:50 45.36% idle{idle: cpu4}
    11 root     155 ki31     0K   512K CPU2    2 104:55 42.29% idle{idle: cpu2}
    13 root      -8    -     0K    48K -       7   0:42 18.65% geom{g_down}
    12 root     -68    -     0K  1056K CPU30  30   0:32 16.26% intr{swi2: cambio}
    13 root      -8    -     0K    48K CPU28  28   0:32 14.99% geom{g_up}
    12 root     -88    -     0K  1056K CPU12  12   0:21  9.28% intr{irq256: mps0}

Is it possible that the large number of IOPS is causing this bottleneck?
Too many zfskern state changes, or something like that?
gstat says:

dT: 1.002s  w: 1.000s
  L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
     0      0      0      0    0.0      0      0    0.0    0.0| cd0
     6   9030   9024 149343    0.5      3    128    0.1   90.4| da0
     5   8740   8734 148807    0.5      3    128    0.1   89.9| da1
     5   8935   8929 149328    0.5      3    128    0.1   90.3| da2
     6   8649   8643 148753    0.5      3    128    0.1   90.1| da3
     0      0      0      0    0.0      0      0    0.0    0.0| da0p1
     0      0      0      0    0.0      0      0    0.0    0.0| da0p2
     6   9030   9024 149343    0.5      3    128    0.2   91.0| da0p3
     0      0      0      0    0.0      0      0    0.0    0.0| da1p1
     0      0      0      0    0.0      0      0    0.0    0.0| da1p2
     7   8739   8733 148803    0.5      3    128    0.2   90.5| da1p3
     0      0      0      0    0.0      0      0    0.0    0.0| da2p1
     0      0      0      0    0.0      0      0    0.0    0.0| da2p2
     7   8935   8929 149328    0.5      3    128    0.2   90.8| da2p3
     0      0      0      0    0.0      0      0    0.0    0.0| da3p1
     0      0      0      0    0.0      0      0    0.0    0.0| da3p2
     6   8649   8643 148753    0.5      3    128    0.2   90.7| da3p3
     0      0      0      0    0.0      0      0    0.0    0.0| gptid/072fa016-dbf3-11e1-9417-0025908065d8
     0      0      0      0    0.0      0      0    0.0    0.0| gpt/swap0
     0      0      0      0    0.0      0      0    0.0    0.0| md0
     0      0      0      0    0.0      0      0    0.0    0.0| gptid/0739a7b0-dbf3-11e1-9417-0025908065d8
     0      0      0      0    0.0      0      0    0.0    0.0| gpt/swap1
     0      0      0      0    0.0      0      0    0.0    0.0| gptid/07446e04-dbf3-11e1-9417-0025908065d8
     0      0      0      0    0.0      0      0    0.0    0.0| gpt/swap2
     0      0      0      0    0.0      0      0    0.0    0.0| gptid/074ee993-dbf3-11e1-9417-0025908065d8
     0      0      0      0    0.0      0      0    0.0    0.0| gpt/swap3
     0      0      0      0    0.0      0      0    0.0    0.0| mirror/swap0-1
     0      0      0      0    0.0      0      0    0.0    0.0| mirror/swap2-3
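
(Back-of-the-envelope from the gstat output above: the four data disks are 
doing roughly 4 x 8,800 = ~35,000 read ops/s for 4 x ~149 MB/s = ~600 MB/s 
in total, which works out to an average read size of about 
149,000 kB / 8,800 ops = ~17 KB. So the scrub really is issuing a very 
large number of small reads, which fits the picture of the zio_read_intr_* 
threads dominating kernel CPU time above.)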


>> Opteron 6274). There is high load on geom as well... but geom does go
>> over 100% CPU, so I suppose it scales.
> Actually, GEOM itself is a bottleneck with SSDs and not really scalable
> in its current version. If you hit H, you could see that e.g. the g_down
> thread is sometimes pinned to 100% CPU or something similar. If all GEOM
> threads in your case are individually below 100%, then you didn't hit
> the GEOM bottleneck yet.
>

Ok, but apparently GEOM is not the bottleneck in my case.
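
(If it really is the sheer number of small scrub I/Os, the next thing I 
would look at is the scrub throttling sysctls, assuming they exist under 
these names and are writable at runtime on this version:

    sysctl vfs.zfs.scrub_delay vfs.zfs.scan_idle vfs.zfs.top_maxinflight
    # e.g. cap the number of in-flight scrub I/Os per top-level vdev:
    sysctl vfs.zfs.top_maxinflight=16

Raising scrub_delay should slow the scrub down when the pool is otherwise 
busy, and lowering top_maxinflight should reduce the number of concurrent 
scrub reads, and with it the zio_read_intr load.)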

Daniel

