Machine stops for some seconds with ZFS

Attila Nagy bra at fsn.hu
Wed Feb 3 11:26:49 UTC 2010


Slower in what regard? In sequential read (which is meaningless here), maybe.
But in random read and latency? Absolutely not.

Compare these:
 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
    1     64     64   4071    8.7      0      0    0.0   55.4| ad0s1d.eli
    0     44     44   2799    7.1      0      0    0.0   31.0| ad1s1d.eli
    1   1208   1208   1908    0.8      0      0    0.0   78.8| da0

An average consumer SATA drive can push about 120 IOPS with 8-10 ms seek
time. An average consumer USB pendrive can perform more than 10 times
better than that (in both IOPS and latency). Extrapolating the ops/s above
to 100% busy:

SATA drive: (64 / 55.4) * 100 = ~115 IOPS, latency: about 8 ms
USB drive: (1208 / 78.8) * 100 = ~1530 IOPS, latency: about 0.8 ms
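
If anyone wants to pull the same numbers off a live box, a rough awk over
gstat's batch output should do it (flags and column positions are from
memory, so check gstat(8) and adjust for your output format):

  # Sketch: extrapolate each device's ops/s to 100% busy and show its
  # measured read latency, from a single gstat batch sample.
  gstat -b -I 5s | awk '$9+0 > 0 {
      printf "%-12s ~%.0f IOPS at 100%% busy, %.1f ms/r\n", $NF, $2 / ($9+0) * 100, $5
  }'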

This is the essence of Windows ReadyBoost and ZFS's L2ARC.

There is absolutely no I/O at all towards the disks (be it HDD or SSD)
during these halts, so this is not because of the cache. (Yes, I've tried
without it, and the freeze also comes without an L2ARC device.)
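
For reference, dropping and re-adding the cache device is an online
operation, so the no-L2ARC case is easy to (re)test; roughly, with the
device names from my pool:

  # Drop the L2ARC (cache) device for the test...
  zpool remove data da0
  # ...try to reproduce the freeze, then put the cache back:
  zpool add data cache da0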

Matthias Gamsjager wrote:
> What's the point in having a cache device that is slower than the
> hard disks themselves?
> Could you please try the build without the slow cache device?
>
> On Wed, Feb 3, 2010 at 10:48 AM, Attila Nagy <bra at fsn.hu> wrote:
>   
>> Hello,
>>
>> After a long time, I've switched back to ZFS on my desktop. It runs
>> 8-STABLE/amd64 with two SATA disks and a USB pendrive.
>> One partition from each disk is used for the zpool, encrypted using GELI,
>> and the pendrive is there for L2ARC:
>>   NAME            STATE     READ WRITE CKSUM
>>   data            ONLINE       0     0     0
>>     mirror        ONLINE       0     0     0
>>       ad0s1d.eli  ONLINE       0     0     0
>>       ad1s1d.eli  ONLINE       0     0     0
>>   cache
>>     da0           ONLINE       0     0     0
>>
>> Today, after 12 days of uptime, the machine froze. I could ping it from
>> a different machine and even open a telnet connection to its ssh port, but
>> I couldn't get the ssh banner.
>>
>> Now I'm building a 9-CURRENT kernel and world to see whether the same
>> problem persists with that, and during the make process I've noticed a
>> strange thing.
>> I build with -j4 (the machine has one dual-core CPU), so the fans are
>> screaming during the process. But every few minutes (I couldn't recognize
>> any pattern in it) the machine goes completely silent (even more silent
>> than normal), and everything halts.
>> During this, the top running on the machine can refresh itself, and I can
>> type into pass-through ssh connections (that is, I use the machine in
>> question to access other machines with ssh), but I can't open new ssh
>> connections to it and can't start anything new (for example from an open
>> shell).
>> ping is running seamlessly during this, and top shows the following:
>>
>> last pid: 36503;  load averages:  1.59,  3.04,  3.01    up 0+00:49:53
>>  10:32:10
>> 97 processes:  1 running, 96 sleeping
>> CPU:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
>> Mem: 218M Active, 24M Inact, 639M Wired, 40M Cache, 6208K Buf, 1022M Free
>> Swap: 4096M Total, 4096M Free
>>
>>  PID USERNAME    THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
>> 1342 root          1  44    0  3204K   620K select  0   0:02  0.00% make
>> 1424 root          1  44    0  3204K  1036K select  0   0:01  0.00% make
>> 1280 root          1  44    0 12540K  1900K select  0   0:01  0.00%
>> hald-addon-storage
>> 1234 haldaemon     1  44    0 24116K  4464K select  0   0:01  0.00% hald
>> 93600 root          1  44    0  3204K  1028K select  0   0:00  0.00% make
>> 1260 root          1  44    0 19704K  2688K select  0   0:00  0.00%
>> hald-addon-mouse-sy
>> 15142 bra           1  44    0  9332K  2864K CPU0    0   0:00  0.00% top
>> 1263 root          1  44    0 12540K  1896K cgticb  0   0:00  0.00%
>> hald-addon-storage
>> 94415 bra           1  44    0 37944K  4992K select  1   0:00  0.00% sshd
>> 35837 root          1  44    0  5252K  2424K select  1   0:00  0.00% make
>> 95361 bra           1  44    0 37944K  4992K select  1   0:00  0.00% sshd
>> 35973 root          1  44    0  3204K  1772K select  0   0:00  0.00% make
>>  608 root          1  44    0  6892K  1436K select  1   0:00  0.00% syslogd
>> 96928 root          1  44    0  3204K   728K select  0   0:00  0.00% make
>> 94369 root          1  51    0 37944K  4584K sbwait  0   0:00  0.00% sshd
>> 82631 root          1  50    0 37944K  4584K sbwait  0   0:00  0.00% sshd
>> 16304 root          1  44    0 37944K  4576K zio->i  1   0:00  0.00% sshd
>>  951 _ntp          1  44    0  6876K  1692K select  0   0:00  0.00% ntpd
>> 1238 root          1  76    0 16768K  2372K select  0   0:00  0.00%
>> hald-runner
>> 4916 root          1  44    0  3204K   728K select  1   0:00  0.00% make
>> 95338 root          1  49    0 37944K  4584K sbwait  1   0:00  0.00% sshd
>> 1259 root          1  44    0 10280K  2712K pause   1   0:00  0.00% csh
>> 33357 bra           1  44    0 21596K  4004K select  0   0:00  0.00% ssh
>> 16405 bra           1  44    0 37944K  5012K zio->i  0   0:00  0.00% sshd
>> 1044 root          1  44    0  9104K  1796K kqread  0   0:00  0.00% master
>> 34765 root          1  76    0  8260K  1764K wait    1   0:00  0.00% sh
>> 82685 bra           1  44    0 37944K  4960K select  1   0:00  0.00% sshd
>> 1065 postfix       1  44    0  9100K  1872K kqread  0   0:00  0.00% qmgr
>> 1237 root         17  44    0 27460K  4124K waitvt  0   0:00  0.00%
>> console-kit-daemon
>> 95362 bra           1  44    0 10216K  2612K ttyin   0   0:00  0.00% bash
>> 34764 root          1  44    0  3204K   852K select  0   0:00  0.00% make
>> 1222 root          1  49    0 21672K  1896K wait    0   0:00  0.00% login
>> 35728 root          1  44    0  3204K   860K select  0   0:00  0.00% make
>> 1064 postfix       1  44    0  9104K  1772K zio->i  1   0:00  0.00% pickup
>> 82696 bra           1  44    0 10216K  2596K wait    0   0:00  0.00% bash
>> 94417 bra           1  44    0 10216K  2596K wait    1   0:00  0.00% bash
>> 35455 root          1  44    0  3204K   744K select  0   0:00  0.00% make
>> 35774 root          1  44    0  3204K   728K select  1   0:00  0.00% make
>> 16409 bra           1  44    0 10216K  2592K ttyin   0   0:00  0.00% bash
>> 1155 root          1  44    0  7948K  1604K nanslp  0   0:00  0.00% cron
>> 1077 messagebus    1  53    0  8092K  2060K select  0   0:00  0.00%
>> dbus-daemon
>> 1149 root          1  44    0 26012K  3960K select  1   0:00  0.00% sshd
>> 35729 root          1  76    0  8260K  1760K wait    0   0:00  0.00% sh
>> 4921 root          1  57    0  8260K  1748K wait    0   0:00  0.00% sh
>>  825 root          1  76    0 39212K  2372K lockf   1   0:00  0.00%
>> saslauthd
>> 35460 root          1  76    0  8260K  1748K wait    0   0:00  0.00% sh
>> 34761 root          1  48    0  8260K  1740K wait    1   0:00  0.00% sh
>> 96923 root          1  50    0  8260K  1740K wait    0   0:00  0.00% sh
>>
>>
>> As you can see, top reports that the machine is 100% idle, while a make -j4
>> buildworld runs. This lasts for a few seconds (10-20), then everything goes
>> back to normal: the fans start to scream, the build continues, and I can use
>> the machine.
>> This occasional halt is new to me (I've only just switched to ZFS on my
>> desktop; on a server it's harder to notice if you don't use it for
>> interactive sessions), but I have seen the final freeze on more than one
>> server.
>> How could I help to debug this, and the final freeze?
>>
>> Thanks,
>>
>>     


