Machine stops for some seconds with ZFS

Thomas Burgess wonslung at gmail.com
Wed Feb 3 10:38:47 UTC 2010


Why would you use a USB drive for L2ARC?

I would think that would make things slower. Have you tried setting up
without the USB drive?
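
In the top snapshot quoted below, three processes (two sshd and pickup) are
sleeping in "zio->i", which looks like a truncated ZFS I/O wait channel. To
see whether that state dominates during a stall, one option is to save a few
top snapshots and tally the STATE column. A rough Python sketch follows; the
function name and the column layout it assumes (STATE as the eighth column,
as in the output below) are my own assumptions, not anything top guarantees:

```python
from collections import Counter


def tally_wait_states(top_text):
    """Count the STATE column of process rows in a saved top snapshot.

    Assumes the column order shown in the quoted output:
    PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU [COMMAND]
    """
    counts = Counter()
    for line in top_text.splitlines():
        # Drop mail-quoting markers so quoted output can be pasted as-is.
        fields = line.lstrip("> ").split()
        # Process rows start with a numeric PID and carry at least 11 columns;
        # this skips the summary lines and wrapped command names.
        if len(fields) >= 11 and fields[0].isdigit():
            counts[fields[7]] += 1
    return counts


if __name__ == "__main__":
    sample = """\
> 1342 root          1  44    0  3204K   620K select  0   0:02  0.00% make
> 16304 root          1  44    0 37944K  4576K zio->i  1   0:00  0.00% sshd
> 16405 bra           1  44    0 37944K  5012K zio->i  0   0:00  0.00% sshd
> 1064 postfix       1  44    0  9104K  1772K zio->i  1   0:00  0.00% pickup
"""
    for state, n in tally_wait_states(sample).most_common():
        print(state, n)
```

If the share of processes in that state jumps during each pause, that would
at least point at the storage path rather than the scheduler.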


On Wed, Feb 3, 2010 at 4:48 AM, Attila Nagy <bra at fsn.hu> wrote:

> Hello,
>
> After a long time, I've switched back to ZFS on my desktop. It runs
> 8-STABLE/amd64 with two SATA disks and a USB pendrive.
> One partition from each disk is used for the zpool, which is encrypted
> using GELI, and the pendrive is there for L2ARC:
>   NAME            STATE     READ WRITE CKSUM
>   data            ONLINE       0     0     0
>     mirror        ONLINE       0     0     0
>       ad0s1d.eli  ONLINE       0     0     0
>       ad1s1d.eli  ONLINE       0     0     0
>   cache
>     da0           ONLINE       0     0     0
>
> Today, after 12 days of uptime, the machine froze. I could ping it from
> a different machine, and could even open a telnet connection to its ssh
> port, but I couldn't get the ssh banner.
>
> Now I'm building a 9-CURRENT kernel and world to see whether the same
> problem persists with that, and during the make process I've noticed a
> strange thing.
> I build with -j4 (the machine has one dual-core CPU), so the fans are
> screaming during the process. But every few minutes (I couldn't recognize
> any pattern in it) the machine goes completely silent (even more silent
> than normal), and everything halts.
> During this, the top running on the machine keeps refreshing, and I can
> type in pass-through ssh connections (that is, sessions where I use the
> machine in question to reach other machines with ssh), but I can't open new
> ssh connections to it, and can't start anything new (for example from an
> open shell).
> ping keeps running without interruption during this, and top shows the
> following:
>
> last pid: 36503;  load averages:  1.59,  3.04,  3.01    up 0+00:49:53  10:32:10
> 97 processes:  1 running, 96 sleeping
> CPU:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
> Mem: 218M Active, 24M Inact, 639M Wired, 40M Cache, 6208K Buf, 1022M Free
> Swap: 4096M Total, 4096M Free
>
>  PID USERNAME    THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
> 1342 root          1  44    0  3204K   620K select  0   0:02  0.00% make
> 1424 root          1  44    0  3204K  1036K select  0   0:01  0.00% make
> 1280 root          1  44    0 12540K  1900K select  0   0:01  0.00% hald-addon-storage
> 1234 haldaemon     1  44    0 24116K  4464K select  0   0:01  0.00% hald
> 93600 root          1  44    0  3204K  1028K select  0   0:00  0.00% make
> 1260 root          1  44    0 19704K  2688K select  0   0:00  0.00% hald-addon-mouse-sy
> 15142 bra           1  44    0  9332K  2864K CPU0    0   0:00  0.00% top
> 1263 root          1  44    0 12540K  1896K cgticb  0   0:00  0.00% hald-addon-storage
> 94415 bra           1  44    0 37944K  4992K select  1   0:00  0.00% sshd
> 35837 root          1  44    0  5252K  2424K select  1   0:00  0.00% make
> 95361 bra           1  44    0 37944K  4992K select  1   0:00  0.00% sshd
> 35973 root          1  44    0  3204K  1772K select  0   0:00  0.00% make
>  608 root          1  44    0  6892K  1436K select  1   0:00  0.00% syslogd
> 96928 root          1  44    0  3204K   728K select  0   0:00  0.00% make
> 94369 root          1  51    0 37944K  4584K sbwait  0   0:00  0.00% sshd
> 82631 root          1  50    0 37944K  4584K sbwait  0   0:00  0.00% sshd
> 16304 root          1  44    0 37944K  4576K zio->i  1   0:00  0.00% sshd
>  951 _ntp          1  44    0  6876K  1692K select  0   0:00  0.00% ntpd
> 1238 root          1  76    0 16768K  2372K select  0   0:00  0.00% hald-runner
> 4916 root          1  44    0  3204K   728K select  1   0:00  0.00% make
> 95338 root          1  49    0 37944K  4584K sbwait  1   0:00  0.00% sshd
> 1259 root          1  44    0 10280K  2712K pause   1   0:00  0.00% csh
> 33357 bra           1  44    0 21596K  4004K select  0   0:00  0.00% ssh
> 16405 bra           1  44    0 37944K  5012K zio->i  0   0:00  0.00% sshd
> 1044 root          1  44    0  9104K  1796K kqread  0   0:00  0.00% master
> 34765 root          1  76    0  8260K  1764K wait    1   0:00  0.00% sh
> 82685 bra           1  44    0 37944K  4960K select  1   0:00  0.00% sshd
> 1065 postfix       1  44    0  9100K  1872K kqread  0   0:00  0.00% qmgr
> 1237 root         17  44    0 27460K  4124K waitvt  0   0:00  0.00% console-kit-daemon
> 95362 bra           1  44    0 10216K  2612K ttyin   0   0:00  0.00% bash
> 34764 root          1  44    0  3204K   852K select  0   0:00  0.00% make
> 1222 root          1  49    0 21672K  1896K wait    0   0:00  0.00% login
> 35728 root          1  44    0  3204K   860K select  0   0:00  0.00% make
> 1064 postfix       1  44    0  9104K  1772K zio->i  1   0:00  0.00% pickup
> 82696 bra           1  44    0 10216K  2596K wait    0   0:00  0.00% bash
> 94417 bra           1  44    0 10216K  2596K wait    1   0:00  0.00% bash
> 35455 root          1  44    0  3204K   744K select  0   0:00  0.00% make
> 35774 root          1  44    0  3204K   728K select  1   0:00  0.00% make
> 16409 bra           1  44    0 10216K  2592K ttyin   0   0:00  0.00% bash
> 1155 root          1  44    0  7948K  1604K nanslp  0   0:00  0.00% cron
> 1077 messagebus    1  53    0  8092K  2060K select  0   0:00  0.00% dbus-daemon
> 1149 root          1  44    0 26012K  3960K select  1   0:00  0.00% sshd
> 35729 root          1  76    0  8260K  1760K wait    0   0:00  0.00% sh
> 4921 root          1  57    0  8260K  1748K wait    0   0:00  0.00% sh
>  825 root          1  76    0 39212K  2372K lockf   1   0:00  0.00% saslauthd
> 35460 root          1  76    0  8260K  1748K wait    0   0:00  0.00% sh
> 34761 root          1  48    0  8260K  1740K wait    1   0:00  0.00% sh
> 96923 root          1  50    0  8260K  1740K wait    0   0:00  0.00% sh
>
>
> As you can see, top reports that the machine is 100% idle while a make -j4
> buildworld is running. This lasts for a few seconds (10-20), then everything
> goes back to normal: the fans start to scream, the build continues and I can
> use the machine.
> This occasional halt is new to me, but I have only just switched to ZFS on
> my desktop, and on a server it's harder to notice if you don't use it for
> interactive sessions. I have, however, seen the final freeze on more than
> one server.
> How can I help debug this, and the final freeze?
>
> Thanks,

