Machine stops for some seconds with ZFS

Attila Nagy bra at fsn.hu
Wed Feb 3 12:31:54 UTC 2010


Tommi Lätti wrote:
>> After a long time, I've switched back to ZFS on my desktop. It runs
>> 8-STABLE/amd64 with two SATA disks and a USB pendrive.
>> One GELI-encrypted partition from each disk is used for the zpool,
>> and the pendrive is there for L2ARC:
>>   NAME            STATE     READ WRITE CKSUM
>>   data            ONLINE       0     0     0
>>     mirror        ONLINE       0     0     0
>>       ad0s1d.eli  ONLINE       0     0     0
>>       ad1s1d.eli  ONLINE       0     0     0
>>   cache
>>     da0           ONLINE       0     0     0
>>
>> Today, after 12 days of uptime, the machine froze. I could ping it from
>> a different machine and could even telnet to its ssh port, but I
>> never got the ssh banner.
>>
>> Now I'm building a 9-CURRENT kernel and world to see whether the same
>> problem persists with that, and during the make process I've noticed a
>> strange thing.
>> I build with -j4 (the machine has one dual-core CPU), so the fans are
>> screaming during the process. But every few minutes (I couldn't recognize
>> any pattern in it) the machine goes completely silent (even quieter
>> than normal), and everything halts.
>>   PID USERNAME    THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
>> 16304 root          1  44    0 37944K  4576K zio->i  1   0:00  0.00% sshd
>> 16405 bra           1  44    0 37944K  5012K zio->i  0   0:00  0.00% sshd
>>  1064 postfix       1  44    0  9104K  1772K zio->i  1   0:00  0.00% pickup
>>     
>
> This sounds like you're being hit by the same (extensively documented)
> performance slowdown that seems to affect everybody currently
> (maybe not those with SSDs or 15k rpm drives in big arrays).
> There's a long thread on -STABLE.
>
> Basically, as I see it, it's impossible to read from and write to a ZFS
> pool at the same time, which might be caused by how the ARC behaves under
> FreeBSD (I didn't have these problems when ZFS was still 'unstable'). I
> couldn't even watch a 720p video while doing small writes (less than
> 1k every few seconds) to the same array without the smb process going
> into the zio->i state, which seems to indicate a complete block on any I/O.
>   
I'm not sure these are the same problem. When I see these "stops",
nothing happens at all: gstat shows no I/O, and everything goes very quiet
for some painful tens of seconds. Also, buildworld doesn't do very heavy
I/O, at least not for these drives.
BTW, we are running tests in a completely different environment with ZFS
and NFS (15k drives, with BBWC), and something similar seems to happen
there too, eventually ending in a freeze.
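
To compare one stop with another, it may help to pull the processes stuck in
a given STATE out of saved top output. The following is only a minimal
sketch: the helper name procs_in_state and the SAMPLE data are made up for
illustration, and it assumes the 12-column SMP layout of the top output
quoted above (other top versions or options may use different columns).

```python
# Sketch: list processes whose top(1) STATE column equals a given value.
# Assumption: the 12-column SMP layout quoted earlier in this thread
# (PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND).

def procs_in_state(top_lines, state="zio->i"):
    """Return (pid, user, command) for every row whose STATE equals `state`."""
    hits = []
    for line in top_lines:
        # Drop mail quoting ("> ", ">> ") and leading padding, then split.
        fields = line.lstrip("> ").split()
        if len(fields) < 12 or not fields[0].isdigit():
            continue  # header, blank, or unrelated line
        if fields[7] == state:
            hits.append((int(fields[0]), fields[1], fields[11]))
    return hits


# Sample rows copied from the top output quoted in this thread.
SAMPLE = """\
  PID USERNAME    THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
16304 root          1  44    0 37944K  4576K zio->i  1   0:00  0.00% sshd
16405 bra           1  44    0 37944K  5012K zio->i  0   0:00  0.00% sshd
 1064 postfix       1  44    0  9104K  1772K zio->i  1   0:00  0.00% pickup
""".splitlines()

if __name__ == "__main__":
    for pid, user, cmd in procs_in_state(SAMPLE):
        print(pid, user, cmd)
```

Fed with top output captured during and outside a stop, this would show
whether the same processes sit in zio->i each time.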
> Combine with 5400 rpm consumer drives... well...
>   
Not that it really matters, but these drives are 7200 rpm.
> -> switched to opensolaris, performance is now great...
>   
I would like to avoid that, because we don't have the same well-managed,
netbooted environment for that OS that we have for FreeBSD.

