ZFS deadlock with virtio

Runer run00er at gmail.com
Mon Mar 4 10:39:24 UTC 2019


Most likely you are right!

I noticed the same bhyve behaviour with ZFS.

My searches led me to these links:

https://smartos.org/bugview/OS-7300

https://smartos.org/bugview/OS-7314

https://smartos.org/bugview/OS-6912

Most likely Illumos will have to roll out the patches first.

But I could not figure out when these changes will land in the FreeBSD
branch.

Good luck!

On 04.03.2019 11:44, Ole wrote:
> Hello,
>
> I have done some investigations. I think that there are two different
> problems, so let's focus on the bhyve VM. I can now reproduce the
> behaviour very well. It seems to be connected to the virtio disks.
>
> The disk stack is:
>
> GELI encryption
> zpool (mirror) on the host
> zvol
> virtio-blk
> zpool inside the VM
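>
> Roughly how that stack is put together, as a sketch (device, pool and
> zvol names below are placeholders, not the real ones):
>
> # host: GELI providers (already geli init'ed) backing a mirrored pool
> geli attach /dev/ada0p3
> geli attach /dev/ada1p3
> zpool create hostpool mirror ada0p3.eli ada1p3.eli
> # a zvol on that pool is handed to the guest as an extra disk
> zfs create -V 200G -o volmode=dev hostpool/vmdisk1
> # inside the guest the virtio-blk disk appears as vtbd1 and carries its own pool
> zpool create guestpool vtbd1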
>
> - Host system is FreeBSD 11.2
> - VM is FreeBSD 12.0 (VM-Raw image + additional disk for zpool)
> - VM is controlled by vm-bhyve
> - inside the VM there are 5 to 10 running jails (managed with iocage)
>
> If I start the bhyve VM and let the backups run (~10 operations per
> hour), the zpool inside the VM will crash after 1 to 2 days.
>
> If I change the disk from virtio-blk to ahci-hd, the VM stays stable.
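>
> For reference, with vm-bhyve that is only a one-line change in the guest
> config (a sketch; the config path and disk index are just examples):
>
> # $vm_dir/<vmname>/<vmname>.conf
> #disk1_type="virtio-blk"    # guest zpool wedges after 1-2 days of backups
> disk1_type="ahci-hd"        # guest stays stable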
>
> regards
> Ole
>
> Tue, 19 Feb 2019 10:17:17 +0100 - Ole <ole at free.de>:
>
>> Hi,
>>
>> ok, now I have an unkillable ZFS process again. It is only one 'zfs
>> send' command. Any idea how to kill this process without powering off
>> the machine?
>>
>> root@jails1:/usr/home/admin # ps aux | grep 'zfs send'
>> root      17617   0.0  0.0  12944  3856  -  Is   Sat04       0:00.00 sudo zfs send -e -I cryptopool/iocage/jails/2fe7ae89-760e-423c-8e7f-4f504e0f08bf@2019-
>> root      17618   0.0  0.0  12980  4036  -  D    Sat04       0:00.01 zfs send -e -I cryptopool/iocage/jails/2fe7ae89-760e-423c-8e7f-4f504e0f08bf@2019-02-16
>> root      19299   0.0  0.0  11320  2588  3  S+   09:53       0:00.00 grep zfs send
>> root@jails1:/usr/home/admin # kill -9 17618
>> root@jails1:/usr/home/admin # ps aux | grep 'zfs send'
>> root      17617   0.0  0.0  12944  3856  -  Is   Sat04       0:00.00 sudo zfs send -e -I cryptopool/iocage/jails/2fe7ae89-760e-423c-8e7f-4f504e0f08bf@2019-
>> root      17618   0.0  0.0  12980  4036  -  D    Sat04       0:00.01 zfs send -e -I cryptopool/iocage/jails/2fe7ae89-760e-423c-8e7f-4f504e0f08bf@2019-02-16
>> root      19304   0.0  0.0  11320  2588  3  S+   09:53       0:00.00 grep zfs send
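>>
>> The zfs send (PID 17618) sits in state D, i.e. uninterruptible sleep in
>> the kernel, so the kill -9 never takes effect. About the only thing left
>> short of a reboot is to look at where it is blocked, e.g. (a sketch,
>> using the PID and pool from above):
>>
>> # kernel stack / wait channel of the stuck zfs send
>> procstat -kk 17618
>> # overall state of the pool it hangs on
>> zpool status -v cryptopool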
>>
>> It is a FreeBSD 12.0 VM image running in a bhyve VM. There is basically
>> only py36-iocage installed, and there are 7 running jails.
>>
>> The VM has 30G of RAM and the sysctl vfs.zfs.arc_max is set to 20G. It
>> seems that the whole zpool is in some kind of deadlock. All jails have
>> crashed, are unkillable, and I cannot run any command inside them.
>>
>> regards
>> Ole
>>
>>
>> Fri, 15 Feb 2019 11:34:23 +0100 - Ole <ole at free.de>:
>>
>>> Hi,
>>>
>>> I observed that FreeBSD systems with ZFS will run into a deadlock if
>>> there are many parallel zfs send/receive/snapshot processes.
>>>
>>> I observed this on bare metal and on virtual machines with FreeBSD 11.2
>>> and 12.0, with RAM from 20 to 64G.
>>>
>>> If the system itself is also on ZFS, the whole system crashes. With
>>> only the jails on ZFS they freeze, but the host system stays stable.
>>> Either way you can't kill -9 the zfs processes; only a poweroff stops
>>> the machine.
>>>
>>> On a FreeBSD 12.0 VM (bhyve) with 30G RAM and 5 CPUs, about 30 zfs
>>> operations (mostly send and receive) will crash the system.
>>>
>>> There is no heavy load on the machine:
>>>
>>> # top | head -8
>>> last pid: 91503;  load averages:  0.34,  0.31,  0.29  up 0+22:50:47  11:24:00
>>> 536 processes: 1 running, 529 sleeping, 6 zombie
>>> CPU:  0.9% user,  0.0% nice,  1.5% system,  0.2% interrupt, 97.4% idle
>>> Mem: 165M Active, 872M Inact, 19G Wired, 264M Buf, 9309M Free
>>> ARC: 11G Total, 2450M MFU, 7031M MRU, 216M Anon, 174M Header, 1029M Other
>>>      8423M Compressed, 15G Uncompressed, 1.88:1 Ratio
>>> Swap: 1024M Total, 1024M Free
>>>
>>> I wonder if this is a bug or normal behaviour. I could live with a
>>> limited number of parallel ZFS operations, but I don't want the whole
>>> system to crash.
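>>>
>>> Limiting it on my side would be simple enough, e.g. by serialising the
>>> sends with lockf(1) in the backup script (a sketch; lock file and
>>> dataset names are just examples):
>>>
>>> # allow only one send/receive at a time; wait up to 10 minutes for the lock
>>> lockf -t 600 /var/run/zfs-backup.lock \
>>>     sh -c 'zfs send -e -I tank/data@old tank/data@new | zfs receive -u backup/data'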
>>>
>>> Reducing vfs.zfs.arc_max doesn't help.
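>>>
>>> For reference, the limit is set the usual way; the values here are just
>>> examples:
>>>
>>> # /boot/loader.conf, applied at boot
>>> vfs.zfs.arc_max="10G"
>>> # or at runtime, in bytes
>>> sysctl vfs.zfs.arc_max=10737418240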
>>>
>>> Any idea how to handle this?
>>>
>>> regards
>>> Ole

