lockup during zfs destroy

javocado javocado at gmail.com
Thu Oct 5 05:13:31 UTC 2017


Setting vfs.zfs.free_max_blocks to 20k has unfortunately not helped. I was
able to get a small amount of debug output, though. Any thoughts on how I
can:

- get more detailed debug on the progress of this operation, or on whether
progress is being made at all, each time I reboot and start over after a
freeze?
- configure my way out of this issue?

# dtrace -q -n 'zfs-dbgmsg{printf("%s\n", stringof(arg0))}'
txg 34628587 open pool version 28; software version 5000/5; uts host
10.3-RELEASE 1003000 amd64
txg 34628587 destroy begin tank/temp (id 3680)
txg 34628588 destroy tank/temp (id 3680)
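(One way to watch for progress between reboots, not something suggested in the thread so far, is to poll the pool's "freeing" property, which reports how many bytes are still queued for release by an asynchronous destroy. A minimal sketch, assuming the pool name "tank" from this thread and an arbitrary polling interval; note that "freeing" depends on the async_destroy pool feature, so on a pool still at version 28 like this one it may just read 0 or "-" until the pool is upgraded:)

```shell
# Poll the pool's "freeing" property once a minute; a steadily
# decreasing value means the destroy is making progress.
# -H = scripted output, -p = parseable numbers, -o value = value only.
# "tank" is the pool from this thread -- substitute your own.
while sleep 60; do
    printf '%s ' "$(date '+%H:%M:%S')"
    zpool get -Hp -o value freeing tank
done
```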


On Wed, Oct 4, 2017 at 10:58 AM, Freddie Cash <fjwcash at gmail.com> wrote:

> On Wed, Oct 4, 2017 at 9:27 AM, Freddie Cash <fjwcash at gmail.com> wrote:
>
>> On Wed, Oct 4, 2017 at 9:15 AM, javocado <javocado at gmail.com> wrote:
>>
>>> I am trying to destroy a dense, large filesystem and it's not going well.
>>>
>>> Details:
>>> - zpool is a raidz3 with 3 x 12 drive vdevs.
>>> - target filesystem to be destroyed is ~2T with ~63M inodes.
>>> - OS: FreeBSD 10.3 amd64 with 192 GB of RAM.
>>> - 120 GB of swap (90 GB recently added as swap-on-disk)
>>>
>>
>> Do you have dedupe enabled on any filesystems in the pool?  Or was it
>> enabled at any point in the past?
>>
>> This is a common occurrence when destroying large filesystems or lots of
>> filesystems/snapshots on pools that have/had dedupe enabled and there's not
>> enough RAM/L2ARC to contain the DDT.  The system runs out of usable wired
>> memory and locks up.  Adding more RAM and/or being patient with the
>> boot-wait-lockup-repeat cycle will (usually) eventually allow it to finish
>> the destroy.
>>
>> There was a loader.conf tunable (or sysctl) added in the 10.x series that
>> mitigates this by limiting the number of delete operations that occur in a
>> transaction group, but I forget the details on it.
>>
>> Not sure if this affects pools that never had dedupe enabled or not.
>>
>> (We used to suffer through this at least once a year until we enabled a
>> delete-oldest-snapshot-before-running-backups process to limit the
>> number of snapshots.)
>>
>
> Found it.  You can set vfs.zfs.free_max_blocks in /etc/sysctl.conf.  That
> will limit the number of to-be-freed blocks in a single transaction group.
> You can play with that number until you find a value that won't run the
> system out of kernel memory trying to free all those blocks in a single
> transaction.
>
> On our problem server, running dedupe with only 64 GB of RAM for a 53 TB
> pool, we set it to 200,000 blocks:
>
> vfs.zfs.free_max_blocks=200000
>
> --
> Freddie Cash
> fjwcash at gmail.com
>


More information about the freebsd-fs mailing list