Repeatable panic on ZFS filesystem (used for backups); 11.0-STABLE

Mon Oct 17 23:31:57 UTC 2016


On 17/10/2016 22:50, Karl Denninger wrote:
> I will make some effort on the sandbox machine to see if I can come up
> with a way to replicate this.  I do have plenty of spare larger drives
> lying around that used to be in service and were obsolesced due to
> capacity -- but what I don't know is whether the system will misbehave
> if the source is all spinning rust.
>
> In other words:
>
> 1. Root filesystem is mirrored spinning rust (production is mirrored SSDs)
>
> 2. Backup is mirrored spinning rust (of approx the same size)
>
> 3. Set up auto-snapshot exactly as the production system has now (which
> the sandbox is NOT since I don't care about incremental recovery on that
> machine; it's a sandbox!)
>
> 4. Run a bunch of build-somethings (e.g. buildworlds, cross-build for
> the Pi2s I have here, etc) to generate a LOT of filesystem entropy
> across lots of snapshots.
>
> 5. Back that up.
>
> 6. Export the backup pool.
>
> 7. Re-import it and "zfs destroy -r" the backup filesystem.
>
> That is what got me into a reboot loop after the *first* panic; I was
> simply going to destroy the backup filesystem and re-run the backup, but
> as soon as I issued that zfs destroy, the machine panicked, and as soon
> as I re-attached it after a reboot, it panicked again.  This repeated
> until I set trim=0.
>
> But... even if I CAN replicate it, that still shouldn't be happening.
> The system should *certainly* not blow up when attempting to TRIM on a
> vdev that doesn't support TRIM, even if the removal covers a large
> amount of space and/or many files on the target.
>
> BTW, I bet it isn't that rare.  It can happen if you're taking timed
> snapshots on an active filesystem (with lots of entropy) and then make
> the mistake of trying to remove those snapshots (as is the case with a
> zfs destroy -r, or a zfs recv of an incremental copy that attempts to
> sync against a source) on a pool that was imported before the system
> realized that TRIM is unavailable on those vdevs.
>
> Noting this:
>
>      Yes, I need to find some time to have a look at it, but given how
>      rare this is, and with TRIM being re-implemented upstream in a
>      totally different manner, I'm reluctant to spend any real time on it.
>
> What's in progress in this regard, if you happen to have a reference?
It looks like it may still be in review: https://reviews.csiden.org/r/263/