ZFS import panic with r219703

Freddie Cash fjwcash at gmail.com
Wed Mar 16 23:17:14 UTC 2011


On Wed, Mar 16, 2011 at 4:03 PM, Freddie Cash <fjwcash at gmail.com> wrote:
> Anytime I try to import my pool built using 24x HAST devices, I get
> the following message, and the system reboots:
>
> panic: solaris assert: dmu_free_range(os, smo->smo_object, 0, -1ULL,
> tx) == 0, file:
> /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/space_map.c,
> line: 484
>
> Everything runs nicely if I don't import the pool.
>
> Doing a "zpool import" shows that one of the HAST devices is FAULTED
> "corrupted data".
>
> Haven't tried anything to remove/replace the faulted device, just
> wanted to see if anyone knew what the above error meant.
>
> Pool was created using r219523 and successfully copied over 1 TB of
> data from another ZFS system.  Had some issues with gptboot this
> morning and the system locking up and rebooting a bunch, and now the
> pool won't import.

Along with this ZFS import issue, it seems that hastd doesn't like it
when you fire up 24 HAST devices all at once (via a for loop), each
with over 100 MB of dirty data in it.  hastd dumps core, the kernel
panics, and the system reboots.

If I bring up 1 HAST device every 2 seconds (or however long it takes
to manually type "hastctl role primary disk-a1"), then everything
starts up fine.
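
(In script form, the staggered startup amounts to something like the
sketch below; the resource names are just placeholders along the lines
of my disk-a1 naming, so substitute whatever is in /etc/hast.conf.)

    #!/bin/sh
    # Bring each HAST resource to primary one at a time, with a short
    # pause in between, instead of firing all 24 up at once.
    for disk in disk-a1 disk-a2 disk-a3; do   # ...and so on for all 24
        hastctl role primary "${disk}"
        sleep 2
    done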

So, I can now panic my 9-CURRENT system by either:
  - starting all 24 HAST devices at once, or
  - importing a ZFS pool made up of those 24 HAST devices when one of
    them is corrupted

Isn't testing fun?  :)

I have a bunch of vmcore files from the hastd crashes, but I'm not
really sure what to do with them.

-- 
Freddie Cash
fjwcash at gmail.com

