ZFS pools in "trouble"

Andriy Gapon avg at FreeBSD.org
Sat Mar 14 15:14:17 UTC 2020


On 14/03/2020 13:00, Willem Jan Withagen wrote:
> On 27-2-2020 09:11, Andriy Gapon wrote:
>> On 26/02/2020 19:09, Willem Jan Withagen wrote:
>>> Hi,
>>>
>>> I'm using my pools in perhaps a rather awkward way as underlying storage for my
>>> ceph cluster:
>>>      1 disk per pool, with log and cache on SSD
>>>
>>> For one reason or another one of the servers has crashed and does not really
>>> want to read several of the pools:
>>> ----
>>>    pool: osd_2
>>>   state: UNAVAIL
>>> Assertion failed: (reason == ZPOOL_STATUS_OK), file
>>> /usr/src/cddl/contrib/opensolaris/cmd/zpool/zpool_main.c, line 5098.
>>> Abort (core dumped)
>>> ----
>>>
>>> The code there is like:
>>> ----
>>>          default:
>>>                  /*
>>>                   * The remaining errors can't actually be generated, yet.
>>>                   */
>>>                  assert(reason == ZPOOL_STATUS_OK);
>>>
>>> ----
>>> And this has already happened on 3 disks.
>>> Running:
>>> FreeBSD 12.1-STABLE (GENERIC) #0 r355208M: Fri Nov 29 10:43:47 CET 2019
>>>
>>> Now this is a test cluster, so no harm there in matters of data loss.
>>> And the ceph cluster probably can rebuild everything if I do not lose too many
>>> disks.
>>>
>>> But the problem also lies in the fact that not all disks are recognized by the
>>> kernel, and not all disks end up mounted. So I need to remove a pool first to
>>> get more disks online.
>>>
>>> Is there anything I can do to get them back online?
>>> Or is this a lost cause?
>> Depends on what 'reason' is.
>> I mean the value of the variable.
> 
> I ran into the same problem, even though I deleted the zpool that was in error.
> 
> So I augmented this code with a printf, which printed:
> 
> Error: Reason not found: 5
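
Roughly, the instrumented default case would look like this (a sketch only; the
exact printf is a guess reconstructed from the output above):
----
        default:
                /*
                 * The remaining errors can't actually be generated, yet.
                 */
                (void) printf("Error: Reason not found: %d\n", (int)reason);
                assert(reason == ZPOOL_STATUS_OK);
----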

It seems that 5 is ZPOOL_STATUS_BAD_GUID_SUM and there is a discrepancy between
what the code in status_callback() expects and what actually happens.
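For reference, ZPOOL_STATUS_BAD_GUID_SUM is the sixth entry (value 5) of the
zpool_status_t enum in libzfs.h, which matches the printed value, assuming no
local changes to the enum.  Abridged:
----
typedef enum {
        ZPOOL_STATUS_CORRUPT_CACHE,     /* 0: corrupt zpool.cache */
        ZPOOL_STATUS_MISSING_DEV_R,     /* 1: missing device with replicas */
        ZPOOL_STATUS_MISSING_DEV_NR,    /* 2: missing device, no replicas */
        ZPOOL_STATUS_CORRUPT_LABEL_R,   /* 3: bad device label with replicas */
        ZPOOL_STATUS_CORRUPT_LABEL_NR,  /* 4: bad device label, no replicas */
        ZPOOL_STATUS_BAD_GUID_SUM,      /* 5: sum of device guids didn't match */
        ...
} zpool_status_t;
----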
Looks like check_status() can actually return ZPOOL_STATUS_BAD_GUID_SUM:
        /*
         * Check that the config is complete.
         */
        if (vs->vs_state == VDEV_STATE_CANT_OPEN &&
            vs->vs_aux == VDEV_AUX_BAD_GUID_SUM)
                return (ZPOOL_STATUS_BAD_GUID_SUM);

I think that VDEV_AUX_BAD_GUID_SUM typically means that a device is missing from
the pool.  E.g., a log device.  Or there is some other discrepancy between
expected pool vdevs and found pool vdevs.
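
If it is indeed a missing log device, importing with -m may bring the pool back.
A sketch, assuming the pool name osd_2 from above; note that any transactions
that existed only in the missing log are lost:
----
# Import the pool even though its log device is missing (-m discards the log).
# If the pool was never cleanly exported before the crash, -f may also be needed.
zpool import -m osd_2
----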

-- 
Andriy Gapon

