ZFS pools in "trouble"
Andriy Gapon
avg at FreeBSD.org
Sat Mar 14 15:14:17 UTC 2020
On 14/03/2020 13:00, Willem Jan Withagen wrote:
> On 27-2-2020 09:11, Andriy Gapon wrote:
>> On 26/02/2020 19:09, Willem Jan Withagen wrote:
>>> Hi,
>>>
>>> I'm using my pools in perhaps a rather awkward way as underlying storage for my
>>> ceph cluster:
>>> 1 disk per pool, with log and cache on SSD
>>>
>>> For one reason or another one of the servers has crashed and does not really want
>>> to read several of the pools:
>>> ----
>>> pool: osd_2
>>> state: UNAVAIL
>>> Assertion failed: (reason == ZPOOL_STATUS_OK), file
>>> /usr/src/cddl/contrib/opensolaris/cmd/zpool/zpool_main.c, line 5098.
>>> Abort (core dumped)
>>> ----
>>>
>>> The code there is like:
>>> ----
>>> default:
>>> /*
>>> * The remaining errors can't actually be generated, yet.
>>> */
>>> assert(reason == ZPOOL_STATUS_OK);
>>>
>>> ----
>>> And this already on 3 disks.
>>> Running:
>>> FreeBSD 12.1-STABLE (GENERIC) #0 r355208M: Fri Nov 29 10:43:47 CET 2019
>>>
>>> Now this is a test cluster, so no harm there in matters of data loss.
>>> And the ceph cluster probably can rebuild everything if I do not lose too many
>>> disks.
>>>
>>> But the problem also lies in the fact that not all disks are recognized by the
>>> kernel, and not all disks end up mounted. So I need to remove a pool first to get
>>> more disks online.
>>>
>>> Is there anything I can do to get them back online?
>>> Or is this a lost cause?
>> Depends on what 'reason' is.
>> I mean the value of the variable.
>
> I ran into the same problem, even though I had deleted the zpool that was in
> error.
>
> So I augmented this code with a printf, which printed:
>
> Error: Reason not found: 5
It seems that 5 is ZPOOL_STATUS_BAD_GUID_SUM and there is a discrepancy between
what the code in status_callback() expects and what actually happens.
Looks like check_status() can actually return ZPOOL_STATUS_BAD_GUID_SUM:
/*
* Check that the config is complete.
*/
if (vs->vs_state == VDEV_STATE_CANT_OPEN &&
vs->vs_aux == VDEV_AUX_BAD_GUID_SUM)
return (ZPOOL_STATUS_BAD_GUID_SUM);
I think that VDEV_AUX_BAD_GUID_SUM typically means that a device is missing from
the pool. E.g., a log device. Or there is some other discrepancy between
expected pool vdevs and found pool vdevs.
--
Andriy Gapon
More information about the freebsd-fs mailing list