Re: unusual ZFS issue

From: Pete Wright <pete_at_nomadlogic.org>
Date: Thu, 14 Dec 2023 22:10:30 UTC

On 12/14/23 2:05 PM, Lexi Winter wrote:
> On 14 Dec 2023, at 22:02, Pete Wright <pete@nomadlogic.org> wrote:
>> On Thu, Dec 14, 2023 at 09:17:06PM +0000, Lexi Winter wrote:
>>> hi list,
>>>
>>> i’ve just hit this ZFS error:
>>>
>>> # zfs list -rt snapshot data/vm/media/disk1
>>> cannot iterate filesystems: I/O error
>>
>> hrm, i wonder if you see any errors in dmesg or /var/log/messages about a
>> device failing?
> 
> nothing that looks relevant in the last few days (the problem appeared last night, Dec 13th):
> 
> Dec 11 15:44:21 hemlock kernel: ix1: link state changed to DOWN
> Dec 11 15:44:21 hemlock kernel: ix1.107: link state changed to DOWN
> Dec 11 15:44:35 hemlock kernel: ix1: link state changed to UP
> Dec 11 15:44:35 hemlock kernel: ix1.107: link state changed to UP
> Dec 11 15:44:47 hemlock kernel: nfsrv_cache_session: no session IPaddr=2001:8b0:aab5:ffff::2, check NFS clients for unique /etc/hostid's
> Dec 11 15:44:47 hemlock syslogd: last message repeated 1 times
> Dec 11 17:00:48 hemlock kernel: tcp_vnet_init: WARNING: unable to initialise TCP stats
> Dec 11 17:00:48 hemlock kernel: lo0: link state changed to UP
> Dec 12 06:17:23 hemlock ntpd[25836]: leapsecond file ('/var/db/ntpd.leap-seconds.list'): will expire in less than 16 days
> Dec 13 06:17:23 hemlock ntpd[25836]: leapsecond file ('/var/db/ntpd.leap-seconds.list'): will expire in less than 15 days
> Dec 14 06:17:23 hemlock ntpd[25836]: leapsecond file ('/var/db/ntpd.leap-seconds.list'): will expire in less than 14 days
> Dec 14 16:30:12 hemlock smbd[98264]: [2023/12/14 16:30:12.404883,  0] ../../source3/smbd/server.c:1741(main)
> Dec 14 16:30:12 hemlock smbd[98264]:   smbd version 4.16.11 started.
> Dec 14 16:30:12 hemlock smbd[98264]:   Copyright Andrew Tridgell and the Samba Team 1992-2022
> 
> i’ve also checked the disks with smartctl and i didn’t see any errors there.  (a couple of devices have corrected read errors, but that’s expected given their age - and if it *was* a disk error i’d expect it to show up as a checksum error).
> 

dang, i was hoping something obvious would pop up there or in the smartctl
output. hopefully others here have some ideas for tracking down the root
cause before a restart.

-pete

-- 
Pete Wright
pete@nomadlogic.org