[Bug 253954] kernel: g_access(958): provider da8 has error 6 set
Date: Mon, 13 Jun 2022 20:20:06 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=253954
jnaughto@ee.ryerson.ca changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jnaughto@ee.ryerson.ca
--- Comment #4 from jnaughto@ee.ryerson.ca ---
Any update on this bug? I just experienced the exact same issue. I have 8
disks (all SATA) connected to a FreeBSD 12.3 system. The ZFS pool is set up as
a raidz3. I got in today and found one drive was "REMOVED":
# zpool status pool
  pool: pool
 state: DEGRADED
status: One or more devices has been removed by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
  scan: scrub repaired 0 in 0 days 02:32:26 with 0 errors on Sat Jun 11 05:32:26 2022
config:

        NAME                     STATE    READ WRITE CKSUM
        pool                     DEGRADED    0     0     0
          raidz3-0               DEGRADED    0     0     0
            ada0                 ONLINE      0     0     0
            ada1                 ONLINE      0     0     0
            ada2                 ONLINE      0     0     0
            ada3                 ONLINE      0     0     0
            ada4                 ONLINE      0     0     0
            8936423309855741075  REMOVED     0     0     0  was /dev/ada5
            ada6                 ONLINE      0     0     0
            ada7                 ONLINE      0     0     0
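For completeness, the 'online' action the status output suggests would have
been, using the GUID shown above:
# zpool online pool 8936423309855741075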
I assumed that the drive had died and pulled it. I put a new drive in its
place and attempted to replace it:
# zpool replace pool 8936423309855741075 ada5
cannot replace 8936423309855741075 with ada5: no such pool or dataset
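My first thought was leftover ZFS metadata on the new disk's device node. If
that were the problem, something like the following should show and then clear
any stale label (a sketch only; I have not verified this on this system):
# zdb -l /dev/ada5                (dump any ZFS labels still on the disk)
# zpool labelclear -f /dev/ada5   (wipe stale label data)
# zpool replace pool 8936423309855741075 /dev/ada5
Given the kernel messages below, though, the device never attached at all, so
these would likely fail the same way.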
It seems that the old drive is somehow still remembered by the system. I dug
through the logs and found the following from when the new drive was inserted
into the system:
Jun 13 13:03:15 server kernel: cam_periph_alloc: attempt to re-allocate valid device ada5 rejected flags 0x118 refcount 1
Jun 13 13:03:15 server kernel: adaasync: Unable to attach to new device due to status 0x6
Jun 13 13:04:23 server kernel: g_access(961): provider ada5 has error 6 set
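So CAM still held the old ada5 peripheral and refused to attach the new disk.
The usual thing to try at that point, before rebooting, is to ask CAM to
rescan (a sketch, assuming the drive is on the AHCI controller):
# camcontrol devlist    (list what CAM currently knows about)
# camcontrol rescan all (force a rescan of every bus)
I did not try this; I rebooted instead, as below.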
I rebooted without the new drive in place. After the reboot, the pool output
looked somewhat different:
# zpool status pool
  pool: pool
 state: DEGRADED
status: One or more devices could not be used because the label is missing or
        invalid.  Sufficient replicas exist for the pool to continue
        functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: http://illumos.org/msg/ZFS-8000-4J
  scan: scrub repaired 0 in 0 days 02:32:26 with 0 errors on Sat Jun 11 05:32:26 2022
config:

        NAME                      STATE    READ WRITE CKSUM
        pool                      DEGRADED    0     0     0
          raidz3-0                DEGRADED    0     0     0
            ada0                  ONLINE      0     0     0
            ada1                  ONLINE      0     0     0
            ada2                  ONLINE      0     0     0
            ada3                  ONLINE      0     0     0
            ada4                  ONLINE      0     0     0
            8936423309855741075   FAULTED     0     0     0  was /dev/ada5
            ada5                  ONLINE      0     0     0
            diskid/DISK-Z1W4HPXX  ONLINE      0     0     0

errors: No known data errors
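With the stale GUID entry and the renamed devices in the same listing, it is
easier to reason about which vdev is which by GUID; if this version of zpool
supports the -g flag, that would be:
# zpool status -g pool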
I assumed this was because there was one less drive attached and the system
had assigned new adaX names to each drive. When I inserted the new drive at
this point it appeared as ada9, so I re-issued the zpool replace command, now
with ada9. It took about 3 minutes before the zpool replace command responded
(which really concerned me), but the server has quite a few users accessing
the filesystem, so I figured that as long as the new drive was resilvering I
would be fine....
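Resilver progress can at least be watched while users stay on the filesystem;
a minimal loop (assuming the pool name 'pool', as above) is:
# while true; do zpool status pool | grep -E 'scan:|resilver'; sleep 60; done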
I do a weekly scrub of the pool and I believe the error crept in after the
scrub. At 11 AM today the logs showed the following:
Jun 13 11:29:15 172.16.20.66 kernel: (ada5:ahcich5:0:0:0): FLUSHCACHE48. ACB: ea 00 00 00 00 40 00 00 00 00 00 00
Jun 13 11:29:15 172.16.20.66 kernel: (ada5:ahcich5:0:0:0): CAM status: Command timeout
Jun 13 11:29:15 172.16.20.66 kernel: (ada5:ahcich5:0:0:0): Retrying command, 0 more tries remain
Jun 13 11:30:35 172.16.20.66 kernel: ahcich5: Timeout on slot 5 port 0
Jun 13 11:30:35 172.16.20.66 kernel: ahcich5: is 00000000 cs 00000060 ss 00000000 rs 00000060 tfd c0 serr 00000000 cmd 0004c517
Jun 13 11:30:35 172.16.20.66 kernel: (ada5:ahcich5:0:0:0): FLUSHCACHE48. ACB: ea 00 00 00 00 40 00 00 00 00 00 00
Jun 13 11:30:35 172.16.20.66 kernel: (ada5:ahcich5:0:0:0): CAM status: Command timeout
Jun 13 11:30:35 172.16.20.66 kernel: (ada5:ahcich5:0:0:0): Retrying command, 0 more tries remain
Jun 13 11:31:08 172.16.20.66 kernel: ahcich5: AHCI reset: device not ready after 31000ms (tfd = 00000080)
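These FLUSHCACHE48 timeouts and the failed AHCI reset point at the drive or
its SATA link rather than at ZFS. A quick health check, assuming smartmontools
is installed from ports, would be something like:
# smartctl -a /dev/ada5 | grep -iE 'overall|reallocated|pending'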
At 11:39 I believe the following log entries are of note:
Jun 13 11:39:45 172.16.20.66 kernel: (ada5:ahcich5:0:0:0): CAM status: Unconditionally Re-queue Request
Jun 13 11:39:45 172.16.20.66 kernel: (ada5:ahcich5:0:0:0): Error 5, Periph was invalidated
Jun 13 11:39:45 172.16.20.66 ZFS[92964]: vdev state changed, pool_guid=$5100646062824685774 vdev_guid=$8936423309855741075
Jun 13 11:39:45 172.16.20.66 ZFS[92966]: vdev is removed, pool_guid=$5100646062824685774 vdev_guid=$8936423309855741075
Jun 13 11:39:46 172.16.20.66 kernel: g_access(961): provider ada5 has error 6 set
Jun 13 11:39:47 reactor syslogd: last message repeated 1 times
Jun 13 11:39:47 172.16.20.66 syslogd: last message repeated 1 times
Jun 13 11:39:47 172.16.20.66 kernel: ZFS WARNING: Unable to attach to ada5.
Any idea what the issue was?
--
You are receiving this mail because:
You are the assignee for the bug.