Re: ASC/ASCQ Review
- In reply to: Douglas Gilbert : "Re: ASC/ASCQ Review"
Date: Fri, 21 Jul 2023 03:26:07 UTC
On Thu, Jul 20, 2023, 9:18 PM Douglas Gilbert <dgilbert@interlog.com> wrote:
> On 2023-07-19 11:41, Warner Losh wrote:
> > btw, it also occurs to me that if I do add a 'secondary' table, then you
> > could use it to generate a unique errno and experiment
> > with that w/o affecting the main code until that stuff was mature.
> >
> > I'm not sure I'll do that now, since I've found maybe 10 asc/ascq pairs
> > that I'd like to tag as 'if trying harder, retry, otherwise fail' since
> > re-retry needs have changed a lot since cam was written in the late 90s
> > and at least some of the asc/ascq pairs I'm looking at haven't changed
> > since the initial import, but that's based on a tiny sampling of the data
> > I have and is preliminary at best. I may just change it to reflect modern
> > usage.
>
> Hi,
> If you are looking for up-to-date [20230325] asc/ascq tables in C you could
> borrow mine at https://github.com/doug-gilbert/sg3_utils in
> lib/sg_lib_data.c starting at line 745.
> In testing/sg_chk_asc.c is a small test program for checking that the
> table in sg_lib_data.c agrees with the file that T10 supplies:
> https://www.t10.org/lists/asc-num.txt
Thanks for the pointer. I'd already updated CAM's tables for that.
What I'm doing now is making sure CAM's reactions to the asc/ascq pairs are
good for modern drives. It's a good idea, though, to create a program that
checks that our table matches.
Warner
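
A minimal sketch of such a table check, in Python rather than C for brevity.
It is not how sg_chk_asc.c works internally. It assumes each entry line in
asc-num.txt starts with a pair written like "11h/00h"; the sample lines and
their device-column letters are illustrative, not copied from T10; and
LOCAL_PAIRS is a hypothetical stand-in for pairs extracted from a local table.

```python
import re

# Assumed shape of an entry line: "11h/00h  <device columns>  DESCRIPTION".
# Range entries such as "40h/NNh" do not match and are skipped.
ENTRY_RE = re.compile(r"^([0-9A-Fa-f]{2})h/([0-9A-Fa-f]{2})h\s")


def parse_t10_pairs(text):
    """Return the set of (asc, ascq) pairs found in asc-num.txt style text."""
    pairs = set()
    for line in text.splitlines():
        m = ENTRY_RE.match(line)
        if m:
            pairs.add((int(m.group(1), 16), int(m.group(2), 16)))
    return pairs


def compare(local_pairs, t10_pairs):
    """Return (pairs missing from the local table, pairs unknown to T10)."""
    return sorted(t10_pairs - local_pairs), sorted(local_pairs - t10_pairs)


# Illustrative input only; the real file would be read from disk.
SAMPLE_T10 = """\
ASC/ASCQ Assignments
00h/00h  DZTPROMAEBKVF  NO ADDITIONAL SENSE INFORMATION
04h/02h  DZTPROMAEBKVF  LOGICAL UNIT NOT READY, INITIALIZING COMMAND REQUIRED
11h/00h  DZT RO   BK    UNRECOVERED READ ERROR
"""

# Hypothetical local table contents.
LOCAL_PAIRS = {(0x00, 0x00), (0x11, 0x00), (0x5D, 0x00)}

missing, extra = compare(LOCAL_PAIRS, parse_t10_pairs(SAMPLE_T10))
for asc, ascq in missing:
    print(f"missing locally: {asc:02x}/{ascq:02x}")
for asc, ascq in extra:
    print(f"not in T10 list: {asc:02x}/{ascq:02x}")
```

Comparing only the numeric pairs keeps the check independent of description
wording; a stricter version could compare the description strings as well.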
> Doug Gilbert
>
> > On Fri, Jul 14, 2023 at 5:34 PM Warner Losh <imp@bsdimp.com> wrote:
> >
> >
> >
> > On Fri, Jul 14, 2023 at 12:31 PM Alan Somers <asomers@freebsd.org> wrote:
> >
> > On Fri, Jul 14, 2023 at 11:05 AM Warner Losh <imp@bsdimp.com> wrote:
> > >
> > >
> > >
> > > On Fri, Jul 14, 2023, 11:12 AM Alan Somers <asomers@freebsd.org> wrote:
> > >>
> > >> On Thu, Jul 13, 2023 at 12:14 PM Warner Losh <imp@bsdimp.com> wrote:
> > >> >
> > >> > Greetings,
> > >> >
> > >> > I've been looking closely at failed drives for $WORK lately. I've
> > >> > noticed that a lot of errors that kinda sound like fatal errors have
> > >> > SS_RDEF set on them.
> > >> >
> > >> > What's the process for evaluating whether those error codes are
> > >> > worth retrying? There are several errors that we seem to be seeing
> > >> > (preliminary read of the data) before the drive gives up the ghost
> > >> > altogether. For those cases, I'd like to post more specific lists.
> > >> > Should I do that here?
> > >> >
> > >> > Independent of that, I may want to have a more aggressive 'fail
> > >> > fast' policy than is appropriate for my work load (we have a lot of
> > >> > data that's a copy of a copy of a copy, so if we lose it, we don't
> > >> > care: we'll just delete any files we can't read and get on with life,
> > >> > though I know others will have a more conservative attitude towards
> > >> > data that might be precious and unique). I can set the number of
> > >> > retries lower, I can do some other hacks for disks that tell the disk
> > >> > to fail faster, but I think part of the solution is going to have to
> > >> > be failing for some sense-code/ASC/ASCQ tuples that we don't want to
> > >> > fail in upstream or the general case. I was thinking of identifying
> > >> > those and creating a 'global quirk table' that gets applied after the
> > >> > drive-specific quirk table that would let $WORK override the
> > >> > defaults, while letting others keep the current behavior. IMHO, it
> > >> > would be better to have these separate rather than in the global data
> > >> > for tracking upstream...
> > >> >
> > >> > Is that clear, or should I give concrete examples?
> > >> >
> > >> > Comments?
> > >> >
> > >> > Warner
> > >>
> > >> Basically, you want to change the retry counts for certain ASC/ASCQ
> > >> codes only, on a site-by-site basis? That sounds reasonable. Would
> > >> it be configurable at runtime or only at build time?
> > >
> > >
> > > I'd like to change the default actions. But maybe we just do that for
> > > everyone and assume modern drives...
> > >
> > >> Also, I've been thinking lately that it would be real nice if READ
> > >> UNRECOVERABLE could be translated to EINTEGRITY instead of EIO. That
> > >> would let consumers know that retries are pointless, but that the data
> > >> is probably healable.
> > >
> > >
> > > Unlikely, unless you've tuned things to not try for long at recovery...
> > >
> > > But regardless... do you have a concrete example of a use case?
> > > There's a number of places that map any error to EIO. And I'd like a
> > > use case before we expand the errors the lower layers return...
> > >
> > > Warner
> >
> > My first use-case is a user-space FUSE file system. It only has
> > access to errnos, not ASC/ASCQ codes. If we do as I suggest, then it
> > could heal a READ UNRECOVERABLE by rewriting the sector, whereas other
> > EIO errors aren't likely to be healed that way.
> >
> >
> > Yea... but READ UNRECOVERABLE is kinda hit or miss...
> >
> > My second use-case is ZFS. zfsd treats checksum errors differently
> > from I/O errors. A checksum error normally means that a read returned
> > wrong data. But I think that READ UNRECOVERABLE should also count.
> > After all, that means that the disk's media returned wrong data which
> > was detected by the disk's own EDC/ECC. I've noticed that zfsd seems
> > to fault disks too eagerly when their only problem is READ
> > UNRECOVERABLE errors. Mapping it to EINTEGRITY, or even a new error
> > code, would let zfsd be tuned better.
> >
> >
> > EINTEGRITY would then mean two different things. UFS returns it when
> > checksums fail for critical filesystem errors. I'm not saying no, per se,
> > just that it conflates two different errors.
> >
> > I think both of these use cases would be better served by CAM's
> > publishing of the errors to devctl today. Here's some example data from a
> > system I'm looking at:
> >
> > system=CAM subsystem=periph type=timeout device=da36 serial="12345"
> > cam_status="0x44b" timeout=30000 CDB="28 00 4e b7 cb a3 00 04 cc 00 "
> > timestamp=1634739729.312068
> > system=CAM subsystem=periph type=timeout device=da36 serial="12345"
> > cam_status="0x44b" timeout=30000 CDB="28 00 20 6b d5 56 00 00 c0 00 "
> > timestamp=1634739729.585541
> > system=CAM subsystem=periph type=error device=da36 serial="12345"
> > cam_status="0x4cc" scsi_status=2 scsi_sense="72 03 11 00"
> > CDB="28 00 ad 1a 35 96 00 00 56 00 " timestamp=1641979267.469064
> > system=CAM subsystem=periph type=error device=da36 serial="12345"
> > cam_status="0x4cc" scsi_status=2 scsi_sense="72 03 11 00"
> > CDB="28 00 ad 1a 35 96 00 01 5e 00 " timestamp=1642252539.693699
> > system=CAM subsystem=periph type=error device=da39 serial="12346"
> > cam_status="0x4cc" scsi_status=2 scsi_sense="72 04 02 00"
> > CDB="2a 00 01 2b c8 f6 00 07 81 00 " timestamp=1669603144.090835
> >
> > Here we get the sense key, the asc and the ascq in the scsi_sense data
> > (I'm currently looking at expanding this to the entire sense buffer,
> > since it includes how hard the drive tried to read the data on media and
> > hardware errors). It doesn't include nvme data, but does include ata data
> > (I'll have to add the nvme data, now that I've noticed it is missing).
> > With the sense data and the CDB you know what kind of error you got, plus
> > what block didn't read/write correctly. With the extended sense data, you
> > can find out even more details that are sense-key dependent...
> >
> > So I'm unsure that trying to shoehorn our imperfect knowledge of what's
> > retriable, fixable, or should be written with zeros into the kernel and
> > converting that to a separate errno would give good results. Having
> > daemons that want to make more nuanced calls about disks tap into this
> > stream might be the better way to go. One of the things I'm planning for
> > $WORK is to enable the retry time limit of one of the mode pages so that
> > we fail faster and can just delete the file with the 'bad' block that
> > we'd get eventually if we allowed the full, default error processing to
> > run, but that 'slow path' processing kills performance for all other
> > users of the drive... I'm unsure how well that will work out (and I know
> > I'm lucky that I can always recover any data for my application since
> > it's just a cache).
> >
> > I'd be interested to hear what others have to say here though, since my
> > focus on this data is through the lens of my rather specialized
> > application...
> >
> > Warner
> >
> > P.S. That was generated with this rule if you wanted to play with it...
> > You'd have to translate absolute disk blocks to a partition and an
> > offset into the filesystem, then give the filesystem a chance to tell you
> > what of its data/metadata that block is used for...
> >
> > # Disk errors
> > notify 10 {
> >     match "system" "CAM";
> >     match "subsystem" "periph";
> >     match "device" "[an]?da[0-9]+";
> >     action "logger -t diskerr -p daemon.info $_ timestamp=$timestamp";
> > };
> >
>
>
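
As an illustration of the point in the quoted message that the sense data
plus the CDB tell you what kind of error occurred and which block was
involved, here is a minimal Python sketch that decodes one of the devctl
events quoted above. It assumes the four scsi_sense bytes are, in order,
response code, sense key, ASC and ASCQ (which matches the samples), and it
only handles the 10-byte READ(10)/WRITE(10) CDBs that appear in them. The
function names and the small name tables are hypothetical helpers, not part
of CAM or devd.

```python
import shlex

# Only the codes that occur in the quoted samples; not a complete table.
SENSE_KEYS = {0x3: "MEDIUM ERROR", 0x4: "HARDWARE ERROR"}
ASC_NAMES = {
    (0x11, 0x00): "UNRECOVERED READ ERROR",
    (0x04, 0x02): "LOGICAL UNIT NOT READY, INITIALIZING COMMAND REQUIRED",
}


def parse_event(line):
    """Split a 'key=value key="quoted value"' event line into a dict."""
    fields = {}
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


def decode_sense(text):
    """Decode the four logged bytes: response code, sense key, ASC, ASCQ."""
    code, key, asc, ascq = (int(b, 16) for b in text.split())
    return {"response_code": code, "sense_key": key & 0x0F,
            "asc": asc, "ascq": ascq}


def decode_rw10(text):
    """Extract operation, starting LBA and block count from a 10-byte CDB."""
    cdb = bytes(int(b, 16) for b in text.split())
    if len(cdb) != 10 or cdb[0] not in (0x28, 0x2A):
        raise ValueError("not a READ(10)/WRITE(10) CDB")
    return {"op": "READ(10)" if cdb[0] == 0x28 else "WRITE(10)",
            "lba": int.from_bytes(cdb[2:6], "big"),
            "blocks": int.from_bytes(cdb[7:9], "big")}


SAMPLE = ('system=CAM subsystem=periph type=error device=da36 serial="12345" '
          'cam_status="0x4cc" scsi_status=2 scsi_sense="72 03 11 00" '
          'CDB="28 00 ad 1a 35 96 00 00 56 00 " timestamp=1641979267.469064')

event = parse_event(SAMPLE)
sense = decode_sense(event["scsi_sense"])
cdb = decode_rw10(event["CDB"])
print(event["device"],
      SENSE_KEYS.get(sense["sense_key"], "?"),
      ASC_NAMES.get((sense["asc"], sense["ascq"]), "?"),
      cdb["op"], hex(cdb["lba"]), cdb["blocks"])
```

The LBA here is relative to the whole disk, so a consumer would still have
to map it to a partition and file system offset, as the P.S. in the quoted
message notes.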