add BIO_NORETRY flag, implement support in ata_da, use in ZFS vdev_geom

Fri Nov 24 14:58:05 UTC 2017

> On Nov 24, 2017, at 6:34 AM, Andriy Gapon <avg at FreeBSD.org> wrote:
> 
> On 24/11/2017 15:08, Warner Losh wrote:
>> 
>> 
>> On Fri, Nov 24, 2017 at 3:30 AM, Andriy Gapon <avg at freebsd.org> wrote:
>> 
>> 
>>    https://reviews.freebsd.org/D13224
>> 
>>    Anyone interested is welcome to join the review.
>> 
>> 
>> I think it's a really bad idea. It introduces a 'one-size-fits-all' notion of
>> QoS that seems misguided. It conflates a shorter timeout with don't retry. And
>> why is retrying bad? It seems more a notion of 'fail fast' or some other concept.
>> There's so many other ways you'd want to use it. And it uses the same return
>> code (EIO) to mean something new. It mixes up what EIO has generally meant, 'The
>> lower layers have retried this, and it failed, do not submit it again as it will
>> not succeed', with 'I gave it a half-assed attempt, and that failed, but
>> resubmission might work'.
>> This breaks a number of assumptions in the BUF/BIO layer as well as parts of CAM
>> even more than they are broken now.
>> 
>> So let's step back a bit: what problem is it trying to solve?
> 
> A simple example.  I have a mirror, I issue a read to one of its members.  Let's
> assume there is some trouble with that particular block on that particular disk.
> The disk may spend a lot of time trying to read it and would still fail.  With
> the current defaults I would wait 5x that time to finally get the error back.
> Then I go to another mirror member and get my data from there.

There are many RAID stacks that already solve this problem by having a policy
of always reading all disk members for every transaction, and throwing away the
sub-transactions that arrive late.  It’s not a policy that is always desired, but it
serves a useful purpose for low-latency needs.
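
As a rough illustration of that read-everything policy, here is a toy model in
Python. It is not code from any particular RAID stack; the member names, delays,
and the read_member() and read_all_take_first() helpers are all made up for the
example. Each "member" just sleeps and then returns data or fails, and the
caller takes whichever successful answer arrives first.

```python
import concurrent.futures as cf
import time


def read_member(name, delay, data):
    """Simulate one mirror member: wait `delay` seconds, then return data or fail."""
    time.sleep(delay)
    if data is None:
        raise OSError(f"{name}: read failed")
    return data


def read_all_take_first(members):
    """Issue the read to every member; return the first successful result.

    members: list of (name, delay_seconds, data_or_None) tuples (hypothetical).
    Results that arrive after the first success are simply ignored.
    """
    pool = cf.ThreadPoolExecutor(max_workers=len(members))
    futures = [pool.submit(read_member, *m) for m in members]
    try:
        for fut in cf.as_completed(futures):
            try:
                return fut.result()
            except OSError:
                continue  # this member failed; wait for the next one to finish
        raise OSError("all members failed")
    finally:
        # Do not wait for stragglers; their late results are thrown away.
        pool.shutdown(wait=False)
```

The cost is visible even in the toy: every read occupies every member, which is
why this is a latency policy rather than a general default.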

> IMO, this is not optimal.  I'd rather pass BIO_NORETRY to the first read, get
> the error back sooner and try the other disk sooner.  Only if I know that there
> are no other copies to try, then I would use the normal read with all the retrying.
> 
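
For readers following along, the policy proposed in the quoted text can be
sketched as the toy model below. This is not the D13224 patch and does not use
any real GEOM, CAM, or ZFS interface; the Disk class, read_with_retries(), and
mirror_read() are hypothetical names, and the default of 5 attempts is only an
assumption taken from the "5x" figure mentioned earlier in the thread.

```python
class Disk:
    """Toy disk: the first `bad_attempts` reads fail, later reads succeed."""

    def __init__(self, name, bad_attempts):
        self.name = name
        self.bad_attempts = bad_attempts
        self.attempts = 0  # how many reads have been issued so far

    def read_once(self):
        self.attempts += 1
        if self.attempts <= self.bad_attempts:
            raise OSError(f"{self.name}: EIO")
        return f"data from {self.name}"


def read_with_retries(disk, max_attempts):
    """Stand-in for the lower layer: retry up to max_attempts (>= 1) times."""
    last_err = None
    for _ in range(max_attempts):
        try:
            return disk.read_once()
        except OSError as err:
            last_err = err
    raise last_err


def mirror_read(disks, full_attempts=5):
    """Single attempt on every copy except the last; full retries on the last."""
    for i, disk in enumerate(disks):
        last_copy = i == len(disks) - 1
        attempts = full_attempts if last_copy else 1
        try:
            return read_with_retries(disk, attempts)
        except OSError:
            if last_copy:
                raise  # no other copies left to try
```

Note that in this sketch a failure from the single-attempt read and a failure
after full retries surface as the same OSError, so the caller cannot tell them
apart from the error alone.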

I agree with Warner that what you are proposing is not correct.  It weakens the
contract between the disk layer and the upper layers, making it less clear who is
responsible for retries and less clear what “EIO” means.  That contract is already
weak due to poor design decisions in VFS-BIO and GEOM, and Warner and I
are working on a plan to fix that.  

Scott