svn commit: r322258 - head/sys/kern

Alan Somers asomers at freebsd.org
Thu Aug 24 19:43:50 UTC 2017


On Wed, Aug 9, 2017 at 1:05 AM, Bruce Evans <brde at optusnet.com.au> wrote:
> On Tue, 8 Aug 2017, Alan Somers wrote:
>
>> Log:
>>  Make p1003_1b.aio_listio_max a tunable
>>
>>  p1003_1b.aio_listio_max is now a tunable. Its value is reflected in the
>>  sysctl of the same name, and the sysconf(3) variable _SC_AIO_LISTIO_MAX.
>>  Its value will be bounded from below by the compile-time constant
>>  AIO_LISTIO_MAX and from above by the compile-time constant
>>  MAX_AIO_QUEUE_PER_PROC and the tunable vfs.aio.max_aio_queue.
>
>
> I don't like tunables or sysctls being limited or silently limiting them.
> This is not done for more important sysctls like kern.maxfilesperproc.
> Lots of silent runtime limiting is done for a few more important limits
> like kern.maxfilesperproc.
>
> The limiting is also buggy:
> - vfs.aio.max_aio_queue isn't a tunable as stated.  It is a r/w sysctl
> - if vfs.aio.max_aio_queue were a r/o tunable, then there would be a
>   solvable ordering problem initializing it before using it
> - since vfs.aio.max_aio_queue is only a sysctl, it is unusable at boot
>   time (and module load time), but it is used there.
>
> Other bugs in vfs.aio.max_aio_queue:
> - its name is too long, yet not very descriptive
> - its name repeats "aio"
> - its name but doesn't say "len", so it looks like it limits a number of
>   queues and not the length of a single queue.  This is backwards relative
>   to the corresponding variable name.  That is missing "aio" when it
>   is needed, but only spells "length" badly as "count".  It is
>   max_queue_count, but should be something like aio_max_qlen.  All
>   sysctl variables should have an aio_ prefix which is removed in the
>   leaf name in the sysctl tree.
> Other names have similar or worse bugs.  Bugs start with parameter names
> MAX_AIO_* instead of AIO_*_MAX, and variable names track this.  There is
> some ambiguity from global vs per-process limits and counts, and the
> noise word "num" is sprinkled to get names which differ without describing
> any difference.
>
>> Modified: head/sys/kern/vfs_aio.c
>>
>> ==============================================================================
>> --- head/sys/kern/vfs_aio.c     Tue Aug  8 16:06:16 2017        (r322257)
>> +++ head/sys/kern/vfs_aio.c     Tue Aug  8 16:14:31 2017        (r322258)
>> @@ -102,6 +102,7 @@ static uint64_t jobseqno;
>> #endif
>>
>> FEATURE(aio, "Asynchronous I/O");
>> +SYSCTL_DECL(_p1003_1b);
>>
>> static MALLOC_DEFINE(M_LIO, "lio", "listio aio control block list");
>>
>> @@ -168,6 +169,11 @@ static int max_buf_aio = MAX_BUF_AIO;
>> SYSCTL_INT(_vfs_aio, OID_AUTO, max_buf_aio, CTLFLAG_RW, &max_buf_aio, 0,
>>     "Maximum buf aio requests per process (stored in the process)");
>>
>> +static int aio_listio_max = AIO_LISTIO_MAX;
>> +SYSCTL_INT(_p1003_1b, CTL_P1003_1B_AIO_LISTIO_MAX, aio_listio_max,
>> +    CTLFLAG_RDTUN | CTLFLAG_CAPRD, &aio_listio_max, 0,
>> +    "Maximum aio requests for a single lio_listio call");
>> +
>
>
> The POSIX namespace is missing all of the bugs described above.  It has
> aio first (except, for historical reasons it unfortunately has p10031b
> instead of vfs.aio for the sysctl prefix), and max last.
>
>> #ifdef COMPAT_FREEBSD6
>> typedef struct oaiocb {
>>         int     aio_fildes;             /* File descriptor */
>> @@ -388,6 +394,11 @@ static int
>> aio_onceonly(void)
>> {
>>
>> +       if (aio_listio_max < AIO_LISTIO_MAX)
>> +               aio_listio_max = AIO_LISTIO_MAX;
>> +       if (aio_listio_max > MIN(MAX_AIO_QUEUE_PER_PROC, max_queue_count))
>> +               aio_listio_max = MIN(MAX_AIO_QUEUE_PER_PROC,
>> max_queue_count);
>> +
>
>
> max_queue_count is not initialized here, except to a default at compile
> time.  Except for module (re)load, it always has the default value
> MAX_AIO_QUEUE = 1024
>   (this 1024 is bogusly ifdefed.  MAX_AIO_QUEUE is not defined in any
>   header file and is not a supported option, and the tunable makes the
>   option less needed than before).
> MAX_AIO_QUEUE_PER_PROC = 256, modulo a similar unsupported option.
> Thus MIN(MAX_AIO_QUEUE_PER_PROC, max_queue_count) is an obfuscated spelling
> of 256.
>
> MAX_AIO_QUEUE_PER_PROC is also just the default for a variable
> (max_aio_queue_per_proc), so using it here has the same bugs except for
> the obfuscation.
>
> Using MIN() instead of imin() is a style bug.
>
> OTOH, AIO_LISTIO_MAX is defined (as 16) in a header file, and is not
> bogusly ifdefed.  It is 1 of 3 aio limits specified by POSIX.
>
> The checking would be buggy if it were done later, since max_queue_count
> is not checked and limiting to it can corrupt aio_listio_max (when it is
> negative or just too small).
>
> The compile-time definition of AIO_LISTIO_MAX seems to be broken.  I think
> POSIX species that AIO_LISTIO_MAX shall not be defined if the value of
> {AIO_LISTIO_MAX} varies at runtime, but it is still defined.  FreeBSD
> has this bug for many other sysconf() variables, e.g., {OPEN_MAX}.
> Perhaps AIO_LISTIO_MAX is easier to fix since it is not hard-coded as
> often as OPEN_MAX.

What you describe is Linux's behavior, but the POSIX requirement is a
bit more general.  All POSIX says is that "The value returned [by
sysconf(3)] shall not be more restrictive than the corresponding value
described to the application when it was compiled with the
implementation's <limits.h> or <unistd.h>".

>
> AIO_LISTIO_MAX is now just the FreeBSD default, but the first check in
> the above restricts {AIO_LISTIO_MAX} to this default (16) instead of to
> the POSIX limit of _POSIX_IO_LISTIO_MAX (2).  Relaxing this check would
> cause more problems with hard-coded AIO_LISTIO_MAX.
>
> If the second check in the above were done at runtime so that it worked
> for changed max_queue_count, then it would reduce aio_listio_max below
> the POSIX limit of 2 slightly before it reduces to garbage zero and negative
> values.  The first check should be large, or if you want to give a
> preference
> for >= the old limit of 16 and a hard limit of the POSIX limit of 2, check
> keep checking the old limit first and the POSIX limit last.
>
>>         exit_tag = EVENTHANDLER_REGISTER(process_exit, aio_proc_rundown,
>> NULL,
>>             EVENTHANDLER_PRI_ANY);
>>         exec_tag = EVENTHANDLER_REGISTER(process_exec,
>> aio_proc_rundown_exec,
>> @@ -405,14 +416,13 @@ aio_onceonly(void)
>>             NULL, NULL, NULL, UMA_ALIGN_PTR, UMA_ZONE_NOFREE);
>>         aiocb_zone = uma_zcreate("AIOCB", sizeof(struct kaiocb), NULL,
>> NULL,
>>             NULL, NULL, UMA_ALIGN_PTR, UMA_ZONE_NOFREE);
>> -       aiol_zone = uma_zcreate("AIOL", AIO_LISTIO_MAX*sizeof(intptr_t) ,
>> NULL,
>> -           NULL, NULL, NULL, UMA_ALIGN_PTR, UMA_ZONE_NOFREE);
>> +       aiol_zone = uma_zcreate("AIOL", aio_listio_max * sizeof(intptr_t)
>> ,
>> +           NULL, NULL, NULL, NULL, UMA_ALIGN_PTR, UMA_ZONE_NOFREE);
>
>
> Allocations like this are hard to adjust later (I forget if the size limit
> is hard or soft, but it should be hard).  The user must be allowed to set
> aio_listio_max to value larger than the default, so as to prepare for
> expansion of the other limits later.  However, this doesn't work, since
> it leaves the other limits inconsistent until they are expanded.
>
> It seems best to adjust all the other limits based on the aio_listio_max
> tunable.  Usually this is not set.  Use the old parameters then, but remove
> all the macros and compile-time initializers for them.  Do onceonly
> initialization like
>
>         if (aio_listio_max == 16) {
>                 max_queue_count = 1024; /* historical MAX_AIO_QUEUE */
>                 ...
>         }
>
> When aio_listio_max != 16, calculate the parameters somehow.  I don't know
> how to do this.  At least make them consistent.
>
> After initialization, the other parameters are still restricted by the
> choice for aio_listio_max.  Leave it to the user to not choose bad
> combinations.
>
> The aio sysctls seem to already break many of the POSIX sysctls.  E.g.,
> MAX_AIO_QUEUE is used as the default for the r/w sysctl
> vfs.aio.max_aio_queue and as the value for the sysctl r/o sysctl
> p1003_1b.aio_max.  Any change to the first sysctl breaks the second
> sysctl.
>
> The 3 POSIX (2001) aio sysctls thus have quite different invariance
> bugs:
> - {AIO_MAX}: AIO_MAX is correctly not defined.  {AIO_MAX} is variable
>   by a low-level sysctl but the p1003 sysctl used to report {AIO_MAX}
>   doesn't see the change.
> - {AIO_PRIO_DELTA_MAX}: no low-level sysctl or variable to break it
>   (value is always 0)
> - {AIO_LISTIO_MAX}: was constant until this commit.  There is not
>   low-level sysctl for it, and the low-level variable for it is
>   correctly exported, but AIO_LISTIO_MAX was incorrectly defined
>   so it is hard to change now.  {AIO_LISTIO_MAX} is inconsistent
>   with the macro whenever the tunable is used to change its default.
>
> Bruce

I dug deeper and found that there wasn't any good reason for the
aio_listio_max limit to exist in the first place.  This DR eliminates
it, which I think will satisfy most of your concerns.
https://reviews.freebsd.org/D12120

-Alan


More information about the svn-src-all mailing list