kern/81951: [patch] linux emulation: getpriority() returns incorrect value

Bruce Evans bde at zeta.org.au
Sat Jun 11 18:37:29 GMT 2005


On Thu, 9 Jun 2005, Andriy Gapon wrote:

> on 09.06.2005 16:17 Bruce Evans said the following:
>>> on 08.06.2005 23:49 Maxim Sobolev said the following:
>>>> Committed, thanks!
>>>>
>>>> I wonder if the setpriority(2) needs the same cure. Please clarify and
>>>> let me know. I'll keep the PR open till your reply.
>>
>> I wonder why committers commit patches without fully understanding them.
>
> I wonder if you fully understood the patch, the issue and the
> getpriority/setpriority.

I thought I did, but I read POSIX partly backwards.

>> POSIX specifies that the non-error range of values returned by
>> getpriority()
>> is [0, 2*{NZERO}-1]; -1 is the error indicator.  Applications must subtract
>> NZERO to get the actual priority value.

> I think you have misread POSIX specification and you are confusing two
> things: (1) priority - priority inside the blackbox that schedules
> processes versus values that should be passed to setpriority() and
> returned from getpriority(); (2) syscall internal implementation versus
> user-visible libc function.

Priority in the black box is td->td_priority.  p->p_nice is supposed to
be the user-visible priority offset by NZERO in FreeBSD, and it is, but
things are made confusing by "fixing" the historical value of NZERO so
that NZERO is 0.  Biases of 0 are subtle, and POSIX has made the NZERO = 0
bias wrong by over-specifying the behaviour as the historical behaviour.

> Regarding #1, here's a direct quote:
> http://www.opengroup.org/onlinepubs/009695399/functions/getpriority.html
>
> "Upon successful completion, getpriority() shall return an integer in
> the range -{NZERO} to {NZERO}-1. Otherwise, -1 shall be returned and
> errno set to indicate the error."
> Also:
> "The getpriority() and setpriority() functions work with an offset nice
> value (nice value -{NZERO}). The nice value is in the range [0,2*{NZERO}
> -1], while the return value for getpriority() and the third parameter
> for setpriority() are in the range [-{NZERO},{NZERO} -1]."

This is the part that I misread.  I only saw the "Also" part and I read
it backwards as specifying Linux-like behaviour to avoid the in-band
error indicator.
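
The two ranges in the POSIX text quoted above can be checked with a short
sketch.  It is a Python model of the arithmetic only, assuming NZERO = 20
(the value suggested in this thread, not FreeBSD's actual constant); the
function name is hypothetical.

```python
# Model of the POSIX wording: the "nice value" lives in [0, 2*NZERO - 1],
# while getpriority() returns the offset value (nice value - NZERO).
NZERO = 20  # assumption for illustration only


def offset_nice(nice_value):
    """Map a POSIX nice value to the value getpriority() would return."""
    if not 0 <= nice_value <= 2 * NZERO - 1:
        raise ValueError("nice value outside [0, 2*NZERO - 1]")
    return nice_value - NZERO


returned = [offset_nice(n) for n in range(2 * NZERO)]
print(min(returned), max(returned))  # -20 19

# -1 is a legitimate successful return (nice value NZERO - 1), so it
# collides with the in-band error indicator.
print(offset_nice(NZERO - 1))  # -1
```

Because -1 is both a valid result and the error indicator, a caller has to
clear errno before the call and inspect it afterwards.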

> So this is a difference between priority as it is seen in user-land
> (above libc layer) and priority inside the POSIX blackbox of OS (the one
> in [0,2*{NZERO} -1] range).

It is a bug in POSIX for POSIX to specify the black box.  The FreeBSD
black box doesn't actually use this range, and applications and users
hardly notice since they mostly see the adjusted priorities (with
default priority 0 instead of NZERO).

> My understanding is that FreeBSD and Linux are very close to POSIXly
> correct implementations with NZERO=20. In fact, Linux's implementation is
> completely compliant and FreeBSD allows +20 which is beyond the POSIX range.
> Also, -1 return value from getpriority() is a problematic point of POSIX
> specification not implementations.

To conform, FreeBSD would need to expand or shrink the priority range by
1 to cover or drop +20, and change NZERO from 0 to 20 or 21, and move the
priorities in the grey box up by NZERO.

> Regarding #2, both FreeBSD and Linux in their unique ways correctly
> return errno/priority level from kernel-land to user-land. FreeBSD
> syscall returns priority already in [-{NZERO},{NZERO} -1] range; Linux

Except NZERO is 0 in FreeBSD.

> syscall returns priority in [1,2*{NZERO}] range and with reversed
> comparison, and then (g)libc stub of getpriority performs 20-X
> conversion to return a correct value to application.

>> I think the reason that setpriority(2) is not affected is actually that
>> Linux applications know to use (20 - pri) to recover the actual priority.

It is actually the library stub that does this.  So getpriority(2) doesn't
give POSIX getpriority in Linux, but getpriority(3) does.
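
A rough model of that split between the syscall and the library stub, in
Python for brevity.  The function names are hypothetical, the bias of 20 is
the one discussed in this thread, and the error path is left out.

```python
BIAS = 20  # the constant in the 20 - X conversion discussed in this thread


def linux_sys_getpriority(nice):
    """Raw syscall result: biased so that success is never negative."""
    return BIAS - nice  # nice in [-20, 19] gives a result in [1, 40]


def libc_getpriority(raw):
    """Library stub: undo the bias to recover the POSIX-style value."""
    return BIAS - raw


for nice in (-20, -1, 0, 19):
    raw = linux_sys_getpriority(nice)
    # The raw value is always positive; the stub round-trips it.
    print(nice, raw, libc_getpriority(raw))
```

The point of the bias is that a nice value of -1 reaches the stub as 21, so
a negative raw return can be reserved for errors.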

>> Fixing getpriority() in FreeBSD and all emulators should involve much the
>> same code: map the range of internal priorities [PRIO_MIN, PRIO_MAX] to
>> getpriority()'s range [0, 2*{SUBSYSTEM_SPECIFIC_NZERO}-1] as linearly
>> as possible (something like:
>>
>>     pri |-> (pri - PRIO_MIN) * (2 * SUBSYSTEM_SPECIFIC_NZERO - 1) /
>>         (PRIO_MAX - PRIO_MIN)
>>
>> but more complicated, since if SUBSYSTEM_SPECIFIC_NZERO == 20 the
>> above maps the default priority 0 to (20 * 39 / 40) = 19, but 20 is
>> required; also for Linux there must be a negation).
>
> I think you have greatly overcomplicated things because of your original
> misunderstanding. Just compile a small program using
> getpriority/setpriority for FreeBSD, Linux and any other Unix available
> to you, run it and you will see how simple things are in reality and
> that NZERO is not visible to userland. Read the man pages too.
> Yes, and try Linux emulation with and without my patch to understand
> what the problem with emulation really is.

This part of my previous mail is almost correct.  There is an internal
range [PRIO_MIN, PRIO_MAX] which should be mapped to the [-{NZERO},
{NZERO} -1] range (not the [0, 2*{NZERO} - 1] range like I said
previously).  setpriority() should invert this mapping.  Matching the
range of the emulated system is actually more important for setpriority(),
since applications probably treat values returned by getpriority() as
cookies and don't notice if they are out of bounds, but the kernel
does range checking on the values passed by setpriority().  In addition,
for Linux getpriority() the values must be mapped by pri |-> 20 - pri
so that the library stub can restore the previous values.  The magic
20 is spelled 20 in the Linux kernel (2.6.10 at least) and as PZERO
in glibc (2.3.2 at least).  This secondary mapping makes scaling in the
first mapping more important, since if FreeBSD had +21 in its priority
range, then 20 - pri would give a value of -1 and the library stub would
consider this to be an error.
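
The range hazard can be illustrated with a small sketch.  It assumes an
internal FreeBSD range of [-20, 20] and a Linux range of [-20, 19], as
discussed above, and it clamps instead of scaling, which is only one
possible choice and not necessarily what was committed.  All names are
hypothetical.

```python
LINUX_MIN, LINUX_MAX = -20, 19  # assumed range visible to Linux programs


def to_linux_raw(pri, clamp=True):
    """Internal priority -> value an emulated Linux syscall would return."""
    if clamp:
        # Fold out-of-range values (such as +20) into Linux's range first.
        pri = max(LINUX_MIN, min(LINUX_MAX, pri))
    return 20 - pri


print(to_linux_raw(20, clamp=False))  # 0: outside the expected [1, 40]
print(to_linux_raw(21, clamp=False))  # -1: looks like an error to the stub
print(to_linux_raw(20))               # 1: clamped to Linux's maximum, 19
```

Clamping loses the distinction between +19 and +20; a scaling map would
avoid that at the cost of more arithmetic.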

Summary: I don't like the committed version since it has many subtle
magic numbers in its 20 - X formula:
20: part of Linux adjustment.  20 = 1 + Linux's maximum priority.
-1: another part of Linux adjustment
1: factor of 20/20 for the scaling step, where the first 20 is what should
     be Linux's NZERO and the second 20 is what should be FreeBSD's NZERO
     (= (PRIO_MAX - PRIO_MIN) / 2).  Note that these 20's are subtly
     different from the 20 in Linux's adjustment.
0: bias for the scaling step (= FreeBSD NZERO).
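
One way to make those constants visible is to name each of them.  The
sketch below is a Python model, not the committed code; every name in it is
hypothetical, and the two NZERO values are the "what should be" values from
the list above.

```python
LINUX_BIAS = 20     # 1 + Linux's maximum priority (19)
LINUX_SIGN = -1     # Linux reverses the sense of the value
LINUX_NZERO = 20    # what should be Linux's NZERO
FREEBSD_NZERO = 20  # what should be FreeBSD's NZERO
FREEBSD_BIAS = 0    # FreeBSD's actual NZERO


def linux_getpriority_raw(pri):
    # Scaling step: factor LINUX_NZERO / FREEBSD_NZERO == 1, bias 0.
    scaled = (pri - FREEBSD_BIAS) * LINUX_NZERO // FREEBSD_NZERO
    # Linux adjustment: negate, then add the bias.
    return LINUX_BIAS + LINUX_SIGN * scaled


# With these values the named version collapses to the bare 20 - X formula.
assert all(linux_getpriority_raw(x) == 20 - x for x in range(-20, 21))
```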

Bruce

