kern/189941: getgroups(2) implements first argument as unsigned int

Tue May 20 08:20:01 UTC 2014

The following reply was made to PR kern/189941; it has been noted by GNATS.

From: Bruce Evans <brde at optusnet.com.au>
To: Peter Holm <pho at freebsd.org>
Cc: freebsd-gnats-submit at freebsd.org, freebsd-bugs at freebsd.org
Subject: Re: kern/189941: getgroups(2) implements first argument as unsigned
 int
Date: Tue, 20 May 2014 18:14:43 +1000 (EST)

 On Mon, 19 May 2014, Peter Holm wrote:

 >> Description:
 > Passing -1 as gidsetlen is not detected. Discovered by ATF. Caught on Ubuntu and OS/X.

 A typical error from abusing an unsigned variable as a counter.  This doesn't
 even match the API.  This bug was in 4.4BSD.

 The fix seems to be almost as simple as changing the u_int for getgroups()
 in syscalls.master to int.  Also change 3 corresponding u_int's in
 kern_prot.c

 setgroups() has the same API design error, but there it reduces to an
 obfuscated way to check for invalid counts (for getgroups(), negative
 counts are automatically detected as invalid, since they are too small
 to hold any actual number of groups).  For setgroups(), negative counts
 are are type-punned to become large unsigned counts, so they are
 obfuscatedly detected as invalid because they are larger than any actual
 number for {NGROUPS_MAX}.  The check for that should for negative counts.

 Grepping for u_ in syscalls.master shows abuse of u_int for almost all
 uses:
 - dup, dup2: not even wrong, modulo type puns being benign.  These syscalls
     take int args, but syscalls. master says that they take u_int args.
     This converts the ints to u_ints using a type pun.  But the kernel
     converts these u_ints back to ints at the top level.  Obfuscated
     tests for negative values for file descriptors using the unsigned
     hack have (all?) been fixed in at least kern_descrip.c too.
 - profil: correct
 - getlogin: POSIX specifies size_t for getlogin_r(3), but FreeBSD still
     uses int.  The getlogin syscall is not directly available, but is
     closer to getlogin_r(3) than getlogin(3).  It uses u_int where the
     FreeBSD API says int.  This sort of works.  It is an obfuscated way
     of giving the POSIX semantics instead of the documented semantics.
     An arg of -1 doesn't mean -1, but means SIZE_MAX in POSIX and UINT_MAX
     in FreeBSD.  This is a physically impossible size (except on 64-bit
     systems while the punned FreeBSD semantics), but the POSIX semantics
     don't allow detecting it as an error, and the FreeBSD behaviour is to
     reduce it to MAXLOGNAME.  The behaviour is undefined in most cases if
     the arg is -1, but in practice the syscall will do the right thing if
     the buffer has size >= MAXLOGNAME.
 - getgroups, setgroups: see above
 - getitimer, setitimer: like setgroups (just the unsigned hack, but the
     range checking is slightly more obfuscated: the valid values are 0,
     1 and 2, so checking for the unsigned value being <= {the one that
     happens to be 2} gives an obfuscated quick check for the value being
     one of these three.
 - gethostname: POSIX specifies size_t.  FreeBSD documents size_t, but
     actually uses u_int.  On 64-bit arches, the buffer may have the silly
     but valid size of 2**32.  This is blindly truncated to 0 so the syscall
     fails.  The arg is converted back to size_t at the top level in the
     kernel so as to pass the address of a size_t to another function,
     but it has already been truncated then.  Similar truncation breaks the
     accidental change to POSIX semantics in some cases in getlogin_r.
 - sethostname: not in POSIX.  FreeBSD still documents int, but still
     actually uses u_int.  The arg is converted back to size_t too late to
     preserve it, as for gethostname.
 - readv, writev: POSIX and FreeBSD document that the iovcnt arg is int.
     FreeBSD type puns it to u_int, then converts this to size_t before
     using it as an arg for copyin*().  No errror checking is done except
     in copyin*() where the error checking is adequate.  -1 becomes UINT_MAX
     vua the type pun, but is not increased further to SIZE_MAX on 64-bit
     arches.  U_INT is usually too large, so the result is probably EFAULT.
 - getrlimit, setrlimit.  Like getitimer and setitimer, except the there
     is a macro to de-obfuscate the upper limit.  In the same file, the
     'which' variable for get/setpriority is handled quite differently.
     It is left as an int and checked using a case statement.
 - getdirentries: just the unsigned hack
 - __sysctl: at least matches the documented API.  The API uses u_int for
     small counts and size_t for large counts.  Old APIs don't have the
     design error of using u_int where int would do.  New APIs mostly
     use size_t excessively.  My grep doesn't find these.  It finds this
     one since it has a strange mixture of u_int and size_t.
 - poll: correct, I think.  POSIX specifies nfds_t, and IIRC has the design
     errors of requiring this to be unsigned and having excessive typedefs
     for the function.  FreeBSD spells it u_int to avoid some namespace
     pollution.  It should spell it nfds_t except in syscalls.master and
     files generated from it.
 - preadv, pwritev: like readv, writev (?)
 - __getcwd: POSIX specifies size_t and FreeBSD documents size_t for getcwd().
     Using u_int probably gives the usual truncation bugs.
 - *audit*: not checked.  The only set of new APIs that uses u_int.  About
     25 years anachronistic with C90's and POSIX's converting even old APIs
     to use size_t.
 - _umtx_op: not checked
 - *cap*: not checked.  Mostly not count args or plain u_int.  It mispells
     uint64_t as u_int64_t.  The correct spelling uint* is used for just 1
     syscall, and my grep for u_ didn't find it.  This use seems to be to
     avoid namespace pollution, so it is correct.  This points to further
     uses of unsigned types which were hidden by althernative spellings.
     The style bug of using 'unsigned' is used a bit:
     - nmount: uses 'unsigned int'.  Otherwise like writev (?), except the
       documented API also has this bug.
     - *ksem*, *kmq*: uses 'unsigned int'.  Not checked.  Seems to be
       undocumented.
     - jail_get, jail_set: like nmount.

 Stress and regression tests could try to find bugs in all of the above,
 but I code inspection only found the harmless truncation bugs for u_int
 instead of size_t in addition to the one in the PR.  The one in the
 PR is a rare case where the unsigned comparison hack doesn't give
 fail-safe behaviour.

 Bruce