svn commit: r280407 - head/sys/kern

Wed Mar 25 20:15:30 UTC 2015

On Tue, Mar 24, 2015 at 03:58:14PM +1100, Bruce Evans wrote:
> On Tue, 24 Mar 2015, Mateusz Guzik wrote:
> 
> >Log:
> > filedesc: microoptimize fget_unlocked by getting rid of fd < 0 branch
> 
> This has no effect.  Compilers optimize to the equivalent of the the
> unsigned cast hack if this is good.  On x86, it is good since no
> instructions are needed for the conversion, and the only difference
> between the code generated by
> 
> 	if (fd >= fdt->fdt_nfiles)
> 
> and
> 
> 	if (fd < 0 || fd >= fdt->fdt_nfiles)
> 
> is to change from jl (jump if less than) to jb (jump if below) (both
> jumps over the if clause).  Negative values satisfy jl but not jb.
> 

I would not commit the change if it did not affect generated assembly at
least on amd64.

if (fd < 0 || fd >= fdt->fdt_nfiles):
0xffffffff807d147d <fget_unlocked+13>:	sub    $0x38,%rsp
0xffffffff807d1481 <fget_unlocked+17>:	test   %esi,%esi
0xffffffff807d1483 <fget_unlocked+19>:	js     0xffffffff807d15b8 <fget_unlocked+328>
0xffffffff807d1489 <fget_unlocked+25>:	mov    (%rdi),%rbx
0xffffffff807d148c <fget_unlocked+28>:	cmp    %esi,(%rbx)
0xffffffff807d148e <fget_unlocked+30>:	jle    0xffffffff807d15bf <fget_unlocked+335>

if ((u_int)fd >= fdt->fdt_nfiles):
0xffffffff807d147d <fget_unlocked+13>:	sub    $0x38,%rsp
0xffffffff807d1481 <fget_unlocked+17>:	mov    (%rdi),%rbx
0xffffffff807d1484 <fget_unlocked+20>:	cmp    %esi,(%rbx)
0xffffffff807d1486 <fget_unlocked+22>:	jbe    0xffffffff807d15a8 <fget_unlocked+312>

I did not check other archs prior to the commit.

This is clang 3.6 as present in head. Sources compiled with -O2. Also
see below for other compiler test.

> > Casting fd to an unsigned type simplifies fd range coparison to mere checking
> > if the result is bigger than the table.
> 
> No, it obfuscates the range comparison.
> 

It is a standard hack which is hard to misread and which seems to add a
slight benefit (see below).

> On some arches, conversion to unsigned is slow.  Then compilers should
> optimize in the opposite direction by first undoing the bogus cast to
> get back to the range check and then optimizing the range check using
> the best strategy.  Compilers should probably first undo the bogus
> cast even on x86, so as to reduce to the range check case.  Range checks
> are more important and more uniform than bogus casts, so they are more
> likely to be optimized.  Similarly if fd has type u_int to begin with.

It affects assembly on all arm, powerpc64 and mips64 as well. Both arm
and powerpc just get rid of zero-test and use the same instructions to
perform the other comparison. I only found a difference on mips64 which
used sltu instead of slt (but still got rid of the zero-check).

Granted I don't know mips instruction costs and I don't have the real
hadrware to benchark on.

Seems like a win for most architectures anyway.

> 
> >Modified: head/sys/kern/kern_descrip.c
> >==============================================================================
> >--- head/sys/kern/kern_descrip.c	Tue Mar 24 00:01:30 2015	(r280406)
> >+++ head/sys/kern/kern_descrip.c	Tue Mar 24 00:10:11 2015	(r280407)
> >@@ -2342,7 +2342,7 @@ fget_unlocked(struct filedesc *fdp, int
> >#endif
> >
> >	fdt = fdp->fd_files;
> >-	if (fd < 0 || fd >= fdt->fdt_nfiles)
> >+	if ((u_int)fd >= fdt->fdt_nfiles)
> >		return (EBADF);
> >	/*
> >	 * Fetch the descriptor locklessly.  We avoid fdrop() races by
> 
[..]
> - fget_locked() seems to be unchanged recently, so it doesn't have the
>   obfuscation.  fget_unlocked() is newer than the unobfuscated version
>   of fget_locked().  It is in a different file, but may have copied the
>   unobfuscated range check from fget_locked().  This commit restores the
>   obfuscation to it alone.
> 

This definitely should be synced one way or the other, thanks for
pointing this out.

> There are now some other range checks in kern_desc.c that are missing
> the obfuscation: grepping for "fd <" gives:
> - a magic treatment for negative fd's in closefrom()

Well I'll start with a short note that I don't know what's up with
uap->lowfd = 0; instead of retuning with EINVAL.

Anyay, the cast there would not have any use.

> - the range check in an unobfuscated by confusing form in fdisused().
>   The check has to be inverted since it is in a KASSERT(), and the
>   inversion is done by reversing the inequalities.

Correct, this likely should also be synced (one way or the other).

> - similarly in fdalloc().
> 

Same.

> I used to use this hack a lot 30 years ago, but stopped when compilers
> got better 20-25 years ago.
> 

I wrote a toy program and checked e.g. gcc5 and it still did not
optimise zero-test away.

In fact I would argue the optimisation in question is impossible unless
upper limit check is against a constant in (0, INT_MAX) range or against
a var whose range is known at compile time.

In particular this is problematic for negative values.

Consider:
int fd, limit;
............
if (fd < 0 || fd >= limit)

Let's have fd = -5 and limit = -4.

Should the fd < 0 check be dropped by the compiler and the expression
turned into (u_int)fd >= limit, the coparison would be false thus
changing the outcome.

As such, I prefer to keep the cast.

-- 
Mateusz Guzik <mjguzik gmail.com>