svn commit: r280407 - head/sys/kern
Mateusz Guzik
mjguzik at gmail.com
Wed Mar 25 20:15:30 UTC 2015
On Tue, Mar 24, 2015 at 03:58:14PM +1100, Bruce Evans wrote:
> On Tue, 24 Mar 2015, Mateusz Guzik wrote:
>
> >Log:
> > filedesc: microoptimize fget_unlocked by getting rid of fd < 0 branch
>
> This has no effect. Compilers optimize to the equivalent of the the
> unsigned cast hack if this is good. On x86, it is good since no
> instructions are needed for the conversion, and the only difference
> between the code generated by
>
> if (fd >= fdt->fdt_nfiles)
>
> and
>
> if (fd < 0 || fd >= fdt->fdt_nfiles)
>
> is to change from jl (jump if less than) to jb (jump if below) (both
> jumps over the if clause). Negative values satisfy jl but not jb.
>
I would not commit the change if it did not affect generated assembly at
least on amd64.
if (fd < 0 || fd >= fdt->fdt_nfiles):
0xffffffff807d147d <fget_unlocked+13>: sub $0x38,%rsp
0xffffffff807d1481 <fget_unlocked+17>: test %esi,%esi
0xffffffff807d1483 <fget_unlocked+19>: js 0xffffffff807d15b8 <fget_unlocked+328>
0xffffffff807d1489 <fget_unlocked+25>: mov (%rdi),%rbx
0xffffffff807d148c <fget_unlocked+28>: cmp %esi,(%rbx)
0xffffffff807d148e <fget_unlocked+30>: jle 0xffffffff807d15bf <fget_unlocked+335>
if ((u_int)fd >= fdt->fdt_nfiles):
0xffffffff807d147d <fget_unlocked+13>: sub $0x38,%rsp
0xffffffff807d1481 <fget_unlocked+17>: mov (%rdi),%rbx
0xffffffff807d1484 <fget_unlocked+20>: cmp %esi,(%rbx)
0xffffffff807d1486 <fget_unlocked+22>: jbe 0xffffffff807d15a8 <fget_unlocked+312>
I did not check other archs prior to the commit.
This is clang 3.6 as present in head. Sources compiled with -O2. Also
see below for other compiler test.
> > Casting fd to an unsigned type simplifies fd range coparison to mere checking
> > if the result is bigger than the table.
>
> No, it obfuscates the range comparison.
>
It is a standard hack which is hard to misread and which seems to add a
slight benefit (see below).
> On some arches, conversion to unsigned is slow. Then compilers should
> optimize in the opposite direction by first undoing the bogus cast to
> get back to the range check and then optimizing the range check using
> the best strategy. Compilers should probably first undo the bogus
> cast even on x86, so as to reduce to the range check case. Range checks
> are more important and more uniform than bogus casts, so they are more
> likely to be optimized. Similarly if fd has type u_int to begin with.
It affects assembly on all arm, powerpc64 and mips64 as well. Both arm
and powerpc just get rid of zero-test and use the same instructions to
perform the other comparison. I only found a difference on mips64 which
used sltu instead of slt (but still got rid of the zero-check).
Granted I don't know mips instruction costs and I don't have the real
hadrware to benchark on.
Seems like a win for most architectures anyway.
>
> >Modified: head/sys/kern/kern_descrip.c
> >==============================================================================
> >--- head/sys/kern/kern_descrip.c Tue Mar 24 00:01:30 2015 (r280406)
> >+++ head/sys/kern/kern_descrip.c Tue Mar 24 00:10:11 2015 (r280407)
> >@@ -2342,7 +2342,7 @@ fget_unlocked(struct filedesc *fdp, int
> >#endif
> >
> > fdt = fdp->fd_files;
> >- if (fd < 0 || fd >= fdt->fdt_nfiles)
> >+ if ((u_int)fd >= fdt->fdt_nfiles)
> > return (EBADF);
> > /*
> > * Fetch the descriptor locklessly. We avoid fdrop() races by
>
[..]
> - fget_locked() seems to be unchanged recently, so it doesn't have the
> obfuscation. fget_unlocked() is newer than the unobfuscated version
> of fget_locked(). It is in a different file, but may have copied the
> unobfuscated range check from fget_locked(). This commit restores the
> obfuscation to it alone.
>
This definitely should be synced one way or the other, thanks for
pointing this out.
> There are now some other range checks in kern_desc.c that are missing
> the obfuscation: grepping for "fd <" gives:
> - a magic treatment for negative fd's in closefrom()
Well I'll start with a short note that I don't know what's up with
uap->lowfd = 0; instead of retuning with EINVAL.
Anyay, the cast there would not have any use.
> - the range check in an unobfuscated by confusing form in fdisused().
> The check has to be inverted since it is in a KASSERT(), and the
> inversion is done by reversing the inequalities.
Correct, this likely should also be synced (one way or the other).
> - similarly in fdalloc().
>
Same.
> I used to use this hack a lot 30 years ago, but stopped when compilers
> got better 20-25 years ago.
>
I wrote a toy program and checked e.g. gcc5 and it still did not
optimise zero-test away.
In fact I would argue the optimisation in question is impossible unless
upper limit check is against a constant in (0, INT_MAX) range or against
a var whose range is known at compile time.
In particular this is problematic for negative values.
Consider:
int fd, limit;
............
if (fd < 0 || fd >= limit)
Let's have fd = -5 and limit = -4.
Should the fd < 0 check be dropped by the compiler and the expression
turned into (u_int)fd >= limit, the coparison would be false thus
changing the outcome.
As such, I prefer to keep the cast.
--
Mateusz Guzik <mjguzik gmail.com>
More information about the svn-src-all
mailing list