[RFC] ASLR Whitepaper and Candidate Final Patch

Konstantin Belousov kostikbel at gmail.com
Fri Jul 25 15:38:04 UTC 2014


On Fri, Jul 25, 2014 at 08:17:36AM +0100, Robert N. M. Watson wrote:
> 
> On 24 Jul 2014, at 18:57, Shawn Webb <lattera at gmail.com> wrote:
> 
> >>> I think someone has already commented that Peter Holm's help might be 
> >>> enlisted; you may have seen his 'stress2' suite, which could help with 
> >>> stability testing.
> >> 
> >> I'll take a look at that, too. Thanks a lot for your suggestions and
> >> feedback.
> > 
> > The unixbench results are in. The overall scores are below.
> > 
> > ASLR Disabled: 456.33
> > ASLR Enabled:  357.05
> > No ASLR:       474.03
> > 
> > I've uploaded the raw results to
> > http://0xfeedface.org/~shawn/aslr/2014-07-24_benchmark.tar.gz
> > 
> > Take these results with a grain of salt, given that some of unixbench's
> > tests are filesystem-related and I'm running ZFS on an old laptop with
> > little RAM. It does show that there is a performance impact when ASLR is
> > enabled.
> 
> Just in case you've not spotted it, there's some useful benchmarking
> advice here:
>
>  https://wiki.freebsd.org/BenchmarkAdvice
>
> Unfortunately, the numbers above are a bit opaque, as it's not clear
> whether the differences/non-differences are statistically significant.
> Likewise, we'd expect that ASLR might impact some types of behaviour
> more than others, and so reduction to a single number can overlook
> problems or overemphasise differences. For now, the key thing is
> really that there not be any measurable performance difference when
> ASLR is disabled, and the numbers above make it a bit unclear if
> that is the case. The numbers above are definitely different -- but
> perhaps this is a result of non-essential code generation differences,
> noise in the run, etc. Typically, you would want to use a technique
> such as a t-test to compare runs and decide if the difference is
> significant. Tools such as ministat are very useful here, although you
> have to be a bit careful as most performance measurements are already
> arithmetic means due to the need to run individual instances of the
> operation of interest many times, and comparing means of means is a
> messy business.
>
> The next direction will be to dig more into areas where there are
> statistically significant changes to decide whether they are caused
> by ASLR, or perhaps are just non-essential differences in code
> generation. It may be useful to consider using a suite like 'libmicro'
> that can drill into individual system-call behaviour more, as well as
> larger-scale benchmarks that consider the behaviour of applications
> with realistic-ish workloads -- Postgres has been of particular
> interest lately.
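
For illustration, a minimal sketch of the comparison described above.
With per-run scores collected into two files (the file names here are
hypothetical), ministat(1) from the base system performs the t-test:

    ministat -c 95 aslr_off.txt aslr_on.txt

The same idea written out as Welch's t-test in C, with made-up sample
values:

    /*
     * Hypothetical sketch: Welch's t-test over two sets of benchmark
     * scores; this is the comparison ministat(1) performs for you.
     * The sample values are made up for illustration.
     * Build with: cc -o ttest ttest.c -lm
     */
    #include <math.h>
    #include <stdio.h>

    static void
    stats(const double *x, int n, double *mean, double *var)
    {
            double s = 0.0, ss = 0.0;
            int i;

            for (i = 0; i < n; i++)
                    s += x[i];
            *mean = s / n;
            for (i = 0; i < n; i++)
                    ss += (x[i] - *mean) * (x[i] - *mean);
            *var = ss / (n - 1);    /* sample variance */
    }

    int
    main(void)
    {
            /* Hypothetical per-run unixbench index scores. */
            double off[] = { 474.1, 473.2, 475.0, 474.6, 473.8 };
            double on[] = { 456.9, 455.7, 457.3, 456.0, 456.5 };
            int n1 = 5, n2 = 5;
            double m1, v1, m2, v2, se2, t, df;

            stats(off, n1, &m1, &v1);
            stats(on, n2, &m2, &v2);

            /* Welch's t statistic and approximate degrees of freedom. */
            se2 = v1 / n1 + v2 / n2;
            t = (m1 - m2) / sqrt(se2);
            df = se2 * se2 /
                (v1 * v1 / ((double)n1 * n1 * (n1 - 1)) +
                 v2 * v2 / ((double)n2 * n2 * (n2 - 1)));

            printf("t = %.3f, df ~= %.1f\n", t, df);
            return (0);
    }

If |t| exceeds the critical value of the t distribution for the chosen
confidence level and degrees of freedom, the difference between the two
sets of runs is statistically significant.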

Unixbench includes an execve(2) speed test, AFAIR.  It is probably
the only relevant test in the whole suite.
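
A hedged sketch of that kind of measurement (the target binary,
iteration count, and output format are arbitrary choices, not anything
unixbench itself does):

    /*
     * Hypothetical sketch: time N fork+execve+wait cycles of a trivial
     * binary, so that image activation dominates the measurement.
     * Per-iteration timings on stdout can be fed to ministat(1).
     */
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <err.h>
    #include <stdio.h>
    #include <time.h>
    #include <unistd.h>

    int
    main(void)
    {
            struct timespec t0, t1;
            char *argv[] = { "/usr/bin/true", NULL };
            char *envp[] = { NULL };
            long long ns;
            pid_t pid;
            int i, status;

            for (i = 0; i < 1000; i++) {
                    clock_gettime(CLOCK_MONOTONIC, &t0);
                    pid = fork();
                    if (pid == -1)
                            err(1, "fork");
                    if (pid == 0) {
                            execve(argv[0], argv, envp);
                            _exit(127);     /* execve failed */
                    }
                    if (waitpid(pid, &status, 0) == -1)
                            err(1, "waitpid");
                    clock_gettime(CLOCK_MONOTONIC, &t1);
                    ns = (long long)(t1.tv_sec - t0.tv_sec) * 1000000000LL +
                        (t1.tv_nsec - t0.tv_nsec);
                    printf("%lld\n", ns);
            }
            return (0);
    }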

Benchmarking the proposed change is hard, because it mostly affects
the loads which are typically excluded from a benchmark's measurement
phases.  The setup stage, where the image is activated and the pages
needed for the later steady state are faulted in, is the most
vulnerable.  The benchmarks themselves would typically execute in a
mode where only larger page tables, and the correlated higher TLB miss
frequency, could be observed.  I expect that the latter is hidden by
the L2 cache if the test working set fits into L2.
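
One hypothetical way to observe that setup-stage cost directly, rather
than through a steady-state benchmark, is to count the faults taken by
a child that merely activates an image and exits (the target binary is
arbitrary):

    /*
     * Hypothetical sketch: count the page faults a child takes while
     * activating an image and exiting, via wait4(2)'s rusage.  A larger
     * minor-fault count with ASLR enabled would point at the setup-stage
     * cost described above.
     */
    #include <sys/types.h>
    #include <sys/time.h>
    #include <sys/resource.h>
    #include <sys/wait.h>
    #include <err.h>
    #include <stdio.h>
    #include <unistd.h>

    int
    main(void)
    {
            struct rusage ru;
            char *argv[] = { "/usr/bin/true", NULL };
            char *envp[] = { NULL };
            pid_t pid;
            int status;

            pid = fork();
            if (pid == -1)
                    err(1, "fork");
            if (pid == 0) {
                    execve(argv[0], argv, envp);
                    _exit(127);     /* execve failed */
            }
            if (wait4(pid, &status, 0, &ru) == -1)
                    err(1, "wait4");
            printf("minflt %ld majflt %ld\n", ru.ru_minflt, ru.ru_majflt);
            return (0);
    }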