[RFC] ASLR Whitepaper and Candidate Final Patch

Robert N. M. Watson rwatson at FreeBSD.org
Fri Jul 25 07:17:47 UTC 2014


On 24 Jul 2014, at 18:57, Shawn Webb <lattera at gmail.com> wrote:

>>> I think someone has already commented that Peter Holm's help might be 
>>> enlisted; you may have seen his 'stress2' suite, which could help with 
>>> stability testing.
>> 
>> I'll take a look at that, too. Thanks a lot for your suggestions and
>> feedback.
> 
> The unixbench results are in. The overall scores are below.
> 
> ASLR Disabled: 456.33
> ASLR Enabled:  357.05
> No ASLR:       474.03
> 
> I've uploaded the raw results to
> http://0xfeedface.org/~shawn/aslr/2014-07-24_benchmark.tar.gz
> 
> Take these results with a grain of salt, given that some of unixbench's
> tests are filesystem-related and I'm running ZFS on an old laptop with
> little RAM. It does show that there is a performance impact when ASLR is
> enabled.

Just in case you've not spotted it, there's some useful benchmarking advice here:

	https://wiki.freebsd.org/BenchmarkAdvice

Unfortunately, the numbers above are a bit opaque, as it's not clear whether the differences/non-differences are statistically significant. Likewise, we'd expect that ASLR might impact some types of behaviour more than others, and so reduction to a single number can overlook problems or overemphasise differences. For now, the key thing is really that there not be any measurable performance difference when ASLR is disabled, and the numbers above make it a bit unclear whether that is the case. The numbers above are definitely different -- but perhaps this is a result of non-essential code-generation differences, noise in the run, etc. Typically, you would want to use a technique such as a t-test to compare runs and decide if the difference is significant. Tools such as ministat are very useful here, although you have to be a bit careful, as most performance measurements are already arithmetic means due to the need to run individual instances of the operation of interest many times, and comparing means of means is a messy business.
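
To make that concrete, here is a minimal, illustrative sketch of the kind of comparison ministat performs from two files of raw per-run numbers; the per-run scores below are entirely made up, and scipy is just one convenient way to run Welch's t-test on two sets of runs:

    # Illustrative only: the per-run scores below are invented, not real data.
    from scipy.stats import ttest_ind

    aslr_disabled = [456.1, 457.0, 455.8, 456.9, 456.4]  # hypothetical per-run scores
    aslr_enabled  = [357.2, 356.8, 357.5, 356.9, 357.1]  # hypothetical per-run scores

    # Welch's t-test (no equal-variance assumption) across the two sets of runs.
    t_stat, p_value = ttest_ind(aslr_disabled, aslr_enabled, equal_var=False)
    print("t = %.2f, p = %.4g" % (t_stat, p_value))

    # A small p-value (say < 0.05) suggests the difference between the two sets
    # of runs is statistically significant rather than run-to-run noise; ministat
    # reports the same sort of conclusion at a chosen confidence level.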

The next direction will be to dig more into areas where there are statistically significant changes, to decide whether they are caused by ASLR or are perhaps just non-essential differences in code generation. It may be useful to consider a suite like 'libmicro', which can drill further into individual system-call behaviour, as well as larger-scale benchmarks that consider the behaviour of applications with realistic-ish workloads -- Postgres has been of particular interest lately.
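
As a rough illustration of the kind of per-syscall measurement libmicro automates (and does far more carefully), the sketch below just times a single cheap system call in a tight loop; the per-run averages from several such runs would then be fed into ministat or a t-test as above:

    # Rough sketch only: time one cheap system call in a tight loop and report
    # an average cost per call; libmicro does this properly, with warm-up,
    # multiple batches, and per-batch statistics.
    import os
    import time

    N = 1_000_000
    start = time.perf_counter_ns()
    for _ in range(N):
        os.getpid()
    elapsed = time.perf_counter_ns() - start
    print("getpid: %.1f ns/call over %d calls" % (elapsed / N, N))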

Robert
