svn commit: r284959 - in head: . share/man/man4 share/man/man9 sys/conf sys/dev/glxsb sys/dev/hifn sys/dev/random sys/dev/rndtest sys/dev/safe sys/dev/syscons sys/dev/ubsec sys/dev/virtio/random sy...

Thu Jul 23 13:51:54 UTC 2015

> On Jul 23, 2015, at 1:03 AM, Mark R V Murray <markm at FreeBSD.org> wrote:
> 
> 
>> On 23 Jul 2015, at 00:53, Warner Losh <imp at bsdimp.com> wrote:
>> 
>>>>> Neither filesystem operations nor allocations are random events.  They are trivially influenced by user code.  A malicious attacker could create repeated patterns of allocations or filesystem activity through the syscall path to degrade your random sample source.
>>>> 
>>>> I’m not sure I accept that - Fortuna is very careful about using non-reversible hashing in its accumulation, and countering such degradation is one of the algorithm’s strong points. There is perhaps risk of *no* entropy, but even the per-event timing jitter will be providing this, if nothing else.
>> 
>> I’m not sure I’m happy about this answer. Do you have some research backing up such cavalier claims?
> 
> It was not my intention to sound cavalier. Apologies.
> 
> Fortuna was developed to account for many sources of entropy, good and bad alike, and Jeff’s observation is an attack on that design. I accept that the randomness of these events is poor, but they are high-rate, and this product of high-rate*low entropy is what I seek. I pulled out numbers with dtrace, and basic statistics showed that the harvesting was not useless. I completely understand that under the right circumstances these numbers might be lousy - please read the Fortuna design document to understand why this doesn’t matter. *ALL* entropy inputs to Fortuna are considered attackable, including the dedicated hardware sources.
> 
> I have also read cryptanalyses of Fortuna, not all of them to be sure, and so far the design appears strong. The best attack that I have seen (very academic) suggests an improvement which I may incorporate.
> 
>>>>> Perhaps more importantly to me, this is an unacceptable performance burden for the allocator.  At a minimum it should compile out by default. Great care has been taken to reduce the fast path of the allocator to the minimum number of cycles and even cache misses.
>>>> 
>>>> As currently set up in etc/rc.d/* by default, there is a simple check at each UMA harvesting opportunity, and no further action. I asked Robert Watson if this was burdensome, and he said it was not.
>>> 
>>> I find this burdensome.  You can easily add a macro around the calls or hide them in an inline function that defaults to off.  Even a function call that checks a global and does nothing else is a handful of new cache misses.  A microbenchmark will not capture the full cost of this.  It will instead show only the dozen or so instructions of overhead, which I still find objectionable.
>>> 
>>> Kip's observations about packet cycle budgets in high-performance applications are accurate and this is something we have put great care into over time.
>> 
>> A certain video streaming company will be pushing the envelope to get to 100Gbps very soon. Even a few extra instructions on every packet / allocation will be a killer. Especially if one is an almost guaranteed cache miss. This most certainly will be burdensome. There absolutely must be a way to turn this off at compile time. We don’t care that much about entropy to leave performance on the table.
> 
> OK - I’m sold! I’ll make a kernel option defaulting to off. :-)
> 
> 

Hi Mark,

Thanks for making this concession.  I wanted to add a bit of historical perspective.  When Yarrow was introduced in the previous decade, it was initially wired into nearly all interrupt sources.  It turned out to be so expensive to those sources, especially for high-speed sources at the time like network and caching RAID drivers, that we then spent months disabling it from those sources.  In the end, a lot of code thrash happened and the effectiveness of Yarrow was questionable.

Fast forward to now with your recent work.  If UMA becomes expensive for high-speed use, everyone will go back to developing private per-driver and per-subsystem allocators to avoid it.  This will happen whether or not the UMA collector is controllable at run-time; if it’s enabled by default, benchmarks will be impacted and people will react.  That’ll be a huge step backwards for FreeBSD.

I also strongly agree with Jeff’s point on the questionable nature of this kind of fast-and-monotonic entropy collection, and with Warner’s and Kip’s point on the finite number of clock cycles available for doing 100Gb networking.  If really high quality entropy is desired, won’t most serious people use a hardware source instead of a software source?  Not that I think that software entropy is useless, but it’s a question of how much effort goes into it, and how many tradeoffs are accepted, for what result.  An academically beautiful entropy system that hamstrings the OS from doing other essential things isn’t all that interesting, IMO.

Scott