locks and kernel randomness...
Harrison Grundy
harrison.grundy at astrodoggroup.com
Wed Feb 25 09:11:00 UTC 2015
On 02/25/15 00:56, Konstantin Belousov wrote:
> On Wed, Feb 25, 2015 at 12:29:29AM -0800, Harrison Grundy wrote:
>>
>>
>> On 02/24/15 23:58, Alfred Perlstein wrote:
>>>
>>> On 2/24/15 7:23 PM, John-Mark Gurney wrote:
>>>> K. Macy wrote this message on Tue, Feb 24, 2015 at 15:33
>>>> -0800:
>>>>>> If someone does find a performance issue w/ my patch, I
>>>>>> WILL work with them on a solution, but I will not work w/
>>>>>> people who make unfounded claims about the impact of this
>>>>>> work...
>>>>>>
>>>>> <shrug> ... The concerns may be exaggerated, but they
>>>>> aren't unfounded. Not quite the same thing, but no one
>>>>> wants to spend the
>>>> Until someone shows me code in the kernel tree where this is
>>>> even close to a performance problem, it is unfounded... I've
>>>> asked, and no one has.
>>>>
>>>>> cycles doing a SHA256 because it's "The Right Thing"(tm)
>>>>> when their use case only requires a fletcher2.
>>>> Depends upon what you're doing... I haven't proposed changing
>>>> ZFS's default to sha256, so stop w/ the false
>>>> equivalences...
>>>>
>>>>> If it doesn't already exist, it might also be worth looking
>>>>> into a more scalable CSPRNG implementation that doesn't
>>>>> require locking in the common case. For example, each core
>>>>> could be seeded separately and periodically, so that each
>>>>> has a private pool protected only by a critical section.
>>>>> The private pool would be refreshed regularly by a CPU-local
>>>>> callout. Thus, a lock would only be acquired if the local
>>>>> entropy were depleted.
>>>> I'm not discussing this until you read and reply to my
>>>> original email, since it's clear that my original email's
>>>> contents have been ignored in this thread...
>>>>
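(As a concrete illustration of the per-CPU scheme Macy describes
above, something along these lines would work. This is a rough
sketch only, not code from any patch in this thread; the names
pcpu_pool, pool_refresh and global_pool_mtx are made up:)

    #include <sys/param.h>
    #include <sys/systm.h>
    #include <sys/lock.h>
    #include <sys/mutex.h>
    #include <sys/pcpu.h>

    /* Hypothetical per-CPU pool; a per-CPU callout would call
     * pool_refresh() periodically, so the locked slow path below
     * should almost never run. */
    struct pcpu_pool {
            uint8_t bytes[256];     /* CPU-private random bytes */
            size_t  avail;          /* bytes left before a refill */
    };
    static DPCPU_DEFINE(struct pcpu_pool, pcpu_pool);
    static struct mtx global_pool_mtx;  /* mtx_init()ed w/ MTX_SPIN */

    static void
    pool_refresh(struct pcpu_pool *p)
    {
            /* Placeholder: a real version would draw fresh bytes
             * from the system CSPRNG here. */
            p->avail = sizeof(p->bytes);
    }

    /* Assumes len <= sizeof(p->bytes). */
    static void
    pcpu_random(void *buf, size_t len)
    {
            struct pcpu_pool *p;

            critical_enter();       /* common case: no lock at all */
            p = DPCPU_PTR(pcpu_pool);
            if (__predict_false(p->avail < len)) {
                    /* Depleted: reseed under the shared lock. */
                    mtx_lock_spin(&global_pool_mtx);
                    pool_refresh(p);
                    mtx_unlock_spin(&global_pool_mtx);
            }
            p->avail -= len;
            memcpy(buf, &p->bytes[p->avail], len);
            critical_exit();
    }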
>>> What is the final proposal? More spinlocks? That is not a
>>> good idea.
>>>
>>> Doing a single buildworld is not enough. Ask Netflix or
>>> someone with a real load of 1000s of threads/processes to do
>>> testing for you if you truly want to touch the scheduler.
>>
>> sched_ule runs this code once every .5 to 1.5 seconds, depending
>> on the value random() returns, so using a CSPRNG there wouldn't
>> actually be noticeable. (We're talking about a few thousand
>> cycles, when the existing implementation has to make a remote
>> memory read/write (numpackages-1)/numpackages of the time, which
>> costs tens of thousands of cycles. Switching to a per-CPU CSPRNG
>> is actually faster in those cases.)
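(For reference, the interval computation in question looks roughly
like this in sys/kern/sched_ule.c; paraphrased from memory, so
treat the exact form as approximate:)

    /*
     * Select a random time between .5 * balance_interval and
     * 1.5 * balance_interval, in stathz ticks.
     */
    balance_ticks = max(balance_interval / 2, 1);
    balance_ticks += random() % balance_interval;

It runs once per rebalance, which is where the .5 to 1.5 second
figure comes from.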
> The cost of the proposed patch, of course, is not the several
> thousand instructions in the rebalance. The problem with it is
> the introduction of a new spinlock, which will come to be used
> in many places after its introduction. The cost of a new and
> often-used spinlock is an increase in interrupt latency,
> interrupt handler jitter, and CPU switch jitter.
>
> So neither buildworld timing nor network throughput is adequate
> to estimate the change. What matters is system unresponsiveness
> and, to some degree, loss of realtime behaviour.
>
> I thought this was obvious, at least after spinlocks were
> mentioned, but apparently it is not, since proposals have been
> made to measure the patch's effect by benchmarking buildworld or
> passing traffic.
>
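(To make the latency point concrete: a FreeBSD spin mutex disables
interrupts on the local CPU while it is being acquired and held,
so every cycle spent spinning on, or holding, the new lock shows
up directly as interrupt latency. rnd_mtx below is just an
illustrative name:)

    mtx_lock_spin(&rnd_mtx);   /* spinlock_enter(): interrupts off */
    /* everything here runs with local interrupts disabled */
    mtx_unlock_spin(&rnd_mtx); /* spinlock_exit(): interrupts on again */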
I was referring to the sched_ule case only. That analysis will
certainly need to be done for each use of random(), though.
>>
>> That being said, I believe the plan is to remove random() from
>> sched_ule entirely. It doesn't need it to perform the balancing,
>> and we can just use the LCG from cpu_search if get_cyclecount
>> isn't viable.
>
> I agree with this, but please see my other response.
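(For the curious, a lockless per-CPU LCG in that spirit can be
tiny. This is a hypothetical sketch using the classic 69069
multiplier, not necessarily the exact cpu_search code:)

    #include <sys/param.h>
    #include <sys/pcpu.h>

    static DPCPU_DEFINE(uint32_t, randomval);

    static uint32_t
    sched_random(void)
    {
            uint32_t *rndptr;

            /* A stale pointer after a CPU migration merely perturbs
             * the sequence, which is harmless for load balancing. */
            rndptr = DPCPU_PTR(randomval);
            *rndptr = *rndptr * 69069 + 5;
            return (*rndptr >> 16);
    }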