Memory allocation performance

Max Laier max at love2party.net
Sat Feb 2 15:07:29 PST 2008


On Sat, 2.02.2008, 23:05, Alexander Motin wrote:
> Robert Watson wrote:
>> Hence my request for drilling down a bit on profiling -- the question
>> I'm asking is whether profiling shows things running or taking time that
>> shouldn't be.
>
> I have not yet understood why it happens, but hwpmc shows a huge
> number of "p4-resource-stall"s in UMA functions:
>    %   cumulative   self              self     total
>   time   seconds   seconds    calls  ms/call  ms/call  name
>   45.2    2303.00  2303.00        0  100.00%           uma_zfree_arg [1]
>   41.2    4402.00  2099.00        0  100.00%           uma_zalloc_arg [2]
>    1.4    4472.00    70.00        0  100.00%           uma_zone_exhausted_nolock [3]
>    0.9    4520.00    48.00        0  100.00%           ng_snd_item [4]
>    0.8    4562.00    42.00        0  100.00%           __qdivrem [5]
>    0.8    4603.00    41.00        0  100.00%           ether_input [6]
>    0.6    4633.00    30.00        0  100.00%           ng_ppp_prepend [7]
>
> That probably explains why "p4-global-power-events" shows many hits in
> them
>    %   cumulative   self              self     total
>   time   seconds   seconds    calls  ms/call  ms/call  name
>   20.0   37984.00 37984.00        0  100.00%           uma_zfree_arg [1]
>   17.8   71818.00 33834.00        0  100.00%           uma_zalloc_arg [2]
>    4.0   79483.00  7665.00        0  100.00%           ng_snd_item [3]
>    3.0   85256.00  5773.00        0  100.00%           __mcount [4]
>    2.3   89677.00  4421.00        0  100.00%           bcmp [5]
>    2.2   93853.00  4176.00        0  100.00%           generic_bcopy [6]
>
> , while "p4-instr-retired" does not.
>    %   cumulative   self              self     total
>   time   seconds   seconds    calls  ms/call  ms/call  name
>   11.1    5351.00  5351.00        0  100.00%           ng_apply_item [1]
>    7.9    9178.00  3827.00        0  100.00%           legacy_pcib_alloc_msi [2]
>    4.1   11182.00  2004.00        0  100.00%           init386 [3]
>    4.0   13108.00  1926.00        0  100.00%           rn_match [4]
>    3.5   14811.00  1703.00        0  100.00%           uma_zalloc_arg [5]
>    2.6   16046.00  1235.00        0  100.00%           SHA256_Transform [6]
>    2.2   17130.00  1084.00        0  100.00%           ng_add_hook [7]
>    2.0   18111.00   981.00        0  100.00%           ng_rmhook_self [8]
>    2.0   19054.00   943.00        0  100.00%           em_encap [9]
>
> So far I have come up with two possible explanations. One is that, due
> to UMA's cyclic block allocation order, the allocations do not fit the
> CPU caches; the other is that it is somehow related to critical_exit(),
> which can possibly cause a context switch. Does anybody have a better
> explanation of how a function that is so small and simple in this part
> can cause such results?

I didn't see bzero accounted for in any of the traces in this thread,
which makes me wonder whether it might be counted within uma_zalloc.
Maybe we are calling it twice by accident?  I wasn't quite able to figure
out the logic of M_ZERO vs. UMA_ZONE_MALLOC etc. ... just a crazy idea.
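
A rough picture of the malloc(9) side, assuming (I have not checked) that
kern_malloc.c simply forwards the flags and relies on UMA for the zeroing;
size_to_zone() below is a made-up placeholder:

	/* Illustrative sketch only, not the actual kern_malloc.c code. */
	void *
	malloc(unsigned long size, struct malloc_type *mtp, int flags)
	{
		uma_zone_t zone;
		void *item;

		zone = size_to_zone(size);	/* pick the backing zone (placeholder) */
		item = uma_zalloc(zone, flags);	/* M_ZERO handled inside UMA */

		/*
		 * If this layer also zeroed on M_ZERO, i.e.
		 *	if (flags & M_ZERO)
		 *		bzero(item, size);
		 * the buffer would be zeroed twice, which is the "calling it
		 * twice by accident" case above.
		 */
		return (item);
	}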

-- 
/"\  Best regards,                      | mlaier at freebsd.org
\ /  Max Laier                          | ICQ #67774661
 X   http://pf4freebsd.love2party.net/  | mlaier at EFnet
/ \  ASCII Ribbon Campaign              | Against HTML Mail and News

