Memory allocation performance

Alexander Motin mav at FreeBSD.org
Sat Feb 2 14:05:44 PST 2008


Robert Watson wrote:
> Hence my request for drilling down a bit on profiling -- the question 
> I'm asking is whether profiling shows things running or taking time that 
> shouldn't be.

I have not yet understood why it happens, but hwpmc shows a huge 
number of "p4-resource-stall"s in the UMA functions:
   %   cumulative   self              self     total
  time   seconds   seconds    calls  ms/call  ms/call  name
  45.2    2303.00  2303.00        0  100.00%           uma_zfree_arg [1]
  41.2    4402.00  2099.00        0  100.00%           uma_zalloc_arg [2]
   1.4    4472.00    70.00        0  100.00%           uma_zone_exhausted_nolock [3]
   0.9    4520.00    48.00        0  100.00%           ng_snd_item [4]
   0.8    4562.00    42.00        0  100.00%           __qdivrem [5]
   0.8    4603.00    41.00        0  100.00%           ether_input [6]
   0.6    4633.00    30.00        0  100.00%           ng_ppp_prepend [7]

This probably explains why "p4-global-power-events" shows many hits 
in the same functions:
   %   cumulative   self              self     total
  time   seconds   seconds    calls  ms/call  ms/call  name
  20.0   37984.00 37984.00        0  100.00%           uma_zfree_arg [1]
  17.8   71818.00 33834.00        0  100.00%           uma_zalloc_arg [2]
   4.0   79483.00  7665.00        0  100.00%           ng_snd_item [3]
   3.0   85256.00  5773.00        0  100.00%           __mcount [4]
   2.3   89677.00  4421.00        0  100.00%           bcmp [5]
   2.2   93853.00  4176.00        0  100.00%           generic_bcopy [6]

, while "p4-instr-retired" does not.
   %   cumulative   self              self     total
  time   seconds   seconds    calls  ms/call  ms/call  name
  11.1    5351.00  5351.00        0  100.00%           ng_apply_item [1]
   7.9    9178.00  3827.00        0  100.00%           legacy_pcib_alloc_msi [2]
   4.1   11182.00  2004.00        0  100.00%           init386 [3]
   4.0   13108.00  1926.00        0  100.00%           rn_match [4]
   3.5   14811.00  1703.00        0  100.00%           uma_zalloc_arg [5]
   2.6   16046.00  1235.00        0  100.00%           SHA256_Transform [6]
   2.2   17130.00  1084.00        0  100.00%           ng_add_hook [7]
   2.0   18111.00   981.00        0  100.00%           ng_rmhook_self [8]
   2.0   19054.00   943.00        0  100.00%           em_encap [9]

So far I have come up with two possible explanations. One is that, due 
to UMA's cyclic block allocation order, the working set does not fit 
into the CPU caches; the other is that it is somehow related to 
critical_exit(), which can possibly cause a context switch. Does 
anybody have a better explanation of how functions so small and simple 
in this code path can cause such results?
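
To make the two hypotheses concrete, here is a rough standalone sketch 
of what the per-CPU allocation fast path does. This is NOT the real 
code (that lives in sys/vm/uma_core.c); every type and helper below is 
a simplified stand-in for illustration only:

/*
 * Rough sketch of a UMA-like per-CPU allocation fast path.
 * All names are simplified stand-ins, not the actual kernel API.
 */
#include <stddef.h>

#define BUCKET_ITEMS 128

struct zone;				/* opaque zone, details omitted */

struct bucket {
	int	cnt;			/* free items left in bucket */
	void	*items[BUCKET_ITEMS];	/* cached free items */
};

struct percpu_cache {
	struct bucket	*allocbucket;	/* bucket we allocate from */
};

/* Stand-ins for the real kernel primitives used on this path. */
void	critical_enter(void);		/* defer preemption on this CPU */
void	critical_exit(void);		/* may run a deferred switch */
struct percpu_cache *curcpu_cache(struct zone *zone);
void	*zalloc_slow(struct zone *zone);	/* bucket refill, omitted */

void *
zalloc_fast(struct zone *zone)
{
	struct percpu_cache *cache;
	struct bucket *bucket;
	void *item;

	critical_enter();		/* pin ourselves to this CPU */
	cache = curcpu_cache(zone);
	bucket = cache->allocbucket;
	if (bucket != NULL && bucket->cnt > 0) {
		/*
		 * Hypothesis 1: items rotate through the buckets in an
		 * order that keeps missing the CPU caches, so this one
		 * load could be where the resource stalls pile up.
		 */
		item = bucket->items[--bucket->cnt];
		/*
		 * Hypothesis 2: critical_exit() performs any context
		 * switch that was deferred during the critical
		 * section, so preemption can happen right here.
		 */
		critical_exit();
		return (item);
	}
	critical_exit();
	return (zalloc_slow(zone));
}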

-- 
Alexander Motin

