How to build an executable with profiling?

Hans Ottevanger hansot at iae.nl
Sun Jan 23 15:15:42 UTC 2011


On 01/21/11 20:27, Roman Divacky wrote:
>>> This patch does three things:
>>>
>>> 1) emits "call .mcount" at the beginning of every function body
>>>
>>
>> The differences on i386 between profiled and non-profiled code are not
>> as obvious as with gcc (using diff on assembly output), but on first
>> inspection it looks correct.
>
> cool :)
>
>>> 2) changes the driver to link in gcrt1.o instead of crt1.o
>>>
>>> 3) changes all -lfoo to -lfoo_p except when the foo ends with _s in
>>>     the linker invocation
>>>
>>
>> Maybe it is wise to follow the gcc implementation here.
>
> ok, makes sense
>
>>> I am not sure that I did the right thing, especially in (3). Anyway,
>>> the patch works for me (ie. produces a.out.gmon that seems to contain
>>> meaningful data).
>>>
>>> I would appreciate if you guys could test and review this. Letting me
>>> know if this is correct.
>>>
>>
>> On both my systems (i386 and amd64) something goes severely wrong when
>> linking several objects (all compiled with -pg, this is amd64):
>>
>> Perhaps the invocation of the linker still needs some work (or I must
>> redo my installation) but anyhow it looks like a good job. Thanks!
>
> I rewrote the libraries rewriting part to match gcc as close as possible.
> I also think that I solved your ld problem..
>
>
> please revert the old patch and test the new one:
>
>          http://lev.vlakno.cz/~rdivacky/clang-gprof.patch
>
> I believe this one is ok (works for me just fine), please test and report
> back so I can start integrating this upstream.
>

I performed a few quick tests on both i386 and amd64.

The problems I had with the invocation of ld appear to be solved. The
behavior with respect to libraries is now identical to gcc as far as I
can see.
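
For reference, the library rule from the first version of the patch, as
quoted above (every -lfoo becomes -lfoo_p unless foo ends in _s), could
be sketched roughly as follows. This only illustrates that quoted rule;
it is not the code from the patch, and it is not necessarily what gcc or
the revised patch does. The function name is made up for the example.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not from the patch): rewrite a "-lfoo" linker
 * argument to "-lfoo_p", leaving it untouched when the library name
 * ends in "_s". Arguments that are not -l options are copied as-is.
 * Returns 0 on success, -1 if the result does not fit in out.
 */
static int
profiled_lib_arg(const char *arg, char *out, size_t outlen)
{
	size_t len = strlen(arg);
	int n;

	if (strncmp(arg, "-l", 2) != 0 || len == 2 ||
	    (len >= 4 && strcmp(arg + len - 2, "_s") == 0))
		n = snprintf(out, outlen, "%s", arg);	/* keep unchanged */
	else
		n = snprintf(out, outlen, "%s_p", arg);	/* profiled variant */

	return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

With this sketch "-lm" becomes "-lm_p", while "-lgcc_s" and non-library
arguments such as "-o" pass through unchanged.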

The results from gprof also look very promising. For my test program on 
amd64 the gprof output when using clang is

   %   cumulative   self              self     total
  time   seconds   seconds    calls  ms/call  ms/call  name
  42.5       4.22     4.22        0  100.00%           _mcount [5]
  22.0       6.41     2.18 14700000     0.00     0.00  f_timint [6]
  12.4       7.64     1.23 21900000     0.00     0.00  exp [10]
   8.4       8.48     0.84 22000000     0.00     0.00  vmol [9]
   5.4       9.02     0.54  6300000     0.00     0.00  f_angle [11]
   3.8       9.40     0.38        0  100.00%           .mcount (52)
   1.9       9.59     0.19  1000000     0.00     0.01  qk21 [4]
   1.9       9.78     0.19  1000000     0.00     0.00  pow [12]
   0.4       9.82     0.04   200000     0.00     0.03  qags [3]
   0.4       9.86     0.04   100000     0.00     0.00  zero [14]
   0.3       9.89     0.03   100000     0.00     0.00  qext [16]
   0.2       9.91     0.02   800000     0.00     0.00  f_apsis [15]
   0.1       9.91     0.01  2500000     0.00     0.00  fmax [17]
   0.1       9.92     0.01   100000     0.00     0.00  apsis [13]
   0.0       9.92     0.00  1000000     0.00     0.00  fmin [18]
   0.0       9.93     0.00   100000     0.00     0.03  timint [7]
   0.0       9.93     0.00   700000     0.00     0.00  tol_apsis [19]
   0.0       9.94     0.00   200000     0.00     0.00  sort [20]
   0.0       9.94     0.00        1     1.85  5334.52  main [1]
   0.0       9.94     0.00   100000     0.00     0.03  angle [8]
...

while using gcc yields

   %   cumulative   self              self     total
  time   seconds   seconds    calls  ms/call  ms/call  name
  44.3       4.23     4.23        0  100.00%           _mcount [5]
  18.5       6.00     1.76 14700000     0.00     0.00  f_timint [6]
  13.5       7.28     1.28 21900000     0.00     0.00  exp [10]
   9.0       8.14     0.86 22000000     0.00     0.00  vmol [9]
   5.5       8.66     0.52  6300000     0.00     0.00  f_angle [11]
   4.0       9.04     0.38        0  100.00%           .mcount (52)
   2.0       9.24     0.19  1000000     0.00     0.00  pow [12]
   2.0       9.43     0.19  1000000     0.00     0.00  qk21 [4]
   0.3       9.45     0.03   100000     0.00     0.00  zero [14]
   0.3       9.48     0.03   200000     0.00     0.02  qags [3]
   0.2       9.50     0.02   100000     0.00     0.00  qext [16]
   0.2       9.52     0.02   800000     0.00     0.00  f_apsis [15]
   0.1       9.53     0.00  2500000     0.00     0.00  fmax [17]
   0.0       9.53     0.00   700000     0.00     0.00  tol_apsis [18]
   0.0       9.53     0.00   200000     0.00     0.00  sort [19]
   0.0       9.54     0.00   100000     0.00     0.00  apsis [13]
   0.0       9.54     0.00        1     2.21  4927.66  main [1]
   0.0       9.54     0.00  1000000     0.00     0.00  fmin [20]
   0.0       9.54     0.00   100000     0.00     0.02  timint [7]
   0.0       9.54     0.00   100000     0.00     0.02  angle [8]
...

To me this looks quite similar 8-)

I also tested the interaction of -pg with other options and there I 
found an issue with -fomit-frame-pointer. Here gcc bails out, as it 
probably should:

gcc -pg -O2 -Wall -fomit-frame-pointer -c test.c
gcc: -pg and -fomit-frame-pointer are incompatible

while clang continues and silently generates an executable that 
immediately terminates with a segmentation violation when started.
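
A check along the lines of what gcc does here might look like the sketch
below. It is only an illustration under simplifying assumptions: the
function name is hypothetical, it is not taken from the gcc or clang
driver, and it ignores details such as a later -fno-omit-frame-pointer
overriding an earlier -fomit-frame-pointer.

```c
#include <string.h>

/*
 * Hypothetical driver-side check (not actual gcc or clang code):
 * returns 1 when both -pg and -fomit-frame-pointer appear in the
 * argument list, 0 otherwise. A real driver would then print a
 * diagnostic and refuse to compile.
 */
static int
pg_conflicts_with_omit_fp(int argc, const char *const argv[])
{
	int have_pg = 0, have_omit_fp = 0;

	for (int i = 0; i < argc; i++) {
		if (strcmp(argv[i], "-pg") == 0)
			have_pg = 1;
		else if (strcmp(argv[i], "-fomit-frame-pointer") == 0)
			have_omit_fp = 1;
	}
	return have_pg && have_omit_fp;
}
```

For the gcc command line shown above this returns 1; without
-fomit-frame-pointer it returns 0.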

Another minor, unrelated issue I found is that this version of clang on
i386 generates SSE2 instructions by default, while gcc and clang in
-CURRENT generate the "classical" i387 instructions.

Kind regards,

Hans Ottevanger

