How to build an executable with profiling?
Hans Ottevanger
hansot at iae.nl
Sun Jan 23 15:15:42 UTC 2011
On 01/21/11 20:27, Roman Divacky wrote:
>>> This patch does three things:
>>>
>>> 1) emits "call .mcount" at the beginning of every function body
>>>
>>
>> The differences on i386 between profiled and non-profiled code are not
>> as obvious as with gcc (using diff on assembly output), but on first
>> inspection it looks correct.
>
> cool :)
>
>>> 2) changes the driver to link in gcrt1.o instead of crt1.o
>>>
>>> 3) changes all -lfoo to -lfoo_p except when the foo ends with _s in
>>> the linker invocation
>>>
>>
>> Maybe it is wise to follow the gcc implementation here.
>
> ok, makes sense
>
>>> I am not sure that I did the right thing, especially in (3). Anyway,
>>> the patch works for me (ie. produces a.out.gmon that seems to contain
>>> meaningful data).
>>>
>>> I would appreciate it if you guys could test and review this, and let
>>> me know if it is correct.
>>>
>>
>> On both my systems (i386 and amd64) something goes severely wrong when
>> linking several objects (all compiled with -pg, this is amd64):
>>
>> Perhaps the invocation of the linker still needs some work (or I must
>> redo my installation) but anyhow it looks like a good job. Thanks!
>
> I rewrote the library rewriting part to match gcc as closely as possible.
> I also think that I solved your ld problem.
>
>
> please revert the old patch and test the new one:
>
> http://lev.vlakno.cz/~rdivacky/clang-gprof.patch
>
> I believe this one is ok (works for me just fine), please test and report
> back so I can start integrating this upstream.
>
I performed a few quick tests on both i386 and amd64.
The problems I had with the invocation of ld appear to be solved. The
behavior with respect to libraries is now identical to gcc as far as I can see.
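
For reference, rule (3) of the first patch, as quoted above, can be
sketched in C roughly as follows. This only illustrates that description.
It is not code from either version of the patch, the function name
profiled_lib_arg is hypothetical, and the revised patch follows gcc
rather than this rule.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not taken from the patch: apply the quoted rule
 * to a single linker argument. "-lfoo" becomes "-lfoo_p" unless the
 * library name already ends in "_s"; anything else is copied unchanged.
 * Returns 0 on success, -1 if the result does not fit in out.
 */
static int profiled_lib_arg(const char *arg, char *out, size_t outlen)
{
    size_t len = strlen(arg);
    int is_lib = len > 2 && strncmp(arg, "-l", 2) == 0;
    int ends_s = len >= 4 && strcmp(arg + len - 2, "_s") == 0;
    const char *suffix = (is_lib && !ends_s) ? "_p" : "";
    int n = snprintf(out, outlen, "%s%s", arg, suffix);

    return (n >= 0 && (size_t)n < outlen) ? 0 : -1;
}
```

With this sketch "-lm" would turn into "-lm_p", while "-lgcc_s" and
non-library options such as "-O2" would pass through untouched.
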
The results from gprof also look very promising. For my test program on
amd64 the gprof output when using clang is

    % cumulative     self              self    total
 time    seconds  seconds    calls  ms/call  ms/call  name
 42.5       4.22     4.22        0  100.00%           _mcount [5]
 22.0       6.41     2.18 14700000     0.00     0.00  f_timint [6]
 12.4       7.64     1.23 21900000     0.00     0.00  exp [10]
  8.4       8.48     0.84 22000000     0.00     0.00  vmol [9]
  5.4       9.02     0.54  6300000     0.00     0.00  f_angle [11]
  3.8       9.40     0.38        0  100.00%           .mcount (52)
  1.9       9.59     0.19  1000000     0.00     0.01  qk21 [4]
  1.9       9.78     0.19  1000000     0.00     0.00  pow [12]
  0.4       9.82     0.04   200000     0.00     0.03  qags [3]
  0.4       9.86     0.04   100000     0.00     0.00  zero [14]
  0.3       9.89     0.03   100000     0.00     0.00  qext [16]
  0.2       9.91     0.02   800000     0.00     0.00  f_apsis [15]
  0.1       9.91     0.01  2500000     0.00     0.00  fmax [17]
  0.1       9.92     0.01   100000     0.00     0.00  apsis [13]
  0.0       9.92     0.00  1000000     0.00     0.00  fmin [18]
  0.0       9.93     0.00   100000     0.00     0.03  timint [7]
  0.0       9.93     0.00   700000     0.00     0.00  tol_apsis [19]
  0.0       9.94     0.00   200000     0.00     0.00  sort [20]
  0.0       9.94     0.00        1     1.85  5334.52  main [1]
  0.0       9.94     0.00   100000     0.00     0.03  angle [8]
...

while using gcc yields

    % cumulative     self              self    total
 time    seconds  seconds    calls  ms/call  ms/call  name
 44.3       4.23     4.23        0  100.00%           _mcount [5]
 18.5       6.00     1.76 14700000     0.00     0.00  f_timint [6]
 13.5       7.28     1.28 21900000     0.00     0.00  exp [10]
  9.0       8.14     0.86 22000000     0.00     0.00  vmol [9]
  5.5       8.66     0.52  6300000     0.00     0.00  f_angle [11]
  4.0       9.04     0.38        0  100.00%           .mcount (52)
  2.0       9.24     0.19  1000000     0.00     0.00  pow [12]
  2.0       9.43     0.19  1000000     0.00     0.00  qk21 [4]
  0.3       9.45     0.03   100000     0.00     0.00  zero [14]
  0.3       9.48     0.03   200000     0.00     0.02  qags [3]
  0.2       9.50     0.02   100000     0.00     0.00  qext [16]
  0.2       9.52     0.02   800000     0.00     0.00  f_apsis [15]
  0.1       9.53     0.00  2500000     0.00     0.00  fmax [17]
  0.0       9.53     0.00   700000     0.00     0.00  tol_apsis [18]
  0.0       9.53     0.00   200000     0.00     0.00  sort [19]
  0.0       9.54     0.00   100000     0.00     0.00  apsis [13]
  0.0       9.54     0.00        1     2.21  4927.66  main [1]
  0.0       9.54     0.00  1000000     0.00     0.00  fmin [20]
  0.0       9.54     0.00   100000     0.00     0.02  timint [7]
  0.0       9.54     0.00   100000     0.00     0.02  angle [8]
...

To me this looks quite similar 8-)
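
To compare the two listings mechanically rather than by eye, the rows
could be parsed field by field. The following is a minimal C sketch,
assuming rows laid out as in the output above. The struct flat_row and
the function parse_flat_row are hypothetical names of my own and not
part of gprof.

```c
#include <stdio.h>

/* One ordinary row of the flat profile. */
struct flat_row {
    double pct_time;      /* "% time" column */
    double cumulative_s;  /* "cumulative seconds" column */
    double self_s;        /* "self seconds" column */
    long   calls;         /* "calls" column */
    double self_ms;       /* "self ms/call" column */
    double total_ms;      /* "total ms/call" column */
    char   name[64];      /* function name, without the [index] */
};

/*
 * Hypothetical helper: parse one row. Returns 1 on success and 0 for
 * rows that do not carry all seven fields, such as the _mcount rows,
 * which show "100.00%" in place of the per-call columns.
 */
static int parse_flat_row(const char *line, struct flat_row *r)
{
    int n = sscanf(line, "%lf %lf %lf %ld %lf %lf %63s",
                   &r->pct_time, &r->cumulative_s, &r->self_s,
                   &r->calls, &r->self_ms, &r->total_ms, r->name);

    return n == 7;
}
```

Applied to the f_timint rows, this would give self times of 2.18 s for
clang and 1.76 s for gcc at the same call count of 14700000.
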
I also tested the interaction of -pg with other options and there I
found an issue with -fomit-frame-pointer. Here gcc bails out, as it
probably should:
gcc -pg -O2 -Wall -fomit-frame-pointer -c test.c
gcc: -pg and -fomit-frame-pointer are incompatible
while clang continues and silently generates an executable that
immediately terminates with a segmentation violation when started.
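
A check along the following lines might be enough to make clang refuse
the combination as gcc does. This is a minimal C sketch only. It is not
clang's actual driver code, and the function name
pg_conflicts_with_omit_fp is hypothetical.

```c
#include <string.h>

/*
 * Hypothetical driver-side check, not clang's actual code: return 1
 * if the argument list contains both -pg and -fomit-frame-pointer,
 * 0 otherwise. argv[0] is assumed to be the program name.
 */
static int pg_conflicts_with_omit_fp(int argc, const char *const argv[])
{
    int has_pg = 0, has_omit_fp = 0;

    for (int i = 1; i < argc; i++) {
        if (strcmp(argv[i], "-pg") == 0)
            has_pg = 1;
        else if (strcmp(argv[i], "-fomit-frame-pointer") == 0)
            has_omit_fp = 1;
    }
    return has_pg && has_omit_fp;
}
```

A driver could call this before generating any code and print the same
diagnostic as gcc. The sketch is deliberately simplified: it does not
account for a later -fno-omit-frame-pointer on the same command line,
which a real driver would presumably have to take into account.
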
Another minor, unrelated issue I found is that this version of clang on
i386 generates SSE2 instructions by default, while gcc and clang in
-CURRENT generate the "classical" i387 instructions.
Kind regards,
Hans Ottevanger
More information about the freebsd-toolchain
mailing list