How to build an executable with profiling?
Roman Divacky
rdivacky at freebsd.org
Mon Jan 24 16:46:55 UTC 2011
On Sun, Jan 23, 2011 at 04:15:40PM +0100, Hans Ottevanger wrote:
> On 01/21/11 20:27, Roman Divacky wrote:
> >>>This patch does three things:
> >>>
> >>>1) emits "call .mcount" at the beginning of every function body
> >>>
> >>
> >>The differences on i386 between profiled and non-profiled code are not
> >>as obvious as with gcc (using diff on assembly output), but on first
> >>inspection it looks correct.
> >
> >cool :)
> >
> >>>2) changes the driver to link in gcrt1.o instead of crt1.o
> >>>
> >>>3) changes all -lfoo to -lfoo_p except when the foo ends with _s in
> >>> the linker invocation
> >>>
> >>
> >>Maybe it is wise to follow the gcc implementation here.
> >
> >ok, makes sense
> >
> >>>I am not sure that I did the right thing, especially in (3). Anyway,
> >>>the patch works for me (i.e. produces a.out.gmon that seems to contain
> >>>meaningful data).
> >>>
> >>>I would appreciate it if you guys could test and review this and let me
> >>>know whether it is correct.
> >>>
> >>
> >>On both my systems (i386 and amd64) something goes severely wrong when
> >>linking several objects (all compiled with -pg, this is amd64):
> >>
> >>Perhaps the invocation of the linker still needs some work (or I must
> >>redo my installation) but anyhow it looks like a good job. Thanks!
> >
> >I rewrote the library rewriting part to match gcc as closely as possible.
> >I also think that I solved your ld problem.
> >
> >
> >please revert the old patch and test the new one:
> >
> > http://lev.vlakno.cz/~rdivacky/clang-gprof.patch
> >
> >I believe this one is ok (works for me just fine), please test and report
> >back so I can start integrating this upstream.
> >
>
> I performed a few quick tests on both i386 and amd64.
>
> The problems I had with the invocation of ld appear to be solved. The
> behavior with respect to libraries is now identical to gcc as far as I can see.
>
> The results from gprof also look very promising. For my test program on
> amd64 the gprof output when using clang is
>
>     %  cumulative      self               self    total
>  time     seconds   seconds     calls  ms/call  ms/call  name
>  42.5        4.22      4.22         0  100.00%           _mcount [5]
>  22.0        6.41      2.18  14700000     0.00     0.00  f_timint [6]
>  12.4        7.64      1.23  21900000     0.00     0.00  exp [10]
>   8.4        8.48      0.84  22000000     0.00     0.00  vmol [9]
>   5.4        9.02      0.54   6300000     0.00     0.00  f_angle [11]
>   3.8        9.40      0.38         0  100.00%           .mcount (52)
>   1.9        9.59      0.19   1000000     0.00     0.01  qk21 [4]
>   1.9        9.78      0.19   1000000     0.00     0.00  pow [12]
>   0.4        9.82      0.04    200000     0.00     0.03  qags [3]
>   0.4        9.86      0.04    100000     0.00     0.00  zero [14]
>   0.3        9.89      0.03    100000     0.00     0.00  qext [16]
>   0.2        9.91      0.02    800000     0.00     0.00  f_apsis [15]
>   0.1        9.91      0.01   2500000     0.00     0.00  fmax [17]
>   0.1        9.92      0.01    100000     0.00     0.00  apsis [13]
>   0.0        9.92      0.00   1000000     0.00     0.00  fmin [18]
>   0.0        9.93      0.00    100000     0.00     0.03  timint [7]
>   0.0        9.93      0.00    700000     0.00     0.00  tol_apsis [19]
>   0.0        9.94      0.00    200000     0.00     0.00  sort [20]
>   0.0        9.94      0.00         1     1.85  5334.52  main [1]
>   0.0        9.94      0.00    100000     0.00     0.03  angle [8]
> ...
>
> while using gcc yields
>
>     %  cumulative      self               self    total
>  time     seconds   seconds     calls  ms/call  ms/call  name
>  44.3        4.23      4.23         0  100.00%           _mcount [5]
>  18.5        6.00      1.76  14700000     0.00     0.00  f_timint [6]
>  13.5        7.28      1.28  21900000     0.00     0.00  exp [10]
>   9.0        8.14      0.86  22000000     0.00     0.00  vmol [9]
>   5.5        8.66      0.52   6300000     0.00     0.00  f_angle [11]
>   4.0        9.04      0.38         0  100.00%           .mcount (52)
>   2.0        9.24      0.19   1000000     0.00     0.00  pow [12]
>   2.0        9.43      0.19   1000000     0.00     0.00  qk21 [4]
>   0.3        9.45      0.03    100000     0.00     0.00  zero [14]
>   0.3        9.48      0.03    200000     0.00     0.02  qags [3]
>   0.2        9.50      0.02    100000     0.00     0.00  qext [16]
>   0.2        9.52      0.02    800000     0.00     0.00  f_apsis [15]
>   0.1        9.53      0.00   2500000     0.00     0.00  fmax [17]
>   0.0        9.53      0.00    700000     0.00     0.00  tol_apsis [18]
>   0.0        9.53      0.00    200000     0.00     0.00  sort [19]
>   0.0        9.54      0.00    100000     0.00     0.00  apsis [13]
>   0.0        9.54      0.00         1     2.21  4927.66  main [1]
>   0.0        9.54      0.00   1000000     0.00     0.00  fmin [20]
>   0.0        9.54      0.00    100000     0.00     0.02  timint [7]
>   0.0        9.54      0.00    100000     0.00     0.02  angle [8]
> ...
>
> To me this looks quite similar 8-)

awesome! :)

> I also tested the interaction of -pg with other options and there I
> found an issue with -fomit-frame-pointer. Here gcc bails out, as it
> probably should:
>
> gcc -pg -O2 -Wall -fomit-frame-pointer -c test.c
> gcc: -pg and -fomit-frame-pointer are incompatible
>
> while clang continues and silently generates an executable that
> immediately terminates with a segmentation violation when started.

will fix today
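
For illustration only, the kind of driver-side check gcc appears to perform here can be sketched in C as follows. This is not the actual fix that went into clang; the function name pg_conflicts_with_omit_fp and the DEMO_MAIN guard are hypothetical, and the sketch only looks for the two literal flags quoted above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical check (not taken from gcc or clang sources): returns 1 when
 * both -pg and -fomit-frame-pointer appear on the command line, 0 otherwise.
 * argv[0] is assumed to be the program name and is skipped.
 */
static int pg_conflicts_with_omit_fp(int argc, char *const *argv)
{
    int pg = 0, omit_fp = 0;

    for (int i = 1; i < argc; i++) {
        if (strcmp(argv[i], "-pg") == 0)
            pg = 1;
        else if (strcmp(argv[i], "-fomit-frame-pointer") == 0)
            omit_fp = 1;
    }
    return pg && omit_fp;
}

#ifdef DEMO_MAIN
int main(int argc, char **argv)
{
    /* Sanity check on the command line quoted in the thread. */
    char *sample[] = { "gcc", "-pg", "-O2", "-Wall",
                       "-fomit-frame-pointer", "-c", "test.c" };
    assert(pg_conflicts_with_omit_fp(7, sample));

    /* Refuse the real command line instead of producing a broken binary. */
    if (pg_conflicts_with_omit_fp(argc, argv)) {
        fprintf(stderr, "%s: -pg and -fomit-frame-pointer are incompatible\n",
                argv[0]);
        return 1;
    }
    return 0;
}
#endif
```

Compiled with -DDEMO_MAIN, the program exits with status 1 and an error message when given both flags, mirroring the gcc behavior shown above instead of continuing silently.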
> Another minor, unrelated issue I found is that this version of clang on
> i386 generates SSE2 instructions by default, while gcc and clang in
> -CURRENT generate the "classical" i387 instructions.

we default to i486 in -CURRENT while upstream defaults to pentium4, so
this is expected.

thank you for your great testing and help! I am gonna push it upstream
now so we'll get it with the next clang/llvm update in -CURRENT

roman
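
As a reading aid, point (3) of the first patch (rewrite every -lfoo to -lfoo_p unless foo ends in _s) can be sketched in C as below. This only illustrates the rule as stated in the thread; it is not the code from clang-gprof.patch, whose library handling was later reworked to follow gcc. The helper name profiled_lib_arg and the DEMO_MAIN guard are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: copies one linker argument into out, appending "_p"
 * when the argument is of the form -lfoo and foo does not end in "_s".
 * All other arguments are copied unchanged.
 * Returns 0 on success, -1 if out is too small.
 */
static int profiled_lib_arg(const char *arg, char *out, size_t outlen)
{
    size_t len = strlen(arg);
    int is_lib = len > 2 && strncmp(arg, "-l", 2) == 0;
    int ends_in_s = len >= 4 && strcmp(arg + len - 2, "_s") == 0;
    const char *suffix = (is_lib && !ends_in_s) ? "_p" : "";
    int n = snprintf(out, outlen, "%s%s", arg, suffix);

    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}

#ifdef DEMO_MAIN
int main(void)
{
    const char *args[] = { "-lm", "-lc", "-lgcc_s", "-o", "a.out" };
    char buf[64];

    for (size_t i = 0; i < sizeof args / sizeof args[0]; i++) {
        int rc = profiled_lib_arg(args[i], buf, sizeof buf);
        assert(rc == 0);
        printf("%s -> %s\n", args[i], buf);
    }
    return 0;
}
#endif
```

Compiled with -DDEMO_MAIN, the demo shows -lm becoming -lm_p while -lgcc_s and the non-library arguments pass through unchanged.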