How to build an executable with profiling?

Roman Divacky rdivacky at freebsd.org
Wed Jan 19 10:44:52 UTC 2011


On Wed, Jan 19, 2011 at 11:18:22AM +0100, Hans Ottevanger wrote:
> On 01/18/11 22:12, Roman Divacky wrote:
> >On Tue, Jan 18, 2011 at 09:35:17AM -0800, Steve Kargl wrote:
> >>On Tue, Jan 18, 2011 at 06:16:57PM +0100, Roman Divacky wrote:
> >>>On Tue, Jan 18, 2011 at 04:43:13PM +0200, Kostik Belousov wrote:
> >>>>On Tue, Jan 18, 2011 at 03:32:05PM +0100, Roman Divacky wrote:
> >>>>>On Mon, Jan 17, 2011 at 10:44:11AM -0800, Steve Kargl wrote:
> >>>>>>How does one build an executable for profiling with clang?
> >>>>>
> >>>>>LLVM (and thus clang) does not support GPROF profiling.
> >>>>>
> >>>>>>clang -o testf -O2 -march=native -pipe -static -pg 
> >>>>>>-I/usr/local/include -I../mp testf.c -L/usr/local/lib -L../mp -lsgk 
> >>>>>>-lmpfr -lgmp -L/usr/home/kargl/work/lib -lm_clang_p
> >>>>>>clang: warning: the clang compiler does not support '-pg'
> >>>>>>
> 
> If you are really desperate to find the hotspots in your program when 
> compiled with clang, you could call clang with -v to find the call to 
> /bin/ld. Then append _p to the appropriate libs if still needed and 
> replace crt1.o by gcrt1.o while calling ld directly. E.g.
> 
> "/usr/bin/ld" -Bstatic -o testcoll /usr/lib/gcrt1.o /usr/lib/crti.o 
> /usr/lib/crtbegin.o testcoll.o angle.o apsis.o error.o minmax.o qags.o 
> qext.o qk21.o sort.o timint.o zero.o vmol.o -lm_p -lgcc -lgcc_eh -lc_p 
> -lgcc -lgcc_eh -t /usr/lib/crtend.o /usr/lib/crtn.o
> 
> You will get a profile without the number of calls for the objects 
> compiled with clang, but with the time spent. In my case:
> 
> granularity: each sample hit covers 4 byte(s) for 0.00% of 6.41 seconds
> 
>   %   cumulative   self              self     total
>  time   seconds   seconds    calls  ms/call  ms/call  name
>  30.3       1.94     1.94        0  100.00%           f_timint [2]
>  20.2       3.24     1.29        0  100.00%           _mcount [3]
>  19.4       4.48     1.24 21900000     0.00     0.00  exp [4]
>  13.2       5.32     0.85        0   40.51%           vmol [1]
>   7.3       5.79     0.47        0  100.00%           f_angle [5]
>   2.8       5.98     0.18  1000000     0.00     0.00  pow [7]
>   2.7       6.15     0.17        0   48.70%           qk21 [6]
>   2.4       6.30     0.15        0  100.00%           .mcount (51)
>   0.5       6.33     0.03        0  100.00%           zero [8]
>   0.4       6.35     0.02        0  100.00%           qext [9]
>   0.4       6.38     0.02        0  100.00%           qags [10]
> ...
 
hm.. this is interesting. I wonder if it makes sense to teach the
driver about this (it's a trivial change). opinions?

> >>>>>>I suppose it will be pointless to ask, but shouldn't clang
> >>>>>>support one of the most basic gcc compiler options if clang
> >>>>>>is to replace gcc as the base system compiler?
> >>>>>
> >>>>>is GPROF really needed at this point? we have HWPMC, isn't
> >>>>>it sufficient?
> >>>>Hwpmc requires additional work for each new CPU model. Also,
> >>>>hwpmc is not supported even on all Intel or AMD CPUs, esp. older
> >>>>models, and e.g. VIA cores.
> >>>>
> >>>>Not to mention !x86 architectures.
> >>>
> >>>yes. I agree. HWPMC is not a 100% solution.
> >>>
> >>>for those interested in profiling in LLVM in detail:
> >>>
> >>>         http://llvm.org/pubs/2010-04-NeustifterProfiling.html
> >>>
> >>>summary: LLVM supports inserting profiling probes (but the selection
> >>>          of places where to put them is very naive) but there's no
> >>>          "GPROF writer".
> >>>
> >>>I mailed the author of the thesis yesterday and it looks like his work
> >>>may get committed to upstream LLVM.
> >>>
> >>
> >>Thanks for the url and checking on the status of profiling with llvm.
> >
> >I checked the LLVM code instead and here's what I found:
> >
> >LLVM actually supports profiling, in its own format (llvmprof.out). This
> >can only be used for its PGO optimization (BasicBlockPlacement) and is
> >very naive.
> >
> >Theoretically it should be possible to write "llvmprof.out ->  a.out.gmon"
> >converter - no idea how feasible it is. I guess it would not be very easy.
> >
> >I believe it can be sufficiently easy to write a "gprof-like dumper" for
> >the llvmprof.out files (if there's not one already) that would print
> >stuff like "foo called X times, bar called Y times". I don't know about
> >the actual measuring of time. I think it's not in the llvmprof.out.
> >
> 
> I have not yet completely read the reference provided, but my impression 
> is that it describes considerably more sophistication than needed to get 
> gprof running with clang (though the thesis looks very interesting!). 
> All gprof needs is statistical profiling as provided by the kernel 
> through profil(2) and addition by the compiler of a call to .mcount (and 
> possibly allocation of a small amount of storage) on entry of each 
> function. gcc (and pcc before it) has done this for more than 20 years, 
> although I must admit that the code generated for the amd64 using -pg is 
> a bit opaque to me (i386 is straightforward, though). The rest of the 
> machinery needed is already there (in lib/libc/gmon and e.g. 
> lib/csu/amd64/crt1.c).

would you be interested in working on adding the necessary stuff to LLVM?
if it's really just about placing .mcount calls in profiling points I believe
it should be doable.. 

roman


More information about the freebsd-toolchain mailing list