How to build an executable with profiling?

Hans Ottevanger hansot at iae.nl
Wed Jan 19 10:34:02 UTC 2011


On 01/18/11 22:12, Roman Divacky wrote:
> On Tue, Jan 18, 2011 at 09:35:17AM -0800, Steve Kargl wrote:
>> On Tue, Jan 18, 2011 at 06:16:57PM +0100, Roman Divacky wrote:
>>> On Tue, Jan 18, 2011 at 04:43:13PM +0200, Kostik Belousov wrote:
>>>> On Tue, Jan 18, 2011 at 03:32:05PM +0100, Roman Divacky wrote:
>>>>> On Mon, Jan 17, 2011 at 10:44:11AM -0800, Steve Kargl wrote:
>>>>>> How does one build an executable for profiling with clang?
>>>>>
>>>>> LLVM (and thus clang) does not support GPROF profiling.
>>>>>
>>>>>> clang -o testf -O2 -march=native -pipe -static -pg -I/usr/local/include -I../mp testf.c -L/usr/local/lib -L../mp -lsgk -lmpfr -lgmp -L/usr/home/kargl/work/lib -lm_clang_p
>>>>>> clang: warning: the clang compiler does not support '-pg'
>>>>>>

If you are really desperate to find the hotspots in your program when 
compiled with clang, you could run clang with -v to see how it invokes 
the linker (/usr/bin/ld). Then call ld directly with the same arguments, 
replacing crt1.o by gcrt1.o and appending _p to the appropriate libs if 
still needed. E.g.

"/usr/bin/ld" -Bstatic -o testcoll /usr/lib/gcrt1.o /usr/lib/crti.o 
/usr/lib/crtbegin.o testcoll.o angle.o apsis.o error.o minmax.o qags.o 
qext.o qk21.o sort.o timint.o zero.o vmol.o -lm_p -lgcc -lgcc_eh -lc_p 
-lgcc -lgcc_eh -t /usr/lib/crtend.o /usr/lib/crtn.o
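
The rewriting step can also be scripted. What follows is only a minimal 
sketch: rewrite_link_line is a hypothetical helper, not an existing tool. 
It assumes that the linker line copied from the clang -v output is passed 
as one string, that no argument contains spaces or glob characters, and 
that only libc and libm need their _p variants.

```sh
# Hypothetical helper: rewrite one linker command line (as printed by
# "clang -v") so that it uses the profiling startup file and libraries.
# Assumption: no argument contains spaces or glob characters.
rewrite_link_line() {
    out=""
    for word in $1; do
        # clang -v may wrap arguments in double quotes; drop them.
        word=${word#\"}
        word=${word%\"}
        case "$word" in
            # Swap the normal startup object for the profiling one.
            crt1.o|*/crt1.o) word="${word%crt1.o}gcrt1.o" ;;
            # Link the profiling variants of libc and libm.
            -lc|-lm)         word="${word}_p" ;;
        esac
        out="${out:+$out }$word"
    done
    printf '%s\n' "$out"
}
```

Called with the line taken from clang -v, it prints the modified line, 
which can be inspected first and then run by hand. Other libraries 
(such as -lmpfr) are left untouched.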

You will get a profile without the number of calls for the objects 
compiled with clang (they contain no calls to mcount), but with the 
time spent in them. In my case:

granularity: each sample hit covers 4 byte(s) for 0.00% of 6.41 seconds

   %   cumulative   self              self     total
  time   seconds   seconds    calls  ms/call  ms/call  name
  30.3       1.94     1.94        0  100.00%           f_timint [2]
  20.2       3.24     1.29        0  100.00%           _mcount [3]
  19.4       4.48     1.24 21900000     0.00     0.00  exp [4]
  13.2       5.32     0.85        0   40.51%           vmol [1]
   7.3       5.79     0.47        0  100.00%           f_angle [5]
   2.8       5.98     0.18  1000000     0.00     0.00  pow [7]
   2.7       6.15     0.17        0   48.70%           qk21 [6]
   2.4       6.30     0.15        0  100.00%           .mcount (51)
   0.5       6.33     0.03        0  100.00%           zero [8]
   0.4       6.35     0.02        0  100.00%           qext [9]
   0.4       6.38     0.02        0  100.00%           qags [10]
...

>>>>>> I suppose it will be pointless to ask, but shouldn't clang
>>>>>> support one of the most basic gcc compiler options if clang
>>>>>> is to replace gcc as the base system compiler?
>>>>>
>>>>> is GPROF really needed at this point? we have HWPMC, isnt
>>>>> it sufficient?
>>>> Hwpmc requires additional work for each new CPU model. Also,
>>>> hwpmc is not supported even on all Intel or AMD CPUs, esp. older
>>>> models, and e.g. VIA cores.
>>>>
>>>> Not to mention !x86 architectures.
>>>
>>> yes. I agree. HWPMC is not 100% solution.
>>>
>>> for those interested in profiling in LLVM in detail:
>>>
>>>          http://llvm.org/pubs/2010-04-NeustifterProfiling.html
>>>
>>> summary: LLVM supports inserting profiling probes (but the selection
>>>           of places where to put them is very naive) but there's no
>>>           "GPROF writer".
>>>
>>> I mailed the author of the thesis yesterday and it looks like his work may
>>> get committed to upstream LLVM.
>>>
>>
>> Thanks for the url and checking on the status of profiling with llvm.
>
> I checked the LLVM code instead and here's what I found:
>
> LLVM actually supports profiling, in its own format (llvmprof.out). This can
> only be used for its PGO optimization (BasicBlockPlacement) and is very naive.
>
> Theoretically it should be possible to write "llvmprof.out ->  a.out.gmon"
> converter - no idea how feasible it is. I guess it would not be very easy.
>
> I believe it can be sufficiently easy to write a "gprof-like dumper" for
> the llvmprof.out files (if there's not one already) that would print
> stuff like "foo called X times, bar called Y times". I dont know about
> the actual measuring of time. I think it's not in the llvmprof.out.
>

I have not yet completely read the reference provided, but my impression 
is that it describes considerably more sophistication than needed to get 
gprof running with clang (though the thesis looks very interesting!). 
All gprof needs is statistical profiling as provided by the kernel 
through profil(2), plus a call to .mcount inserted by the compiler (and 
possibly a small amount of allocated storage) on entry to each 
function. gcc (and pcc before it) has done this for more than 20 years, 
although I must admit that the code generated for amd64 with -pg is 
a bit opaque to me (i386 is straightforward, though). The rest of the 
machinery needed is already there (in lib/libc/gmon and e.g. 
lib/csu/amd64/crt1.c).
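
One way to check whether a given object file contains such calls is to 
look for an undefined mcount symbol in its symbol table. The sketch 
below is a rough check only: has_mcount_symbol is a hypothetical helper, 
and the accepted spellings (.mcount, _mcount, mcount) are an assumption 
that may not cover every platform.

```sh
# Hypothetical helper: succeed if nm-style symbol output on stdin ends a
# line with an mcount entry point (".mcount", "_mcount" or "mcount").
has_mcount_symbol() {
    grep -Eq '(^|[[:space:]])[._]?mcount$'
}
```

For example, "nm -u testf.o | has_mcount_symbol && echo instrumented" 
would print "instrumented" only for an object that references mcount.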

Kind regards,

Hans Ottevanger


More information about the freebsd-toolchain mailing list