CFLAGS+= -fPIC per default?

On Sunday 22 February 2004 01:27 pm, Joseph Fenton wrote:
> >>Adding CFLAGS= -fPIC to /etc/make.conf may be a local solution but
> >>are there any drawbacks by adding something like
> >>.if ${ARCH} == "amd64"
> >>CFLAGS+= -fPIC
> >>.endif
> >>to ports/Mk/
> >No.. please don't.  Although the AMD64 platform supports PIC
> > addressing modes directly, it is still a penalty.  (Although
> > thankfully, it's nowhere near as expensive as it is on i386!)
> >For example, in libc when built in PIC mode:
> >#ifdef PIC
> >        movq    PIC_GOT(HIDENAME(curbrk)),%rdx
> >        movq    (%rdx),%rax
> >#else
> >        movq    HIDENAME(curbrk)(%rip),%rax
> >#endif
> >The problem is that we can't be sure that everything will be in +/-
> > 31 bit offsets of each other.  This means that PIC objects have to
> > do indirect memory references that aren't required in no-pic mode.
> >i386 also loses a general purpose register (%ebx) which is why -fpic
> > is more expensive there.  But even though we don't lose a register,
> > it's still a cost because of the extra global-offset-table memory
> > references.
> >Footnote: you just made me wonder about some of these ifdefs..  We
> >shouldn't need them for intra-object references like this.  I'll
> > have to go and look again.
> Sorry to be anal, but PC-relative addressing is by definition
> position-independent code. Who was the bright individual
> who decided, when compiling PIC code, NOT to use
> PC-relative, and to use PC-relative for non-PIC code?

Recall the last paragraph you just quoted.  I already said I thought the 
code wasn't quite right.  However, I just remembered why it's done that way.

Remember.. unix link semantics have interesting symbol override effects.  
Although you might normally be jumping within the same library and can 
trivially use %rip-relative addressing, if the main program overrides 
libc symbols, we must use those instead.  Thus, we can't use 
%rip-relative ways to access them because we can't be sure it's going to 
be within +/- 2GB.  In fact, it's guaranteed to not be the case for 
dynamic linking on FreeBSD/amd64 because the default load address for 
shared libs is around the 8GB mark.  For static linking though, we 
don't usually have this same 7.9GB hole in our symbol space.
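To put numbers on the +/- 2GB point, here is a small sketch (not part of the original thread) that checks whether a %rip-relative reference can reach its target. The function name is hypothetical, and the addresses used with it are assumptions for illustration: a shared library near the 8GB mark and a main-program variable near the bottom of the address space.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * A %rip-relative operand encodes a signed 32-bit displacement measured
 * from the address of the *next* instruction, so it reaches at most
 * +/- 2GB.  User-space amd64 addresses are assumed to fit in int64_t,
 * so the subtraction below cannot overflow.
 */
static bool fits_rip_relative(int64_t next_rip, int64_t target)
{
    int64_t delta = target - next_rip;
    return delta >= INT32_MIN && delta <= INT32_MAX;
}
```

With library code at 0x200000000 (8GB), a reference to the library's own data at 0x200100000 fits, but a reference to a hypothetical main-program override at 0x600000 does not, so the library has to go through an indirection instead.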

Also.. when compiling with -fpic, you don't know whether you're linking 
pc-relative code into an application or into a shared library that 
could be loaded just about anywhere.

> This is counter-intuitive. For PIC code, you use PC-relative
> addressing in two cases: 1 - the code is guaranteed to be
> a constant distance apart, like code in the same section; 2 -
> when the loader guarantees the relative position of different
> sections, like code and data contained in a ROM.
> Case 1 could be violated by the code being too far apart
> for PC-relative addressing. This is virtually impossible for
> the AMD64 as I doubt we'll see code exceeding 2G in
> size in the next several decades. Code is only now exceeding
> a few megabytes. Case 2 is usually your problem, which leads
> to tables used to hold addresses or offsets.

Case 1 is violated by symbol overrides by the main program.

> Both sides of the #ifdef PIC are doing valid PIC code.
> PC-relative addressing should be used wherever possible
> unless it incurs a speed penalty.

gcc generally generates %rip-relative offsets where possible even 
without -fpic.

> Non-PIC code generally does PC-relative code if it
> is faster and is legal, for example, when referring to
> code within the same section. When the address must
> be set by the loader for non-PIC code, it seems to me
> that the fastest code would be like this:
>   mov     <imm32>,%rdx
>   movq    (%rdx),%rax

Guess what.. look at the original code:
   movq    PIC_GOT(HIDENAME(curbrk)),%rdx
   movq    (%rdx),%rax
The first instruction just happens to be of the form 'mov <imm32>,%rdx'. 

> or if the address is > 4G
>   movq    <imm64>,%rdx
>   movq    (%rdx),%rax

Except that there is only one movq form that takes a 64-bit absolute 
address, it only works with %rax as a target, and it's not particularly fast.  Since 
you're guaranteed to have an offset table within +/- 2GB, you may as 
well use it.
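For a feel of the size trade-off, the sketch below hand-assembles the two candidate loads into %rdx: the 7-byte %rip-relative load that reads a GOT slot within +/- 2GB (REX.W 8B /r with a disp32), and the 10-byte movabs $imm64 form (REX.W B8+rd) from the proposal above. The function names are made up for illustration; the byte layouts are the standard AMD64 encodings. Note that the imm64-to-register form itself accepts any general register; the %rax-only restriction applies to the movabs variant that loads from a 64-bit absolute address.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode "movq disp32(%rip), %rdx" into buf (needs 7 bytes).
 * insn_addr is where the instruction will live and target is the slot
 * it reads.  Returns the length, or 0 if target is out of +/- 2GB reach. */
static size_t encode_movq_rip_rdx(uint8_t *buf, int64_t insn_addr, int64_t target)
{
    const size_t len = 7;                      /* REX.W + opcode + ModRM + disp32 */
    int64_t disp = target - (insn_addr + (int64_t)len); /* relative to next insn */
    if (disp < INT32_MIN || disp > INT32_MAX)
        return 0;
    uint32_t d = (uint32_t)(int32_t)disp;
    buf[0] = 0x48;                             /* REX.W: 64-bit operand size */
    buf[1] = 0x8B;                             /* MOV r64, r/m64 */
    buf[2] = 0x15;                             /* ModRM: mod=00 reg=rdx rm=101 -> [rip+disp32] */
    for (int i = 0; i < 4; i++)
        buf[3 + i] = (uint8_t)(d >> (8 * i));  /* little-endian disp32 */
    return len;
}

/* Encode "movabsq $imm64, %rdx" into buf (needs 10 bytes). */
static size_t encode_movabs_rdx(uint8_t *buf, uint64_t imm)
{
    buf[0] = 0x48;                             /* REX.W */
    buf[1] = 0xB8 + 2;                         /* MOV r64, imm64; +rd = 2 selects rdx */
    for (int i = 0; i < 8; i++)
        buf[2 + i] = (uint8_t)(imm >> (8 * i)); /* little-endian imm64 */
    return 10;
}
```

The %rip-relative form is 3 bytes shorter and needs no load-time patching of the instruction stream; only the GOT slot it points at is filled in.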

> The loader would then set the immediate vector upon
> loading the sections. This avoids a memory hit for accessing
> a table of addresses while only adding at most 5 bytes to the
> size of the code. I would probably use this unless the user
> is compiling with flags set to compile with minimized code
> size.

Also remember that this is in libc, where it's not a user code-size 
compile option.  We have to cope with whatever environment we find 
ourselves loaded into.  We have to assume the worst case scenario.

Incidentally, for an example of what GCC does...  given this program:
extern int j;
extern int foo(int i);
int bar(int i)
{
        return foo(i) + 10 + j;
}
cc -S -O   produces:
        subq    $8, %rsp
        call    foo
        addl    j(%rip), %eax
        addl    $10, %eax
        addq    $8, %rsp

cc -S -O -fPIC produces:
        subq    $8, %rsp
        call    foo@PLT
        movq    j@GOTPCREL(%rip), %rdx
        addl    (%rdx), %eax
        addl    $10, %eax
        addq    $8, %rsp

Note how the -fpic case is less efficient.  Specifically, function calls 
are trampolined via the local object's procedure linkage table rather 
than just calling them directly.. because we don't know if they're 
within +/- 2GB or not.  Or if they're even in the same object.
Secondly, it uses the global offset table to find the address of 'j' and 
then indirectly references it as a two-step sequence.  The non-pic case 
just makes a pc-relative reference in a single instruction. 
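The GOT and PLT indirection can be modeled in plain C. The sketch below is a toy, not how the runtime linker is implemented: struct toy_got and all the lib_/main_ names are hypothetical, a function pointer stands in for the PLT stub, and toy_bind() plays the role of the runtime linker filling GOT slots (done for real through relocations such as R_X86_64_GLOB_DAT and R_X86_64_JUMP_SLOT). It shows why PIC code pays for the extra load: whether j and foo resolve to the library's own definitions or to overrides from the main program is only known at load time.

```c
/* Hypothetical stand-ins for the library's own definitions. */
static int lib_j = 1;
static int lib_foo(int i) { return i * 2; }

/* Hypothetical overrides supplied by the main program. */
static int main_j = 100;
static int main_foo(int i) { return i * 3; }

/* Toy global offset table: one slot per external symbol. */
struct toy_got {
    int *j;             /* address of whichever j won symbol lookup */
    int (*foo)(int);    /* where the PLT stub for foo would jump    */
};

/* The -fPIC flavour of bar(): each external reference first loads the
 * address from the table, then uses it (the two-step sequence above). */
static int bar_pic(const struct toy_got *got, int i)
{
    return got->foo(i) + 10 + *got->j;
}

/* Fill the slots the way "the main program's symbols win" would. */
static struct toy_got toy_bind(int main_overrides)
{
    struct toy_got got;
    got.j = main_overrides ? &main_j : &lib_j;
    got.foo = main_overrides ? main_foo : lib_foo;
    return got;
}
```

With toy_bind(0), bar_pic(&got, 5) returns 21 using the library's definitions; with toy_bind(1) it returns 125 using the main program's, without bar_pic itself changing. A direct %rip-relative reference baked into bar could not be redirected this way.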

> Sorry to nit-pick like this, but having worked on both Mac
> and Amiga ROMs, PIC mode under BSD really seems
> backwards to me.

Unix library semantics are very very different to ROM semantics.  I've 
been there too.

Also, this isn't BSD-specific.  It's ELF-specific, and that's what the 
toolchain produces and expects.  We use the same toolchain that linux 
