I've submitted 207175 for a clang 3.8.0 va_list handling problem for powerpc

Thu Feb 18 05:23:52 UTC 2016

My fpr related notes/worries about the fix were wrong.

I finally got some time to look at this again and I see that I somehow missed the following code when I looked before:

  // The calling convention either uses 1-2 GPRs or 1 FPR.
  Address NumRegsAddr = Address::invalid();
  if (isInt || IsSoftFloatABI) {
    NumRegsAddr = Builder.CreateStructGEP(VAList, 0, CharUnits::Zero(), "gpr");
  } else {
    NumRegsAddr = Builder.CreateStructGEP(VAList, 1, CharUnits::One(), "fpr");
  }

So the

    Builder.CreateStore(Builder.getInt8(8), NumRegsAddr);

in Case 2 is tracking gpr vs. fpr usage contexts as it should. Also:

  llvm::Value *NumRegs = Builder.CreateLoad(NumRegsAddr, "numUsedRegs");

  // "Align" the register count when TY is i64.
  if (isI64 || (isF64 && IsSoftFloatABI)) {
    NumRegs = Builder.CreateAdd(NumRegs, Builder.getInt8(1));
    NumRegs = Builder.CreateAnd(NumRegs, Builder.getInt8((uint8_t) ~1U));
  }

  llvm::Value *CC =
      Builder.CreateICmpULT(NumRegs, Builder.getInt8(8), "cond");

is using the same bounds check figure (8) for gpr and fpr.

Apparently that common bound is one reason that the clang numbering is not the same as the ABI document's numbering: clang's numbering allows using the same figure for both contexts. (Given the prior alignment for isI64 (or isF64 with IsSoftFloatABI).)

Sorry for the prior noise about fpr.

It is still true that DOUBLE_OR_FLOAT is untested so far.

===
Mark Millard
markmi at dsl-only.net

On 2016-Feb-16, at 5:51 AM, Mark Millard <markmi at dsl-only.net> wrote:

By the way: Nothing tested or seen so far checks DOUBLE_OR_FLOAT handling.

That involves fr (fpr in va_list in clang terms) instead of gr/gpr. fr/fpr has its own independent count and bound for using floating point registers vs. using the overflow area. There is also condition register bit 6 that indicates if floating point is involved overall or not.

Ultimately which of gpr vs. fpr and which bound (if the numbers are distinct in value) depends on the type specified in va_arg (SIMPLE_ARG/LONG_LONG vs. DOUBLE_OR_FLOAT status).

This may mean that the fix is an improvement for some types of usage but not a complete update: It is wrong for DOUBLE_OR_FLOAT instances of var_arg as stands. fpr would need to be involved instead. For world I expect it is fairly generally an improvement.

Also if the condition register indicates floating point is involved overall then there is likely management/handling of floating point state (for context switching management). (If it indicates no floating point involvement then there might be optimizations possible.)

===
Mark Millard
markmi at dsl-only.net

On 2016-Feb-16, at 2:45 AM, Mark Millard <markmi at dsl-only.net> wrote:

I used:

> Index: /usr/src/contrib/llvm/tools/clang/lib/CodeGen/TargetInfo.cpp
> ===================================================================
> --- /usr/src/contrib/llvm/tools/clang/lib/CodeGen/TargetInfo.cpp	(revision 295601)
> +++ /usr/src/contrib/llvm/tools/clang/lib/CodeGen/TargetInfo.cpp	(working copy)
> @@ -3569,6 +3569,8 @@
>  {
>    CGF.EmitBlock(UsingOverflow);
> 
> +    Builder.CreateStore(Builder.getInt8(8), NumRegsAddr);
> +
>    // Everything in the overflow area is rounded up to a size of at least 4.
>    CharUnits OverflowAreaAlign = CharUnits::fromQuantity(4);

as my test change. (Get evidence of operation before potential cleanup of the duplicated 8's.)

After a full buildworld/installworld based on the updated compiler. . .

My simple example of the problem no longer fails.

"ls -l -n /" now works.

"svnlite update -r295601 /usr/src" now works.

So whatever you want to do for the details of any submitted code, the basics of the change do avoid the SEGVs and allow these programs to work.

===
Mark Millard
markmi at dsl-only.net

On 2016-Feb-15, at 4:27 PM, Mark Millard <markmi at dsl-only.net> wrote:

On 2016-Feb-15, at 1:20 PM, Mark Millard <markmi at dsl-only.net> wrote:
> 
> On 2016-Feb-15, at 12:18 PM, Roman Divacky <rdivacky at vlakno.cz> wrote:
>> 
>> On Mon, Feb 15, 2016 at 12:17:50PM -0800, Mark Millard wrote:
>>> On 2016-Feb-15, at 11:11 AM, Roman Divacky <rdivacky at vlakno.cz> wrote:
>>>> 
>>>> Mark, I believe you're right. What do you think about this patch?
>>>> 
>>>> Index: tools/clang/lib/CodeGen/TargetInfo.cpp
>>>> ===================================================================
>>>> --- tools/clang/lib/CodeGen/TargetInfo.cpp	(revision 260852)
>>>> +++ tools/clang/lib/CodeGen/TargetInfo.cpp	(working copy)
>>>> @@ -3599,6 +3599,8 @@
>>>> {
>>>> CGF.EmitBlock(UsingOverflow);
>>>> 
>>>> +    Builder.CreateStore(Builder.getInt8(11), NumRegsAddr);
>>>> +
>>>> // Everything in the overflow area is rounded up to a size of at least 4.
>>>> CharUnits OverflowAreaAlign = CharUnits::fromQuantity(4);
>>>> 
>>>> 
>>>> Can you test it?
>>> 
>>> It may be later today before I can start the the test process.
>>> 
>>> While your change is not wrong as presented, it does seem to be based on the ABI document's numbering with the range 3 <= gr <12, where 3 <= gr < 11 cover r3-r10 use and gr=11 implies overflow stack area use. (gr being the ABI documents name.)
>>> 
>>> The clang code generation that I saw while analyzing the problem and the clang source that you had me look at did not use that numbering. Instead it seems to be based on 0 <= gpr < 9, where 0 <= gpr < 8 cover r3-r10 use and gpr=8 implies overflow stack area use. (gpr being what gdb showed me as I remember.) In other words: clang counts the number of "parameter registers" already in use as it goes along instead of tracking register numbers that have been used.
>>> 
>>> So assigning any value that appears to be positive and >= 8 should work, such as:
>>> 
>>> Builder.CreateStore(Builder.getInt8(8), NumRegsAddr);
>> 
>> Can you check what number gcc uses? We want to be interoperable with gcc.
>> 
>> Anyway, thanks for testing!
>> 
>> Roman
> 
> I'll do that check of gcc 4.2.1 code generation before starting the test later today.
> 
> But if the clang numbering is different in gcc 4.2.1 then far more than just adding a
> 
>> Builder.CreateStore(Builder.getInt8(?), NumRegsAddr)
> 
> 
> for some "?" would need to be involved in the changes in order to reach compatibility.
> 
> 
> I'll note that for clang 3.8.0 the actual comparison instruction generated is of the form
> 
>> cmplwi  r?,7
> 
> 
> for some r?, such as r5 or r4, and the conditional branch generated is a bgt instruction.
> 
> ===
> Mark Millard
> markmi at dsl-only.net

gcc 4.2.1 generates comparison instructions for va_arg of the form:

cmplwi  cr7,r0,8

and the conditional branch generated is a "bge cr7, . . ." instruction.

So the same number range is in use by both compilers: They are compatible for the bounds checks for reg vs. overflow for how they count, equality inclusion/exclusion matching up with the specific number (8 vs. 7) used to make things the same overall.

Other aspects of the code generation distinctions would take me time to analyze. It will be a while before I will be looking at other points.

===
Mark Millard
markmi at dsl-only.net