Duplicate OPT_ entries in gcc/options.h

Jung-uk Kim jkim at FreeBSD.org
Wed Jun 8 21:54:28 UTC 2016


On 06/ 8/16 05:15 PM, Dimitry Andric wrote:
> On 08 Jun 2016, at 21:11, Gerald Pfeifer <gerald at pfeifer.com> wrote:
>>
>> I got a user report, which I could reproduce, that when building
>> GCC (lang/gcc, but also current HEAD, so probably pretty much
>> any version) on FreeBSD 11 with LANG = en_US.UTF-8 we get
>> conflicting entries in $BUILDDIR/gcc/options.h such as
>>
>>  OPT_d = 135,                               /* -d */
>>  OPT_D = 136,                               /* -D */
>>  OPT_d = 137,                               /* -d */
>>  OPT_D = 138,                               /* -D */
>>  OPT_d = 141,                               /* -d */
>>  OPT_D = 142,                               /* -D */
>>  OPT_d = 143,                               /* -d */
>>
>> Using LANG = en_US (without UTF-8), everything works fine.
>>
>> Any ideas what might be going on here?  (This is done via
>> AWK scripts from what I can tell, does this trigger any
>> ideas?)
> 
> It is definitely something caused by our awk in base, in any case.
> First opt-gather.awk is run to generate a flat list of all options:
> 
>   /usr/bin/awk -f /usr/ports/lang/gcc/work/gcc-4.8.5/gcc/opt-gather.awk /usr/ports/lang/gcc/work/gcc-4.8.5/gcc/ada/gcc-interface/lang.opt /usr/ports/lang/gcc/work/gcc-4.8.5/gcc/fortran/lang.opt /usr/ports/lang/gcc/work/gcc-4.8.5/gcc/go/lang.opt /usr/ports/lang/gcc/work/gcc-4.8.5/gcc/java/lang.opt /usr/ports/lang/gcc/work/gcc-4.8.5/gcc/lto/lang.opt /usr/ports/lang/gcc/work/gcc-4.8.5/gcc/c-family/c.opt /usr/ports/lang/gcc/work/gcc-4.8.5/gcc/common.opt /usr/ports/lang/gcc/work/gcc-4.8.5/gcc/config/fused-madd.opt /usr/ports/lang/gcc/work/gcc-4.8.5/gcc/config/i386/i386.opt /usr/ports/lang/gcc/work/gcc-4.8.5/gcc/config/rpath.opt /usr/ports/lang/gcc/work/gcc-4.8.5/gcc/config/freebsd.opt > tmp-optionlist
> 
> Then opt-functions.awk is run to process optionlist into options.h:
> 
>   /usr/bin/awk -f /usr/ports/lang/gcc/work/gcc-4.8.5/gcc/opt-functions.awk -f /usr/ports/lang/gcc/work/gcc-4.8.5/gcc/opt-read.awk -f /usr/ports/lang/gcc/work/gcc-4.8.5/gcc/opth-gen.awk < optionlist > options.h
> 
> If I run the first step using LANG=C, or without any LANG setting, both
> optionlist and options.h are as expected.  If I run the first step using
> LANG=en_US.UTF-8, the optionlist is sorted differently, for example the
> "good" optionlist has the uppercase d options first, and much later the
> lowercase d options:
> 
>   D^\C ObjC C++ ObjC++ Joined Separate MissingArgError(macro name missing after %qs)^\-D<macro>[=<val>]   Define a <macro> with <val> as its value.  If just <macro> is given, <val> is taken to be 1
>   D^\Driver Joined Separate
>   D^\Fortran Joined Separate
>   ... much later in the file, after all options starting with an uppercase letter ...
>   d^\C ObjC C++ ObjC++ Joined
>   d^\Common Joined^\-d<letters>   Enable dumps from specific passes of the compiler
>   d^\Fortran Joined
>   d^\Java Separate SeparateAlias Alias(foutput-class-dir=)
> 
> The "bad" optionlist has the upper and lower case d options sorted
> together:
> 
>   d^\C ObjC C++ ObjC++ Joined
>   D^\C ObjC C++ ObjC++ Joined Separate MissingArgError(macro name missing after %qs)^\-D<macro>[=<val>]   Define a <macro> with <val> as its value.  If just <macro> is given, <val> is taken to be 1
>   d^\Common Joined^\-d<letters>   Enable dumps from specific passes of the compiler
>   D^\Driver Joined Separate
>   defsym=^\Driver JoinedOrMissing
>   defsym^\Driver Separate
>   d^\Fortran Joined
>   D^\Fortran Joined Separate
>   d^\Java Separate SeparateAlias Alias(foutput-class-dir=)
> 
> Note that GNU awk does *not* produce a different optionlist file when
> used with either LANG=C or LANG=en_US.UTF-8.
> 
> opt-gather.awk's sorting function looks like this:
> 
>   function sort(ARRAY, ELEMENTS)
>   {
>           for (i = 2; i <= ELEMENTS; ++i) {
>                   for (j = i; ARRAY[j-1] > ARRAY[j]; --j) {
>                           temp = ARRAY[j]
>                           ARRAY[j] = ARRAY[j-1]
>                           ARRAY[j-1] = temp
>                   }
>           }
>           return
>   }
> 
> So I am assuming that the ARRAY[j-1] > ARRAY[j] comparison works
> differently in our awk, depending on the LANG settings.  No idea when
> that changed, though, if it changed at all...

This behaviour has been known for a very long time:

https://svnweb.freebsd.org/changeset/base/173731

and it is not our fault:

https://www.gnu.org/software/gawk/manual/html_node/POSIX-String-Comparison.html

GNU awk produces the same output as our awk when run with the "--posix" option.
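
Since the string comparison follows the locale's collation order, one possible
way to get a stable ordering is to pin the C locale for the awk process. The
following is only a sketch of that idea, not what the GCC build does:
sort_c_locale is a hypothetical helper name, the "j > 1" guard is an addition
that is not in opt-gather.awk, and the four input lines are made-up samples.

```shell
# sort_c_locale is a hypothetical helper, not part of the GCC build.
# It reads lines on stdin and sorts them with the same insertion sort
# as opt-gather.awk, but forces the C locale for the awk process so that
# string comparison is by byte value (uppercase before lowercase).
sort_c_locale() {
    LC_ALL=C awk '
    function sort(ARRAY, ELEMENTS,    i, j, temp)
    {
        for (i = 2; i <= ELEMENTS; ++i) {
            # "j > 1" is an added guard; the original loop relies on
            # ARRAY[0] being unset to stop at the front of the array.
            for (j = i; j > 1 && ARRAY[j-1] > ARRAY[j]; --j) {
                temp = ARRAY[j]
                ARRAY[j] = ARRAY[j-1]
                ARRAY[j-1] = temp
            }
        }
    }
    { lines[++n] = $0 }
    END {
        sort(lines, n)
        for (k = 1; k <= n; ++k)
            print lines[k]
    }'
}

# Sample input only: with the C locale pinned, both "D" lines sort
# ahead of "d" and "defsym".
printf '%s\n' d D defsym D | sort_c_locale
```

Running the same function without LC_ALL=C under en_US.UTF-8 may interleave
the upper and lower case entries, depending on the awk implementation.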

FYI...

Jung-uk Kim
