svn commit: r266553 - head/release/scripts

Mon May 26 22:18:49 UTC 2014

On Mon, 26 May 2014 09:53:57 -0600 Warner Losh wrote:
> On May 26, 2014, at 8:39 AM, Nathan Whitehorn <nwhitehorn at freebsd.org> wrote:
>> On 05/26/14 02:35, Tijl Coosemans wrote:
>>> On Sat, 24 May 2014 19:00:18 -0600 Warner Losh wrote:
>>>> On May 24, 2014, at 5:53 PM, Warner Losh <imp at bsdimp.com> wrote:
>>>>> On May 24, 2014, at 5:13 PM, Tijl Coosemans <tijl at freebsd.org> wrote:
>>>>>> There isn't necessarily any chroot environment.  There's one kernel,
>>>>>> two equally valid ABIs (ILP32 and LP64) and any binary like uname might
>>>>>> use either of them.  If uname -p returns a different result depending on
>>>>>> which of these two ABIs it was compiled for that could be a problem for
>>>>>> any script that uses it.
>>>>> Well, it depends on what you want to do with the script, eh? If you want
>>>>> to know the ABI of the native binary uname, that’s one thing. But if you
>>>>> want to know the supported ABIs, you are doing it wrong by using uname.
>>>>> You should be using sysctl kern.supported_abi. That will tell you all the
>>>>> ABIs that you can install packages for on this machine, which is what you
>>>>> really want to know. So I’m having trouble connecting the dots between
>>>>> this and what you are saying here.
>>>>> 
>>>>> I still am absolutely flabbergasted why the MACHINE_ARCH names aren’t
>>>>> necessary and sufficient for packaging. I’ve yet to see any coherent
>>>>> reason to not use them.
>>>> Why do I care that they match? Good question. When I was doing FreeNAS, I
>>>> looked at integrating pkgng into nanobsd. At the time this was quite
>>>> difficult because every single architecture name was different between
>>>> pkgng and MACHINE_ARCH.  This would mean I’d have to drag around a huge
>>>> table to know how to translate one to the other (there was no simple regex
>>>> either, and things like mipsn32 wouldn’t have fit into the scheme at the
>>>> time). I would very much like to see us keep these names in sync and
>>>> avoid large translation tables that are difficult to maintain.
>>>> 
>>>> Now, do you need to get it from uname -p? No. If you want to parse ELF
>>>> files to get it, that’s fine, so long as the names map directly to the
>>>> MACHINE_ARCH names that we’ve been using for years. They completely
>>>> describe the universe of supported platforms. Are they perfect? No, around
>>>> the edge there may be an odd-ball that’s possible to build, but is
>>>> unsupported and likely doesn’t work at all. Have we learned from these
>>>> mistakes? Yes. Anything that’s actively supported has a proper name. This
>>>> name is needed, btw, so that any machine can self-host, a nice feature of
>>>> the /usr/src system.
>>> ABI consists of the following elements:
>>> 
>>> - OS
>>> - OS ABI version (major version number in FreeBSD)
> 
> These two are encoded in FreeBSD and major version. There’s no problem
> encoding these in the package architecture string. They are easily
> scriptable and totally obvious to FreeBSD users and pose no problems.
> Nobody is opposed to these, and actually they are rather a good idea.
> 
>>> - instruction set
>>> - programming model (ILP32 or LP64)
>>> - byte order (little/big endian)
> 
> These three are encoded in MACHINE_ARCH and have been for quite some
> time. And you forgot several things as well: register conventions,
> calling conventions, stack alignment, struct alignment, pointer
> conversion conventions, address space layout, page size constraints,
> etc. There are simply far too many to try to break down like you are
> trying to do. And that’s even before we get into shared library
> conventions...

I didn't forget them, I just restricted it to the elements that came up
so far.  All these extra elements are like byte order: you use only one
of each per combination of the first four fields so they can be discarded.
Things like calling conventions and register use can be considered part
of the programming model.  The amd64 programming models that matter to
FreeBSD (both ILP32 and LP64) are documented in the System V Application
Binary Interface AMD64 Architecture Processor Supplement.

>>> These are almost orthogonal dimensions in the sense that almost any
>>> combination is possible.  (A combination that isn't possible is a
>>> 32-bit instruction set with LP64.)
> 
> All of these items are encoded in MACHINE_ARCH and have been for at
> least a decade. There’s no new argument here.  If they were actually
> orthogonal, then that would be one thing. But they aren’t. They are all
> closely interrelated and we only support a vanishingly small number of
> possible conventions. Combinatorically, it can be hundreds. Practically,
> it is usually only a handful.
>
>>> What you are asking for now is to combine two dimensions into one and
>>> combination in this case means multiplication so if you have 3
>>> instruction sets and 2 programming models, the combined dimension needs
>>> 6 different values.  You need to make the case for why you think this
>>> is a good idea.
> 
> Because uname has to be 6 different things so the right binaries are
> built. It is really that simple.

Uname is a per-system (or per-jail) setting.  Whether you then want a
32-bit or 64-bit address space is a separate per-program or per-package
setting.  If you want to install a package you need to know the system
you're on and then you need to decide whether you'll use it with a large
amount of data that requires a 64-bit address space or whether a 32-bit
address space is enough and you want the performance benefit it gives
(smaller pointers mean lower memory and CPU cache use, and 32-bit pointer
arithmetic may be a bit faster).

>>>  For the past 20 years we got away with this because
>>> on every installation of FreeBSD we only used one programming model at
>>> a time.  This is still the case for byte order of course.
> 
> This isn’t true. For the past 15 years we’ve supported two programming
> models on amd64 at the same time. For longer than that we’ve supported
> linux emulation on i386. The project has known about these things for a
> long long time, and has settled on MACHINE_ARCH to represent all possible
> builds. We’ve had mixed MIPS for about a decade, though the support has
> varied in quality and execution. We learned that TARGET_BIG_ENDIAN was
> bad, really bad, and we had to have a separate name for each ABI we
> supported with no external info apart from that name. We could have
> easily picked the convention you are proposing here, but we didn’t. We
> picked another one.
> 
> Also, the “for the past 20 years” argument cuts both ways. Look at
> NetBSD. There, they have the same convention we have here of having a
> separate MACHINE_ARCH for each ABI. They have been even more successful
> at it than we have, and have avoided the pitfalls of TARGET_BIG_ENDIAN
> much better than we have. pkgsrc ties nicely into that. So for 20 years
> people have successfully used the current model, not just in FreeBSD,
> but also elsewhere.

I'm talking about cases where the first three fields listed above are
not sufficient to distinguish between ABIs.  The cases you listed are
already handled by those three, like linux != freebsd for the OS field
and i386 != amd64 for the instruction set.

>>> What I'm saying is to keep the option open for installations with
>>> multiple programming models, where most binaries could use ILP32 and
>>> only the ones that actually need a 64-bit address space use LP64.
>>> You query the instruction set using uname and the programming models
>>> using getconf.
> 
> What I’m saying is I don’t see any benefit at all to our users to
> having an additional, arbitrary string they have to deal with. There are
> actually quite a few other details that you need to know before you can
> even call getconf.

getconf _POSIX_V6_ILP32_OFFBIG
getconf _POSIX_V6_LP64_OFF64

It'll print "1" when supported, "undefined" otherwise.  Currently only
one is supported per system (or per jail) so MACHINE_ARCH is sufficient
to describe the ABI.  When both are supported on one system (or one jail)
(i.e. both commands print "1"), the system (or jail) MACHINE_ARCH is not
sufficient to distinguish between them.  You have to specify something
extra in this case (the fourth field) to indicate which of the two
packages you want.
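
As a minimal sketch of that check, assuming getconf follows the "1" /
"undefined" convention described above: classify_models is a hypothetical
helper name, not part of any existing tool, and a failing getconf call is
simply treated as "undefined".

```sh
#!/bin/sh
# Sketch only: report which programming models this system (or jail)
# supports, based on the two getconf queries above.

# classify_models ILP32_RESULT LP64_RESULT
# Hypothetical helper: maps the two getconf outputs to a short label.
classify_models() {
    ilp32=$1
    lp64=$2
    if [ "$ilp32" = "1" ] && [ "$lp64" = "1" ]; then
        echo "both"     # MACHINE_ARCH alone is ambiguous here
    elif [ "$ilp32" = "1" ]; then
        echo "ilp32"
    elif [ "$lp64" = "1" ]; then
        echo "lp64"
    else
        echo "unknown"  # neither query printed "1"
    fi
}

# Query the running system; treat a getconf failure as "undefined".
ilp32_result=$(getconf _POSIX_V6_ILP32_OFFBIG 2>/dev/null || echo undefined)
lp64_result=$(getconf _POSIX_V6_LP64_OFF64 2>/dev/null || echo undefined)

echo "programming models: $(classify_models "$ilp32_result" "$lp64_result")"
```

Only the "both" result is the case where something beyond MACHINE_ARCH
would be needed to choose between two packages.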

>>> I suppose you could replace the "x86" in the pkg scheme with i386/amd64,
>>> but then you'd still be talking about i386:32, amd64:32 and amd64:64
>>> instead of x86:32, x86:x32 and x86:64.  
>
> I suppose you could replace these by “i386”, “x32” (or “amd64x32”) and
> “amd64” respectively.

So you're on an amd64 or mips64 system (as indicated by uname) but you
want to use the 32-bit package if possible.  How does your script know
about the magic "x32", "amd64x32" or "mipsn32" strings?  Wouldn't it be
easier if you could just use "`uname -p`:32"?
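
To make the comparison concrete, here is a rough sketch of the two
approaches from a script's point of view.  Both function names are
hypothetical, the table entries are just the names floated in this
thread, and neither output is a string that pkg accepts today.

```sh
#!/bin/sh
# Sketch only: two ways a script could name "the 32-bit package for the
# system I am on".  Neither format is what pkg currently uses.

# Approach A: one name per ABI.  The script needs a lookup table, and
# every new 32-bit-on-64-bit ABI means another entry.
arch32_by_table() {
    case "$1" in
        amd64)  echo "amd64x32" ;;  # name suggested in this thread
        mips64) echo "mipsn32" ;;   # illustrative pairing only
        *)      return 1 ;;         # no entry: the table must be extended
    esac
}

# Approach B: a separate field for the programming model.  No table is
# needed; the script appends the width it wants.
arch32_by_field() {
    echo "$1:32"
}

proc=$(uname -p)
echo "table: $(arch32_by_table "$proc" || echo "no entry for $proc")"
echo "field: $(arch32_by_field "$proc")"
```

With approach A an architecture missing from the table gives no answer at
all, while approach B always produces a string, though whether a package
exists under that name is a separate question.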

I do realise it doesn't quite work like this right now because pkg uses
"x86" instead of "amd64" or "i386" for the third field and uses the
fourth field to distinguish between them.  I don't know if this is unique
to the x86 family or if this is also the case for the others.  This may
need to be reconsidered, but the idea of a fourth field is solid as far
as I can see.