svn commit: r266553 - head/release/scripts

Mon May 26 22:31:10 UTC 2014

On May 26, 2014, at 4:18 PM, Tijl Coosemans <tijl at FreeBSD.org> wrote:

> On Mon, 26 May 2014 09:53:57 -0600 Warner Losh wrote:
>> On May 26, 2014, at 8:39 AM, Nathan Whitehorn <nwhitehorn at freebsd.org> wrote:
>>> On 05/26/14 02:35, Tijl Coosemans wrote:
>>>> On Sat, 24 May 2014 19:00:18 -0600 Warner Losh wrote:
>>>>> On May 24, 2014, at 5:53 PM, Warner Losh <imp at bsdimp.com> wrote:
>>>>>> On May 24, 2014, at 5:13 PM, Tijl Coosemans <tijl at freebsd.org> wrote:
>>>>>>> There isn't necessarily any chroot environment.  There's one kernel,
>>>>>>> two equally valid ABIs (ILP32 and LP64) and any binary like uname might
>>>>>>> use either of them.  If uname -p returns a different result depending on
>>>>>>> which of these two ABIs it was compiled for that could be a problem for
>>>>>>> any script that uses it.
>>>>>> Well, it depends on what you want to do with the script, eh? If you want
>>>>>> to know the ABI of the native binary uname, that’s one thing. But if you
>>>>>> want to know the supported ABIs, you are doing it wrong by using uname.
>>>>>> You should be using sysctl kern.supported_abi. That will tell you all the
>>>>>> ABIs that you can install packages for on this machine, which is what you
>>>>>> really want to know. So I’m having trouble connecting the dots between
>>>>>> this and what you are saying here.
>>>>>> 
>>>>>> I still am absolutely flabbergasted why the MACHINE_ARCH names aren’t
>>>>>> necessary and sufficient for packaging. I’ve yet to see any coherent
>>>>>> reason to not use them.
>>>>> Why do I care that they match? Good question. When I was doing FreeNAS, I
>>>>> looked at integrating pkgng into nanobsd. At the time this was quite
>>>>> difficult because every single architecture name was different between
>>>>> pkgng and MACHINE_ARCH.  This would mean I’d have to drag around a huge
>>>>> table to know how to translate one to the other (there was no simple regex
>>>>> either, and things like mipsn32 wouldn’t have fit into the scheme at the
>>>>> time). I would very much like to see us keep these names in sync and
>>>>> avoid large translation tables that are difficult to maintain.
>>>>> 
>>>>> Now, do you need to get it from uname -p? No. If you want to parse elf
>>>>> files to get it, that’s fine, so long as the names map directly to the
>>>>> MACHINE_ARCH names that we’ve been using for years. They completely
>>>>> describe the universe of supported platforms. Are they perfect? No, around
>>>>> the edge there may be an odd-ball that’s possible to build, but is
>>>>> unsupported and likely doesn’t work at all. Have we learned from these
>>>>> mistakes? Yes. Anything that’s actively supported has a proper name. This
>>>>> name is needed, btw, so that any machine can self-host, a nice feature of
>>>>> the /usr/src system.
>>>> ABI consists of the following elements:
>>>> 
>>>> - OS
>>>> - OS ABI version (major version number in FreeBSD)
>> 
>> These two are encoded in FreeBSD and major version. There’s no problem
>> encoding these in the package architecture string. They are easily
>> scriptable and totally obvious to FreeBSD users and pose no problems.
>> Nobody is opposed to these, and actually they are rather a good idea.
>> 
>>>> - instruction set
>>>> - programming model (ILP32 or LP64)
>>>> - byte order (little/big endian)
>> 
>> These three are encoded in MACHINE_ARCH and have been for quite some
>> time. And you forgot several things as well: register conventions,
>> calling conventions, stack alignment, struct alignment, pointer
>> conversion conventions, address space layout, page size constraints,
>> etc. There are simply far too many to try to break down like you are
>> trying to do. And that’s even before we get into shared library
>> conventions...
> 
> I didn't forget them, I just restricted it to the elements that came up
> so far.  All these extra elements are like byte order: you use only one
> of each per combination of the first four fields so they can be discarded.
> Things like calling conventions and register use can be considered part
> of the programming model.  The amd64 programming models that matter to
> FreeBSD (both ILP32 and LP64) are documented in the System V Application
> Binary Interface AMD64 Architecture Processor Supplement.

Well, yes and no. n32 and n64 have vastly different register conventions than o32. But we’re talking about something that’s off in the weeds. It doesn’t matter. It also doesn’t address my basic thesis, “MACHINE_ARCH is enough”, for which you’ve not shown a coherent counterexample.

>>>> These are almost orthogonal dimensions in the sense that almost any
>>>> combination is possible.  (A combination that isn't possible is a
>>>> 32-bit instruction set with LP64.)
>> 
>> All of these items are encoded in MACHINE_ARCH and have been for at
>> least a decade. There’s no new argument here.  If they were actually
>> orthogonal, then that would be one thing. But they aren’t. They are all
>> closely interrelated and we only support a vanishingly small number of
>> possible conventions. Combinatorically, it can be hundreds. Practically,
>> it is usually only a handful.
>> 
>>>> What you are asking for now is to combine two dimensions into one and
>>>> combination in this case means multiplication so if you have 3
>>>> instruction sets and 2 programming models, the combined dimension needs
>>>> 6 different values.  You need to make the case for why you think this
>>>> is a good idea.
>> 
>> Because uname has to be 6 different things so the right binaries are
>> built. It is really that simple.
> 
> Uname is a per system (or per jail) setting.  Whether you then want a
> 32-bit or 64-bit address space is a separate per program or per package
> setting.  If you want to install a package you need to know the system
> you're on and then you need to decide whether you'll use it with a large
> amount of data that requires a 64-bit address space or whether a 32-bit
> address space is enough and you want the performance benefit it gives
> (smaller pointers means lower memory and cpu cache use and 32-bit pointer
> arithmetic may be a bit faster).

I fail to see how this is relevant to the discussion. If you want to install a package, just install what uname -p returns, unless the user says “do FRED instead” via a command line argument. In that case, validate FRED against the list of supported ABIs (or just allow it if you are forcing). It really should be just that simple. If I’m running on arm and uname returns armv6, then install the armv6 packages, not the armv6hf or armeb packages. No need to parse elf headers to get that.
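
The selection rule described above can be sketched in a few lines of sh. This is only an illustration under stated assumptions: the function name pick_pkg_arch and its argument order are hypothetical, and the list of supported architectures is passed in as a plain space-separated string rather than read from any particular sysctl.

```shell
#!/bin/sh
# pick_pkg_arch NATIVE SUPPORTED [REQUESTED]   (hypothetical helper)
#   NATIVE     what `uname -p` reported for this userland
#   SUPPORTED  space-separated list of architectures the kernel can run
#   REQUESTED  optional user override (the "do FRED instead" case)
# Prints the architecture to install packages for. Returns 1 when the
# override is not in the supported list.
pick_pkg_arch() {
    native=$1
    supported=$2
    requested=${3:-}

    # No override: install whatever uname -p says.
    if [ -z "$requested" ]; then
        printf '%s\n' "$native"
        return 0
    fi

    # Override given: accept it only if it appears in the supported list.
    for arch in $supported; do
        if [ "$arch" = "$requested" ]; then
            printf '%s\n' "$requested"
            return 0
        fi
    done

    echo "pick_pkg_arch: $requested is not supported here" >&2
    return 1
}
```

One possible call site would be pick_pkg_arch "$(uname -p)" "$(sysctl -n kern.supported_abi)" "$1", where the sysctl name is the one mentioned earlier in this thread and the space-separated output format is an assumption.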

>>>> For the past 20 years we got away with this because
>>>> on every installation of FreeBSD we only used one programming model at
>>>> a time.  This is still the case for byte order of course.
>> 
>> This isn’t true. For the past 15 years we’ve supported two programming
>> models on amd64 at the same time. For longer than that we’ve supported
>> linux emulation on i386. The project has known about these things for a
>> long long time, and has settled on MACHINE_ARCH to represent all possible
>> builds. We’ve had mixed MIPS for about a decade, though the support has
>> varied in quality and execution. We learned that TARGET_BIG_ENDIAN was
>> bad, really bad, and we had to have a separate name for each ABI we
>> supported with no external info apart from that name. We could have
>> easily picked the convention you are proposing here, but we didn’t. We
>> picked another one.
>> 
>> Also, the “for the past 20 years” argument cuts both ways. Look at
>> NetBSD. There, they have the same convention we have here of having a
>> separate MACHINE_ARCH for each ABI. They have been even more successful
>> at it than we have, and have avoided the pitfalls of TARGET_BIG_ENDIAN
>> much better than we have. pkgsrc ties nicely into that. So for 20 years
>> people have successfully used the current model, not just in FreeBSD,
>> but also elsewhere.
> 
> I'm talking about cases where the first three fields listed above are
> not sufficient to distinguish between ABIs.  The cases you listed are
> already handled by those three, like linux != freebsd for the OS field
> and i386 != amd64 for the instruction set.

You’ve yet to provide an actual example where this is the case.

>>>> What I'm saying is to keep the option open for installations with
>>>> multiple programming models, where most binaries could use ILP32 and
>>>> only the ones that actually need a 64-bit address space use LP64.
>>>> You query the instruction set using uname and the programming models
>>>> using getconf.
>> 
>> What I’m saying is I don’t see any benefit at all to our users to
>> having an additional, arbitrary string they have to deal with. There are
>> actually quite a few other details that you need to know before you can
>> even call getconf.
> 
> getconf _POSIX_V6_ILP32_OFFBIG
> getconf _POSIX_V6_LP64_OFF64
> 
> It'll print "1" when supported, "undefined" otherwise.  Currently only
> one is supported per system (or per jail) so MACHINE_ARCH is sufficient
> to describe the ABI.  When both are supported on one system (or one jail)
> (i.e. both commands print "1"), the system (or jail) MACHINE_ARCH is not
> sufficient to distinguish between them.  You have to specify something
> extra in this case (the fourth field) to indicate which of the two
> packages you want.

None of this is really relevant to the discussion. MACHINE_ARCH is totally sufficient. You completely misunderstand. Let me explain.

MACHINE_ARCH uniquely defines the ABI.

Now, there are some kernels that support running multiple MACHINE_ARCHs. amd64 is one. It supports amd64 and i386 (and soon i386t64). In a jail, you can query the sysctl to see which ones are supported.

And if you are on amd64 and want to install an i386 package, a simple command line argument will take care of that, and it can even be validated with the supported abi sysctl.

>>>> I suppose you could replace the "x86" in the pkg scheme with i386/amd64,
>>>> but then you'd still be talking about i386:32, amd64:32 and amd64:64
>>>> instead of x86:32, x86:x32 and x86:64.  
>> 
>> I suppose you could replace these by “i386”, “x32” (or “amd64x32”) and
>> “amd64” respectively.
> 
> So you're on an amd64 or mips64 system (as indicated by uname) but you
> want to use the 32-bit package if possible.  How does your script know
> about the magic "x32", "amd64x32" or "mipsn32" strings?  Wouldn't it be
> easier if you could just use "`uname -p`:32"?

Oh give me a break. You know it because you know you are building for mipsn32 because that’s what you’ve set MACHINE_ARCH or TARGET_ARCH to, which might be uname -p if that’s left unspecified. No, you can’t just say ‘uname -p’:32. Sorry. That’s lame and generally won’t work. Have you actually tried to write a script that turns a MACHINE_ARCH into one of these funky pkg names? It is a maze of special cases that has to be updated each time a new MACHINE_ARCH is added to FreeBSD. It would be so much more convenient for script writers and users of our system to have only one thing to specify rather than two, the second of which is just arbitrarily different without adding any value.
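
To illustrate the kind of translation script in question, here is a minimal sketch. The function name machine_arch_to_pkg is hypothetical, and only the two x86 pairs quoted in this thread (i386 as x86:32, amd64 as x86:64) are filled in. Every other MACHINE_ARCH falls through to the error branch until someone adds a case for it by hand, which is the maintenance burden being described.

```shell
#!/bin/sh
# machine_arch_to_pkg MACHINE_ARCH   (hypothetical helper)
# Prints the pkg-style "family:bits" name for a MACHINE_ARCH, or fails
# when no mapping has been written yet. Only the x86 pairs mentioned in
# this thread are listed; this is not the real pkg translation table.
machine_arch_to_pkg() {
    case "$1" in
        i386)  echo "x86:32" ;;
        amd64) echo "x86:64" ;;
        *)
            # Each new MACHINE_ARCH needs its own special case here.
            echo "machine_arch_to_pkg: no mapping for $1" >&2
            return 1
            ;;
    esac
}
```

If the package names were simply the MACHINE_ARCH names, this function would reduce to echoing its argument and would never need updating.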

> I do realise it doesn't quite work like this right now because pkg uses
> "x86" instead of "amd64" or "i386" for the third field and uses the
> fourth field to distinguish between them.  I don't know if this is unique
> to the x86 family or if this is also the case for the others.  This may
> need to be reconsidered, but the idea of a fourth field is solid as far
> as I can see.

What possible benefit is there? You keep dodging this question. So far you’ve shown no benefit whatsoever, and lots of hassle. It is cool because it is different, and it is more descriptive, but it doesn’t add any value.

Warner