svn commit: r266553 - head/release/scripts

Mon May 26 15:53:53 UTC 2014

On May 26, 2014, at 8:39 AM, Nathan Whitehorn <nwhitehorn at freebsd.org> wrote:

> On 05/26/14 02:35, Tijl Coosemans wrote:
>> On Sat, 24 May 2014 19:00:18 -0600 Warner Losh wrote:
>>> On May 24, 2014, at 5:53 PM, Warner Losh <imp at bsdimp.com> wrote:
>>>> On May 24, 2014, at 5:13 PM, Tijl Coosemans <tijl at freebsd.org> wrote:
>>>>> There isn't necessarily any chroot environment.  There's one kernel,
>>>>> two equally valid ABIs (ILP32 and LP64) and any binary like uname might
>>>>> use either of them.  If uname -p returns a different result depending on
>>>>> which of these two ABIs it was compiled for that could be a problem for
>>>>> any script that uses it.
>>>> Well, it depends on what you want to do with the script, eh? If you want
>>>> to know the ABI of the native binary uname, that’s one thing. But if you
>>>> want to know the supported ABIs, you are doing it wrong by using uname.
>>>> You should be using sysctl kern.supported_archs. That will tell you all the
>>>> ABIs that you can install packages for on this machine, which is what you
>>>> really want to know. So I’m having trouble connecting the dots between
>>>> this and what you are saying here.
>>>> 
>>>> I still am absolutely flabbergasted why the MACHINE_ARCH names aren’t
>>>> necessary and sufficient for packaging. I’ve yet to see any coherent
>>>> reason to not use them.
>>> Why do I care that they match? Good question. When I was doing FreeNAS, I
>>> looked at integrating pkgng into nanobsd. At the time this was quite
>>> difficult because every single architecture name was different between
>>> pkgng and MACHINE_ARCH.  This would mean I’d have to drag around a huge
>>> table to know how to translate one to the other (there was no simple regex
>>> either, and things like mipsn32 wouldn’t have fit into the scheme at the
>>> time). I would very much like to see us keep these names in sync and
>>> avoid large translation tables that are difficult to maintain.
>>> 
>>> Now, do you need to get it from uname -p? No. If you want to parse elf
>>> files to get it, that’s fine, so long as the names map directly to the
>>> MACHINE_ARCH names that we’ve been using for years. They completely
>>> describe the universe of supported platforms. Are they perfect? No, around
>>> the edge there may be an odd-ball that’s possible to build, but is
>>> unsupported and likely doesn’t work at all. Have we learned from these
>>> mistakes? Yes. Anything that’s actively supported has a proper name. This
>>> name is needed, btw, so that any machine can self-host, a nice feature of
>>> the /usr/src system.
>> ABI consists of the following elements:
>> 
>> - OS
>> - OS ABI version (major version number in FreeBSD)

These two are encoded as “FreeBSD” and the major version number. There’s no problem encoding these in the package architecture string. They are easily scriptable, obvious to FreeBSD users, and pose no problems. Nobody is opposed to these; they are actually rather a good idea.
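To illustrate how scriptable these two fields are, here is a minimal sketch that builds and splits an ABI string. It assumes a hypothetical "OS:major:MACHINE_ARCH" layout (e.g. FreeBSD:10:amd64) along the lines argued for in this thread; it is not taken from pkg's documentation, and the helper names are made up for the example.

```python
def abi_string(sysname: str, release: str, machine_arch: str) -> str:
    """Build an ABI string of the assumed form OS:major:MACHINE_ARCH.

    `release` is a `uname -r` style string such as "10.0-RELEASE"; only the
    major version number in front of the first "." is kept.
    """
    major = release.split(".", 1)[0]
    if not major.isdigit():
        raise ValueError(f"no major version number in {release!r}")
    return f"{sysname}:{major}:{machine_arch}"


def parse_abi_string(abi: str) -> tuple:
    """Split an assumed OS:major:MACHINE_ARCH string back into its three parts."""
    sysname, major, machine_arch = abi.split(":")
    return sysname, int(major), machine_arch


print(abi_string("FreeBSD", "10.0-RELEASE", "amd64"))  # FreeBSD:10:amd64
print(parse_abi_string("FreeBSD:9:powerpc64"))         # ('FreeBSD', 9, 'powerpc64')
```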

>> - instruction set
>> - programming model (ILP32 or LP64)
>> - byte order (little/big endian)

These three are encoded in MACHINE_ARCH and have been for quite some time. And you forgot several things as well: register conventions, calling conventions, stack alignment, struct alignment, pointer conversion conventions, address space layout, page size constraints, etc. There are simply far too many to break down the way you are trying to. And that’s even before we get into shared library conventions...
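For comparison, an ELF binary records roughly the same three items in its header: EI_CLASS (32- or 64-bit object), EI_DATA (byte order) and e_machine (instruction set). The sketch below decodes them from the first 20 bytes of a file. Note that the ELF class only approximates the programming model (x32 and MIPS n32 objects are ELFCLASS32 on 64-bit instruction sets), and float/ABI variants live in e_flags, which this sketch does not decode; that is part of why one name per supported ABI is simpler to handle. The function name is hypothetical.

```python
import struct

# e_machine values for a few architectures, as defined by the ELF
# specification (see also FreeBSD's <sys/elf_common.h>).
EM_NAMES = {
    3: "EM_386",
    8: "EM_MIPS",
    20: "EM_PPC",
    21: "EM_PPC64",
    40: "EM_ARM",
    43: "EM_SPARCV9",
    50: "EM_IA_64",
    62: "EM_X86_64",
}
ELFOSABI_FREEBSD = 9


def elf_abi_fields(header: bytes) -> dict:
    """Decode class, byte order, OS ABI and machine from an ELF header.

    Only the first 20 bytes are needed: e_ident (16 bytes), e_type (2 bytes)
    and e_machine (2 bytes, stored in the file's own byte order).
    """
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF header")
    ei_class, ei_data, ei_osabi = header[4], header[5], header[7]
    if ei_data == 1:      # ELFDATA2LSB
        fmt, endian = "<H", "little"
    elif ei_data == 2:    # ELFDATA2MSB
        fmt, endian = ">H", "big"
    else:
        raise ValueError(f"unknown EI_DATA value {ei_data}")
    (e_machine,) = struct.unpack_from(fmt, header, 18)
    return {
        "class": {1: "ELFCLASS32", 2: "ELFCLASS64"}.get(ei_class, ei_class),
        "endian": endian,
        "freebsd": ei_osabi == ELFOSABI_FREEBSD,
        "machine": EM_NAMES.get(e_machine, e_machine),
    }


# Synthetic header: 64-bit, little-endian, FreeBSD OS ABI, e_type 2, x86-64.
hdr = b"\x7fELF" + bytes([2, 1, 1, ELFOSABI_FREEBSD]) + bytes(8) + struct.pack("<HH", 2, 62)
print(elf_abi_fields(hdr))
# {'class': 'ELFCLASS64', 'endian': 'little', 'freebsd': True, 'machine': 'EM_X86_64'}
```

On a real system one would pass the first 20 bytes of a binary, e.g. `open("/bin/ls", "rb").read(20)`.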

>> These are almost orthogonal dimensions in the sense that almost any
>> combination is possible.  (A combination that isn't possible is a
>> 32-bit instruction set with LP64.)

All of these items are encoded in MACHINE_ARCH and have been for at least a decade. There’s no new argument here. If they were actually orthogonal, that would be one thing. But they aren’t. They are all closely interrelated, and we support only a vanishingly small number of the possible conventions. Combinatorially, there could be hundreds. Practically, there is usually only a handful.
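A rough way to see that the MACHINE_ARCH names already carry instruction set, programming model and byte order is to write the mapping down and check that it inverts cleanly. The table below is an illustrative, partial sketch of 2014-era names written for this example (the instruction-set labels are simplified), not a copy of anything in the FreeBSD sources.

```python
# Illustrative, partial table: MACHINE_ARCH -> (instruction set, programming
# model, byte order). Written for this example; check before reuse.
MACHINE_ARCH_ABI = {
    "i386":      ("x86",     "ILP32", "little"),
    "amd64":     ("x86",     "LP64",  "little"),
    "armv6":     ("arm",     "ILP32", "little"),
    "armeb":     ("arm",     "ILP32", "big"),
    "mips":      ("mips",    "ILP32", "big"),
    "mipsel":    ("mips",    "ILP32", "little"),
    "mipsn32":   ("mips64",  "ILP32", "big"),
    "mips64":    ("mips64",  "LP64",  "big"),
    "mips64el":  ("mips64",  "LP64",  "little"),
    "powerpc":   ("powerpc", "ILP32", "big"),
    "powerpc64": ("powerpc", "LP64",  "big"),
    "sparc64":   ("sparc",   "LP64",  "big"),
}

# Each name stands for exactly one combination, so the mapping can be inverted.
ABI_TO_MACHINE_ARCH = {abi: arch for arch, abi in MACHINE_ARCH_ABI.items()}
assert len(ABI_TO_MACHINE_ARCH) == len(MACHINE_ARCH_ABI)


def decode(machine_arch: str) -> tuple:
    """Split a MACHINE_ARCH name into its (ISA, model, byte order) parts."""
    return MACHINE_ARCH_ABI[machine_arch]


def encode(isa: str, model: str, endian: str) -> str:
    """Find the single MACHINE_ARCH name for a supported combination."""
    return ABI_TO_MACHINE_ARCH[(isa, model, endian)]


print(decode("amd64"))                    # ('x86', 'LP64', 'little')
print(encode("mips", "ILP32", "little"))  # mipsel
```

Combinations with no entry (say, big-endian x86) simply raise `KeyError`, which mirrors the point that only a handful of the possible conventions are actually supported.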

>> What you are asking for now is to combine two dimensions into one and
>> combination in this case means multiplication so if you have 3
>> instruction sets and 2 programming models, the combined dimension needs
>> 6 different values.  You need to make the case for why you think this
>> is a good idea.

Because uname has to report 6 different values so that the right binaries are built. It is really that simple. And we’ve already made the case, and have been using this convention for a very long time. It works. I’m not sure the burden is on us to justify why a convention that’s been in use since FreeBSD 6 should not change. As weird as you might think it is, it is a convention that our users understand.
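The arithmetic in the quoted paragraph can be checked directly. Whether the two dimensions are kept as separate fields or folded into a single name, a tool still has to distinguish every combination; this sketch uses the pkg-style "isa:bits" spelling purely as an example encoding.

```python
from itertools import product

# The three instruction sets and two programming models from the quoted example.
instruction_sets = ["x86", "arm", "mips"]
programming_models = ["32", "64"]   # ILP32 and LP64

combinations = list(product(instruction_sets, programming_models))

# Split fields or one folded name: there are still six distinct cases.
tags = [f"{isa}:{bits}" for isa, bits in combinations]
print(len(tags))  # 6
print(tags)       # ['x86:32', 'x86:64', 'arm:32', 'arm:64', 'mips:32', 'mips:64']
```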

>>  For the past 20 years we got away with this because
>> on every installation of FreeBSD we only used one programming model at
>> a time.  This is still the case for byte order of course.

This isn’t true. For the past 15 years we’ve supported two programming models on amd64 at the same time. For longer than that we’ve supported Linux emulation on i386. The project has known about these things for a long, long time, and has settled on MACHINE_ARCH to represent all possible builds. We’ve had mixed MIPS for about a decade, though the support has varied in quality and execution. We learned that TARGET_BIG_ENDIAN was bad, really bad, and that we had to have a separate name for each ABI we supported, with no external info apart from that name. We could easily have picked the convention you are proposing here, but we didn’t. We picked another one.

Also, the “for the past 20 years” argument cuts both ways. Look at NetBSD. There, they have the same convention we have here: a separate MACHINE_ARCH for each ABI. They have been even more successful at it than we have, and have avoided the pitfalls of TARGET_BIG_ENDIAN much better than we have. pkgsrc ties nicely into that. So for 20 years people have successfully used the current model, not just in FreeBSD, but elsewhere as well.

>> What I'm saying is to keep the option open for installations with
>> multiple programming models, where most binaries could use ILP32 and
>> only the ones that actually need a 64-bit address space use LP64.
>> You query the instruction set using uname and the programming models
>> using getconf.

What I’m saying is that I don’t see any benefit at all to our users in having an additional, arbitrary string they have to deal with. There are actually quite a few other details you need to know before you can even call getconf.
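The distinction in the quoted proposal, between what the kernel reports and the programming model a particular binary was built for, can be seen from inside a running program. This sketch (function name hypothetical) compares the machine string from uname(3) with the pointer width of the current Python interpreter. Whether the uname answer itself changes for a 32-bit process is OS-specific, which is exactly the `uname -p` question raised at the top of the thread, so no particular output is claimed here.

```python
import os
import platform
import struct


def running_abi() -> dict:
    """Report what the kernel says versus what this process was built for."""
    return {
        # Machine name as reported by uname(3), i.e. what `uname -m` prints.
        "uname_machine": os.uname().machine,
        # Pointer width of the running interpreter: 32 for an ILP32 build,
        # 64 for an LP64 build, independent of what the kernel supports.
        "pointer_bits": struct.calcsize("P") * 8,
        # Python's own view of the executable's bitness, e.g. "64bit".
        "executable_bits": platform.architecture()[0],
    }


print(running_abi())
```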

>> I suppose you could replace the "x86" in the pkg scheme with i386/amd64,
>> but then you'd still be talking about i386:32, amd64:32 and amd64:64
>> instead of x86:32, x86:x32 and x86:64.

I suppose you could replace these with “i386”, “x32” (or “amd64x32”) and “amd64” respectively, just like we did with mips. As a user, how the heck am I to know what all these strange strings map to? I have an amd64 machine; what package do I install? x86:64? WTF is up with that? How am I supposed to communicate that to users? How am I supposed to write a sane script that ties together packages and the base system when there are two different systems describing the same thing? I’ve yet to see any benefit so huge that it trumps the ease of use for our users and the ease of script writing for the nanobsds and crochets of the world.
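This is the kind of translation table a build script such as nanobsd would have to carry if package tags and MACHINE_ARCH diverge. The sketch covers only the three x86 tags quoted above; "amd64x32" is one of the names floated in this thread for x32 and is not an existing MACHINE_ARCH. All names here are hypothetical.

```python
# Hypothetical translation table between the pkg-style tags quoted above and
# MACHINE_ARCH-style names. "amd64x32" is only a name floated in this thread.
PKG_TO_MACHINE_ARCH = {
    "x86:32": "i386",
    "x86:x32": "amd64x32",
    "x86:64": "amd64",
}
MACHINE_ARCH_TO_PKG = {arch: tag for tag, arch in PKG_TO_MACHINE_ARCH.items()}


def to_machine_arch(pkg_tag: str) -> str:
    try:
        return PKG_TO_MACHINE_ARCH[pkg_tag]
    except KeyError:
        # Every new ABI needs a new entry in PKG_TO_MACHINE_ARCH; anything
        # missing fails here.
        raise ValueError(f"no MACHINE_ARCH known for {pkg_tag!r}") from None


def to_pkg_tag(machine_arch: str) -> str:
    try:
        return MACHINE_ARCH_TO_PKG[machine_arch]
    except KeyError:
        raise ValueError(f"no pkg tag known for {machine_arch!r}") from None


print(to_machine_arch("x86:64"))  # amd64
print(to_pkg_tag("i386"))         # x86:32
```

With identical names on both sides, both tables and both functions disappear.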

> 
> No. We support multiple "models" now and have for ten years. That's what MACHINE_ARCH is for: it defines the choice of the last three things you list above. Specifically, a shared value of MACHINE_ARCH and OS version guarantees, in FreeBSD-land, complete binary compatibility of executables. Kernels support multiple ones, in general (e.g. i386 binaries on amd64, powerpc binaries on powerpc64). They may support more in the future (x32 on amd64, potentially even cross-endian binaries). We have a nice flexible scheme in FreeBSD for supporting this. If you want to find out the list of the things the installed kernel can run, check the kern.supported_archs sysctl. Simple.

Don’t forget we’ve supported Linux emulation for 18 years, and let’s not forget about iBCS and SYSV emulation, present from the very start as well. If we look historically at BSD, I know that 4.2BSD on the VAXen ran PDP-11 binaries, which pushes the time horizon back another 10 years.
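Using the kern.supported_archs sysctl mentioned above, the "which packages can this machine run" question reduces to a membership test on MACHINE_ARCH names. This is a minimal sketch that assumes the sysctl value is a space-separated list of names; the sample text "amd64 i386" is what one might expect from an amd64 kernel with 32-bit compatibility, but treat it as an assumption. The live query only works on FreeBSD, and the helper names are hypothetical.

```python
import subprocess
from typing import List, Optional


def supported_archs(sysctl_output: Optional[str] = None) -> List[str]:
    """Return the MACHINE_ARCH names the running kernel says it can execute.

    If no text is passed in, ask the kernel via `sysctl -n kern.supported_archs`
    (FreeBSD only; assumes the value is a space-separated list).
    """
    if sysctl_output is None:
        sysctl_output = subprocess.run(
            ["sysctl", "-n", "kern.supported_archs"],
            capture_output=True, text=True, check=True,
        ).stdout
    return sysctl_output.split()


def can_install(package_arch: str, archs: List[str]) -> bool:
    """A package is installable if its MACHINE_ARCH is one the kernel runs."""
    return package_arch in archs


archs = supported_archs("amd64 i386\n")  # sample text in place of a live query
print(archs)                             # ['amd64', 'i386']
print(can_install("i386", archs))        # True
print(can_install("powerpc64", archs))   # False
```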

> These strings are just as expressive as the ones in pkg. They are the standard. They're what external build systems test against, what the src, doc, and ports trees use to define what to do universally. It's what users and code expect. The wheel we've had for 20 years is perfectly good -- why invent a new, incompatible one?

Exactly. What is the huge benefit that justifies the huge pain this is going to cause? The FreeBSD project already has some pain because it chose amd64 as its arch name (spread through about 2 dozen makefiles in the base that do s/amd64/x86_64/ in places). Do you want to multiply this by 6 architectures, with arbitrary and difficult-to-explain differences for our users?
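For a sense of what that existing pain looks like, this sketch performs the s/amd64/x86_64/ rename when building a GNU-style cpu-vendor-os triple. Only the single rename named above is included; passing every other name through unchanged is a simplifying assumption, and the helper names are hypothetical.

```python
# Only the rename mentioned above; every other name is passed through
# unchanged, which is a simplifying assumption rather than a full GNU mapping.
GNU_CPU_RENAMES = {"amd64": "x86_64"}


def gnu_cpu(machine_arch: str) -> str:
    """Equivalent of the s/amd64/x86_64/ done in the makefiles."""
    return GNU_CPU_RENAMES.get(machine_arch, machine_arch)


def gnu_triple(machine_arch: str, release: str) -> str:
    """Build a GNU-style cpu-vendor-os triple such as x86_64-unknown-freebsd10.0."""
    return f"{gnu_cpu(machine_arch)}-unknown-freebsd{release}"


print(gnu_triple("amd64", "10.0"))      # x86_64-unknown-freebsd10.0
print(gnu_triple("powerpc64", "10.0"))  # powerpc64-unknown-freebsd10.0
```

Every additional naming scheme adds another table like `GNU_CPU_RENAMES` that has to be kept in sync.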

So rather than repeating arguments that aren’t very strong, and certainly don’t come close to justifying the extra pain our users will feel, perhaps an argument for why that future pain is worth it would be useful.

Warner