Sparc64 doesn't care about you, and you shouldn't care about Sparc64

Tue Nov 17 07:06:39 UTC 2015

> On Nov 16, 2015, at 11:44 PM, Alfred Perlstein <bright at mu.org> wrote:
> 
> Warner, thanks for addressing this email.  I think I wasn't clear, which led to some misunderstanding.  I'll keep this reply succinct and the rest of it inline.  Please don't take the succinctness as anything other than getting to the point.
> 
> On 11/16/15 10:22 PM, Warner Losh wrote:
>> 
>> 
>> On Mon, Nov 16, 2015 at 6:53 PM, Alfred Perlstein <bright at mu.org> wrote:
>> 
>> 
>> 
>> On 11/14/15 9:16 AM, Warner Losh wrote:
>> On Fri, Nov 13, 2015 at 11:15 PM, Elizabeth Myers <elizabeth at interlinked.me>
>> wrote:
>> 
>> You are seriously going to use "we're not NetBSD" as an argument?
>> 
>> You noticed I didn't reply to it. The argument is completely lame. FreeBSD
>> runs today in a variety of markets. Some new, some not so new. The thing
>> that makes each of these areas unique is that there's a thriving community
>> around them, FreeBSD still runs well enough on these machines to get
>> something done, and when things break, they get fixed in a timely manner.
>> 
>> Alpha was removed because it got broken by some changes, and stayed broken
>> for a long time despite repeated requests to fix it. Sparc64 is on the cusp
>> of that: some minor things are broken, but have been fixed. The current
>> crisis is due to the end of life of gcc in the tree and its fallout coupled
>> with some neglect of the port due to time constraints.
>> 
>> At first I was all for removal. With more data, I'm less sure. If the
>> promises made in this thread are kept, it looks to remain viable for a
>> while, though the lack of a qemu-user solution means that packages for a
>> slow platform (where they are really quite useful) will remain limited.
>> Maybe there's enough hardware around that third-party pkg repos can fill
>> the gap, maybe not. I think we should experiment with this model and see
>> what it produces. Give the branching of 11 as the deadline to show
>> something viable...
>> 
>> One of the things I never understood about FreeBSD's method of maintaining a port was the way the platform porting was done.  We really do things in a different manner than what my perception of other OSes is.
>> 
>> My impression (please do correct me if I'm wrong) was that other OSes such as NetBSD and Linux had "platform maintainers".
>> 
>> These maintainers were around to:
>> 1) keep the ship sailing on those platforms
>> 2) guide the general code base from becoming non-portable to other architectures (within reason).
>> 3) drive the release of the architecture in question, helping the release engineer with image building and release testing.
>> 
>> For point 1 above, what that meant to me was that let's say Linus or NetBSD in general wanted to do a major or minor change on a tier 1 platform, then it was the responsibility of the *platform maintainers* to do the work on the non tier-1 platforms to keep them up to date.  Those "platform maintainers" kept those ships sailing and in return they got to be called "the $arch maintainer" which looks plenty good on a resume and also feels good for those that get excited for status.
>> 
>> I'm not sure how the people that actually take care of these things on FreeBSD differ.  There are people
>> recognized as go-to people for the different ports that are fairly active in the on-going issues that come
>> up with kernel code. Userland code doesn't seem to matter that much given the platforms we support.
>> 
>> For PowerPC, you have Nathan W and Justin Hibbits. For mips, there's Adrian, myself, Juli Mallett and a few
>> others. For arm there's a long cast of characters. For PC98 there's Takahashi-san. For sparc64 there's Marius.
>> These people keep the ship sailing (or in some cases they remove the ship from the tree). They advise
>> discussions about issues that are relevant to the platform, like cache lines and cache coherence,
> 
> This is how we diverge:
>> they call people out when they break these platforms or when people used to big systems adjust the
>> tuning and break small systems.
> That is not done as far as I can tell in NetBSD/Linux.   In Linux/NetBSD it is the job of the maintainer to keep the platform up to date, not to call out when someone breaks it.
> 
> Meaning ideal:
> "Oh someone broke alignment in this struct on my platform, let me ask them how to fix it."
> 
> As opposed to (not ideal):
> "Something broke my platform, let me track that guy down and make them fix it."

The person that broke it is often the best person to fix it. If they don’t fix it themselves
the maintainers fix it for them. It’s common courtesy. I think you’re making too fine a
distinction here to actually be useful.

And if you know anything about NetBSD, you’ll know they do exactly the same thing
when someone does something that breaks a particular platform. The port maintainers,
who build and run the stuff the most (though they aren’t listed in any file) notice and
complain, often times within hours of the commit. They generally ask the original
committer to fix it, just like we do. And just like we do, those suggestions sometimes
come in the form of a patch or a general description of what to do and why.

Tell me, who are the NetBSD/hpcmips maintainers? Who are the NetBSD/hpcarm
folks that are still active? I just did a search, and couldn’t find this information. You
could look at the hpcmips or hpcarm trees, but they are rather quiet these last
few years, and many of the folks that contributed code there have wandered off.

>> 
>> For point 2, let's say someone had a change that pushed some form of *completely* non-portable code into the base which would break a reasonable to support platform, then the "platform maintainer" would speak up and tell the general community "uh no, you can't do X on this platform, we need to rethink this".
>> 
>> People generally don't push this kind of code into the tree these days. When they do, they get called
>> out on it. Some of them even listen to the calling out and fix things, others don't and one of the
>> platform maintainers has to fix the stupid pushed into the tree. Sometimes this happens right away
>> and sometimes there's a lag. Sometimes it's code for newer versions of the platform that break older
>> versions (or vice versa). Other times there's code from another platform that breaks things.
>> 
>> USB is a textbook example of this happening. It went in, and didn't work worth a damn on arm or mips.
>> The ports maintainers of the arm and mips platforms tried to explain what the issues were. It took
>> some time, but it got mostly worked out as the embedded folks got to know USB issues, and hps
>> got to understand the issues with embedded hardware.
>> 
>> For point 3, there may be a lag between release of the OS for tier 1 (x86/x64) and the secondary architectures, but that is OK because the maintainer will eventually provide images themselves or in collaboration with the release engineer.
>> 
>> For this point, we've pushed the knowledge of how to build the images into the release engineer. The
>> folks that are around that are using the port test the images. Sometimes it's the port maintainers,
>> but recently it has been a large cast of characters for popular platforms.
>> 
>> FreeBSD seems to take a different approach.  This approach is that someone (or some people) form a team to port to a platform.  These "platform porter" groups' sole responsibility is to get a new architecture running.  After it's mostly running they are mostly without responsibility, however we tend to give them the right of change-set veto in perpetuity of the marginal relevance of the ported-to platform.
>> 
>> so like when did this actually happen?
> Well, earlier in this email you said this exact thing:
> 
>> they call people out when they break these platforms or when people used to big systems adjust the
>> tuning and break small systems.
> That's how I see the difference.

First step is always education. That’s a strength, not a weakness. You educate people that break things
because more often than not, those people didn’t bother to ask for a review. Part of the education is
needing to make sure people socialize changes appropriately.

If that’s *ALL* maintainers did, I’d agree with you. However, there’s much proactive education that also
happens, which is exactly your number 2: everybody working together to make sure that new changes
fit well with the platform set. That is real. It happens every day. Focusing on only one, narrow situation
that happens maybe once a month and saying we’re doing that wrong seems petty and wrong-headed
and would actually break more than it fixes if we changed it.

So because you have a world view that doesn’t match what’s going on, you want to change what’s
going on to match some ideal from another project when this project is actually doing that ideal? I
still don’t get it.

> 
>> 
>> What this means is that instead of assigning a title and ownership of the platform to someone who maintains the status as "maintainer" by keeping that platform working (by which I mean doing items 1, 2 and 3 from the NetBSD/Linux list), we nearly immediately hoist the "platform maintenance" onto the general community of people that may not have access to the hardware in question.
>> 
>> Do you have a specific example of when we've done this? As far as I know, based on powerpc, arm and
>> mips anyway, the people claiming to be maintainers are actively doing 1, 2 and helping RE do number
>> 3 to varying degrees. As far as I know, they all have access to some or all of the hardware they are
>> maintaining, and many of our power users participate in the process as well.
> 
>> 
>> Maybe this is just my perception, but it would seem to make a ton more sense to follow the NetBSD/Linux model which implies a somewhat decoupled release model (not all arches must come out on the same ) and assigning ownership and responsibility in exchange for status based on being the "platform maintainer".
>> 
>> So, rather than generalizations, be specific. Who do you think is claiming to be a port maintainer, blocking
>> progress and needs to be replaced?
>> 
>> And what, beyond what the re@ does today, would you do differently? What do we gain over what we do
>> for tier 1 platforms? Is there a platform wanting a release that isn't getting one? mips has two different
>> groups that have put out releases for it, with one of them fading into the background. Adrian is making
>> wifi builds available, already following this model you say we should adopt. The Japanese user groups are
>> putting out PC98 releases now that the re@ has dropped them (they never really stopped in the mean
>> time). sparc64, powerpc, arm, i386 and amd64 are all released by re@. ia64 has been removed from the
>> tree. What other platforms are there? What else needs to be done?
>> 
>> Finally it would be pretty obvious when everyone steps down or just doesn't participate in the release process that it may be time to sunset a platform.
>> 
>> That's why we are having the conversation about sparc64. It looked like it might no longer be
>> participating in the normal process. Now, while there are some issues that were identified with
>> sparc64, some of them are real (see qemu and difficulties building in the cluster). Some of them
>> were just perception (the reduced numbers of commits to sparc64 didn't seem to represent
>> a problem with the platform and the perceived issues had been cleared up)
>> 
>> So what, specific, actionable items do we as a project actually need to do here? I'm sure there are some
>> and that we can improve our process. I'm having trouble teasing out what I, as someone who dabbles
>> in arm and mips to varying degrees of 'maintainership' for different parts, can do better or different.
>> 
> Three things:
> 1) I am wondering if core (or the community in general) should have some way of nominating a particular person as a platform maintainer.  This would give accolades to that person and at the same time give us a point person.  I believe part of the problem is that we don't give enough status to the port maintainers. Are they on the website?  How would I know who is the "king of mips" right now?  Does someone get to put it on their resume with backing from the project?  If not, then the maintainer will be grumbly as opposed to facilitating.

We don’t have a “king of the kernel” or “king of mips” or anything like that. We document that you send mail to mips@
when you don’t know the right person to send it to.

As for status reports: I agree with you on that.

As for putting things on one’s resume: I never needed the project’s blessing to claim to have done a lot with FreeBSD/mips, FreeBSD/arm, CardBus, PC Card, SD, MMC, PCI, etc. I just did it. The lack of a stamp from the project hasn’t really been
an issue.

> 2) The role of platform maintainer needs not to be a blocking role, but rather a continual porting role.  Specifically to avoid the "calling out" (sorry for the quotes here, but just want to get the point across) and instead function as a do-er/facilitator.  Meaning if something breaks a secondary platform it's a shared responsibility of the port maintainer and developer to fix it, not solely the developer's.

To the best of my knowledge, it isn’t a blocking role today. All the people working on arm, mips and powerpc that
I interact with already do that. Rather than fix the stupid mistakes people make, educating them not to make them
again in the future is the best way to keep them from repeating. That sure as hell sounds like a shared responsibility
to me. I fail to see any evidence to support your assertion that port maintainers just whine when things break. I don’t
see that in the commit logs (although they are full of hundreds of commits from people that do complain). I don’t
see that in the interactions I’ve had.

> 3) Tight coupling of the system.  While it's good to cross train release engineering with various platforms and even more so it's REALLY great that this is being well received, the end result is a very tight coupling of the system that can lead to frustrations and logjams when things go wrong.  Something that FreeBSD very often gets wrong is the strong unification of its various parts, you can see this in so many aspects of how we do things (releases, VFS, platforms) as opposed to other projects.  It is an admirable aspiration, however it results in much lock-step of the entire project and doesn't scale.

I can’t recall any release ever being held up for longer than it takes to build on the slowest hardware the RE has
used to cut the release. I can recall many times a release wasn’t held, or I wasn’t allowed to put changes into a
secondary platform because they might affect the primary one too close to a release. Absent any evidence of
when, where and how these things happen, I’m having a hard time seeing that the actual situation on the ground
matches the supposed one you are complaining about, at least as it comes to other platforms. What logjams are
they causing? I’ve not seen any. Do you have specific examples?

Warner