Re: PKGBASE Removes FreeBSD Base System Feature

From: Warner Losh <imp_at_bsdimp.com>
Date: Sat, 09 Aug 2025 16:23:38 UTC
On Sat, Aug 9, 2025 at 8:16 AM David G Lawrence <dg@dglawrence.com> wrote:
>
> > >    But now I'm going to say something controversial: I was disappointed
> > > by the reaction about AI, and how it could help the project, in the
> > > developers list. While I fully appreciate the concerns about "stealing"
> > > other people's work (indirectly, through training on the vast corpus of
> > > the Internet) - i.e., the potential to violate copyright, what was said in
> > > that thread - to dismiss what AI could do for the project, for the
> > > development cycle - was exceptionally, tragically, myopic. Most people in
> > > the world (and here I mean 5 Sigma +) have no idea what's about to hit
> > > them. I've been deep in AI research recently and I can tell you, first
> > > hand, well...we're in for interesting times ahead. We can either embrace
> > > it or be tossed into the scrap heap of history.
> >
> > My opinion about AI-generated code is that the copyright and
> > license questions need to be clarified in international law first.
> > But AFAIK that is still under discussion in individual countries,
> > not at the UN. This is the fatal problem.
> >
> > My assumption is that the AI vendors need to implement something
> > like: "if the operator specifies the license to be (or possibly to
> > be) applied to the resulting code, the AI consults only
> > non-infringing knowledge/data/code and generates code from that."
>
>    I would caution against the assumption that the most advanced AI today
> somehow outputs some chunk of copyrighted code "verbatim". That may have
> occurred in what I would consider _ancient_ models, but the training
> processes have changed dramatically in recent times. While I can't
> make any definitive statements about what other AIs do or how they
> were trained, I can say that the modern approach is to have each
> batch of training data re-written
> by another model to be semantically and logically the same (and I'm not
> just talking about code here), but different in how it is expressed so
> that it captures the _idea_, but not the verbatim text. While it is true
> that there really aren't too many ways to say "for (i = 0; i < N; i++)",
> that, usually, is about where the similarity ends with the original code.
> What used to happen in AI is a phenomenon called, I think inappropriately,
> "overfitting", which basically means that the model memorized an exact
> text because it was trained on that exact text repeatedly with no variation.
> In this case, it doesn't generalize the concept - it's more like a parrot.

None of that went into the thinking of why we can't use generative AI.
The problem is that you can't claim copyright, per the US Copyright
Office, on works created by AI, even if you spend a bazillion years on
the prompt. Copyright is only for works created by people. As such,
using generative AI too heavily is a non-starter because no copyright
-> no ability to license and a lot of uncertainty around the
contribution. Until the bounds of that mess are sorted out, it's hard
to accept contributions you know to be problematic. We have similar
policies around murky patent IP issues today and generally take a
conservative approach there too. Linux generally doesn't allow it
either, though there have been proposals to label AI-generated or
Copilot-assisted code created for the kernel. Last I checked, those
haven't been approved and there's a lot of concern about it there as
well, so we're not unique here.

And one can't dismiss ethical concerns over the use of AI technology
out of hand, just as you can't dismiss AI technology because some of
its uses are problematic or have outsized effects (jobs, energy,
creativity). It's an emerging, disruptive technology in many ways.

But debate over the use of AI runs into reality sometimes. The
AI-generated slop that's appeared on GitHub is a big waste of my time.
It's total poo, especially from the one or two AI bots we've had show
up and try to establish credibility. And it's not just the bots: many
of the AI tools let you generate crap quickly. We've had one or two
well-meaning folks show up with changes, but without the judgement to
know those changes are too low-value, too risky, or just plain wrong
for us to spend time evaluating. It's actively harmful to the project,
because too much slop starves out the good stuff: the slop is just
good enough that it takes a little time to sort out that it's
poo...

>    But, let's take a step back and look more generally at AI. Everyone,
> for some reason, seems to be assuming that AI can only be used for coding.
> I didn't say that - and I didn't even mean to suggest it. I am talking
> more about using AI in development workflows to make it easier to examine
> the proper functioning of the code that _you_ have written or that
> _others_ have contributed. The AI doesn't need to write one line of code.
> It can just analyze what you have written and make helpful suggestions
> on how you can improve it. And AI can do much more than that. It can help
> you when you're struggling with architectural directions - big picture
> stuff. It could help the project by being a first line of analysis of
> submitted bug reports. It could be an oracle of knowledge for users
> about how to set up FreeBSD, administer it, and solve user problems.

Yes. And those uses of AI are permitted. It's code written by AI and
then only lightly modified by you that's the problem. People are using
it for those permitted things today, or have mentioned they are, and
they say it can be pretty good at them (which matches my experience),
but not perfect.

>    If anyone in the project wants to use AI - for any of these
> use-cases, the very first thing you will need is: An Open Mind.

Yes. Of course. Sure, some folk have an overly-strict knee-jerk
reaction to all things AI, and you saw that in the thread. But you
always get that for new technology (it's the fear of the new that
spawned this thread, for example).

So I've used AI tools for work. They are good, like you say, at
reasoning in the large about things. They can help you understand the
flow of code, where things go. They are so-so at ferreting out the
philosophy behind the code, so you know why elements of the design are
there. They can point out bugs and do some code review to spot common
problems. They can write prose way better than I can, but I still have
to edit the prose a bit to tone it down and make sure it got the
nuanced sense of some things right.

I've also experimented with AI writing functions or larger chunks of
code for me. But to date, every time I've tried, it's taken me longer
to debug the code, correct the comments, etc., than it would have
taken me to write it from scratch. It can be good at providing the
'boilerplate' for common types of subsystems, which can then be filled
in faster and generally doesn't carry copyright value anyway (your
for (int i = 0; i < N; i++) example), even without the Copyright
Office's determination. It can also be good for spotting some bugs in
code, but not great (there are some false positives that take time to
chase down, but it's been a bit of a net win when I've done it).

One area that I'd love to see someone get interested in (and commit
to actually doing, rather than just talking about doing) is feeding
our open PRs, our open Phabs, etc., into an AI and seeing which ones
are worth it for humans to look at: something that sits on the
inbound pipes to the project, gives some feedback, and alerts the
right people about actually decent submissions. Those uses don't
trigger most of the concerns people have about generative AI and
wouldn't pose a risk for the project. This use of AI shows promise,
is an extension of the automation we and others have done, and could
be quite useful.
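
To make the shape concrete, a minimal sketch, assuming the stock
Bugzilla REST API on bugs.freebsd.org; score_submission() is a
hypothetical stub standing in for whatever model someone would
actually wire up, and the 0-10 scale is made up too:

    #!/usr/bin/env python3
    # Rough sketch of a triage pass over open bug reports. Assumes
    # the stock Bugzilla REST API; score_submission() is a stub for
    # whatever LLM you'd actually use.
    import requests

    BUGZILLA = "https://bugs.freebsd.org/bugzilla/rest/bug"

    def fetch_open_bugs(limit=50):
        # Bugzilla's REST search returns {"bugs": [...]} for GET /rest/bug.
        params = {"status": ["New", "Open", "In Progress"], "limit": limit}
        resp = requests.get(BUGZILLA, params=params, timeout=30)
        resp.raise_for_status()
        return resp.json()["bugs"]

    def score_submission(summary):
        # Hypothetical: ask your model of choice to rate, 0-10, how
        # actionable the report looks. Deliberately unimplemented.
        raise NotImplementedError("wire up an LLM here")

    def triage(limit=50):
        for bug in fetch_open_bugs(limit):
            if score_submission(bug["summary"]) >= 7:
                # Printing stands in for "alert the right people"
                # (mail, a dashboard, whatever the project prefers).
                print("worth a look: #%d %s" % (bug["id"], bug["summary"]))

    if __name__ == "__main__":
        triage()

The real work would be in the prompt and in tuning the threshold, not
in the plumbing; the plumbing is the easy part.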

So sure, we have too many Luddites in the project, but generally
people have found good ways to not let them be a problem...  What we
haven't found is a good way to onboard new people so we can grow the
bench, which would also help solve the 'submission friction' problem
we have (and it's a virtuous cycle: the people who can submit and get
stuff into the tree often submit more stuff)...  Many of the reasons
people use sub-optimal workflows today (like kernel modules in ports)
stem ultimately from our being short on resources (both engineering
and management).

Warner

> -DG
>
>  *  Dr. David G. Lawrence
> * * DG Labs
>     Pave the road of life with opportunities.