Re: RFC: Adopting SPDX for SBOM generation

From: Rob Norris <robn_at_despairlabs.com>
Date: Fri, 08 Aug 2025 01:35:19 UTC

On Fri, 8 Aug 2025, at 10:40 AM, Warner Losh wrote:
> On Thu, Aug 7, 2025, 5:42 PM Rob Norris <robn@despairlabs.com> wrote:
>> Incidentally, this year we (OpenZFS) started some work on getting license tags onto everything and enforcing them in CI, see https://github.com/openzfs/zfs/pull/17001
> 
> What tool did you use to change those 3200-odd files? We have 150k files last I checked...

Honestly, Perl one-liners. Most of the CDDL files have the same header text, so I matched those and shoved a tag in the top, which covered a couple thousand. Then I started writing the checker, and had it spit out the files that were missing tags. Then I did a one-liner for the GPL files (most of the Linux SPL). Rinse and repeat until I'd covered enough that I could start understanding the one-offs and the exceptions. It was, like, a week of evenings, and mostly mechanical search-and-append.
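
To give a rough idea of the match-and-tag step, here is a small Python sketch (not the actual Perl one-liners). The marker string, the tag line and the function name add_spdx_tag are assumptions for illustration only; adjust them to whatever your headers and comment style look like.

```python
SPDX_PREFIX = "SPDX-License-Identifier:"


def add_spdx_tag(text: str, header_marker: str, tag_line: str) -> str:
    """Prepend tag_line if text carries a known license header and no tag yet.

    header_marker is a string assumed to appear in every copy of the known
    header; tag_line is the full comment line to put at the top of the file.
    """
    if SPDX_PREFIX in text:
        return text  # already tagged, leave it alone
    if header_marker not in text:
        return text  # unknown header, leave for manual review
    return tag_line + "\n" + text


# Example: tag a file that carries the (assumed) CDDL marker.
source = "/*\n * CDDL HEADER START\n */\nint x;\n"
tagged = add_spdx_tag(
    source, "CDDL HEADER START", "// SPDX-License-Identifier: CDDL-1.0"
)
```

Files that match nothing come back unchanged, so they're what the checker reports on the next pass.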

Maybe not the best method for 150K files, I dunno, but I probably would have started the same way, and it might work fine if there's not much variation among those files. It also helped that almost everything already had a license header of some sort, so it's mostly a text-matching problem. There's a few that don't, and for those I had to go back through the git history to try to establish provenance, and there's some we haven't got around to yet (see the exception list). If you had a lot of mixed and incomplete history, then another approach might be needed.
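
The checker with an exception list is roughly this shape. Again a Python sketch under assumptions, not the real OpenZFS checker: missing_tags, the five-line window and the example paths are all made up for illustration.

```python
from typing import Iterable, Mapping

SPDX_PREFIX = "SPDX-License-Identifier:"


def missing_tags(
    files: Mapping[str, str],
    exceptions: Iterable[str] = (),
    head_lines: int = 5,
) -> list[str]:
    """Return sorted paths that have no SPDX tag near the top.

    files maps path -> file contents. Paths listed in exceptions are
    skipped; those are the ones still waiting on provenance work.
    """
    skip = set(exceptions)
    missing = []
    for path, text in files.items():
        if path in skip:
            continue
        # Only look at the first few lines, where the tag is expected.
        head = text.splitlines()[:head_lines]
        if not any(SPDX_PREFIX in line for line in head):
            missing.append(path)
    return sorted(missing)


# Example with hypothetical paths.
report = missing_tags(
    {
        "a.c": "// SPDX-License-Identifier: CDDL-1.0\nint a;\n",
        "b.c": "/* no license header */\nint b;\n",
        "c.c": "int c;\n",
    },
    exceptions=["c.c"],
)
```

In CI you'd fail the build whenever the report is non-empty, and shrink the exception list as the one-offs get sorted out.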

Rob.