Enforcing "DIST_SUBDIR/DISTFILE" uniqueness

Doug Barton dougb at FreeBSD.org
Sun Aug 20 23:23:31 UTC 2006


I'm combining both of your responses to save time.

Andrew Pantyukhin wrote:
> On 8/20/06, Doug Barton <dougb at freebsd.org> wrote:
>> Andrew Pantyukhin wrote:
>> > On 8/16/06, Andrew Pantyukhin <infofarmer at freebsd.org> wrote:
>> >> I'd like to propose a policy to enforce a change in
>> >> DIST_SUBDIR whenever a distfile is rerolled in-place, i.e.
>> >> when checksum changes, but name stays unchanged.
>> >>
>> >> Moreover, effort should be made whenever possible to
>> >> make the old file available for download from an
>> >> alternative location.
>> >>
>> >> This policy will rid us of some fetch-related headaches.
>> >> It also will make it possible to share distfiles between
>> >> hosts with ports trees of different dates. Some rare issues
>> >> might also be resolved as a result of this. For one, ftp
>> >> mirrors could be configured to allow upload, but deny
>> >> modification and/or deletion.
>> >>
>> >> One thing I would personally frown upon is using
>> >> something like "fetch -o othername" to save a file with a
>> >> different name. It looks all right, but it prevents us from
>> >> looking for mirrors in an automated way when master
>> >> sites go down.
>> >
>> > Well, if no one is really against,
>>
>> I am violently against this proposal, but I was really hoping
>> that someone else would speak up first.
> 
> No need to be that violent, pal. Nothing's been set in stone yet
> and the reason for me writing here is to discuss it, not fight
> over it.

My intention is not to fight over it either. If the terminology is
problematic for you, feel free to substitute "very strongly opposed."

>> > I'll start preparing statements
>> > for documentation and thinking about a way to watch for
>> > "violations". I also intend to go through CVS and find past
>> > "offenders" to prod them about it.
>> >
>> > The recent openoffice update rerolled a file in-place, and while
>> > it may seem irrelevant or even beneficial (erasing 286Mb of
>> > the old file), the fact is that it prevents us from keeping distfile
>> > history on unversioned file servers,
>>
>> IMO this represents a very small minority of FreeBSD users,
>> and frankly I feel that it is incumbent on you to solve this problem
>> for your circumstance.
> 
> The percentage of FreeBSD users who need 5-10 year old
> sources in the CVS is very small, too.

Therefore, IMO, we should not be complicating the lives of the vast majority
of FreeBSD users (not to mention taking up some small portion of additional
space on the mirrors, etc.) in order to do what you suggest.

> But we treasure our src history and don't throw out any commits.

I don't see the two things as being equivalent at all. Not the least of the
reasons is that what's in our repo is the history of our project. What
you're asking for is that we dedicate resources to archiving the history of
other projects. (And yes, I realize that you could argue that because
version xyz was in _our_ ports tree at some point in time that it's part of
_our_ history, but I don't buy it.)

> Well, I happen
> to treasure our ports history. I really want people to have a
> chance, however slim, to be able to build ports using a very
> old tree.

Then I think, by all means, you should put together a resource for them to
be able to do that. I don't think (for whatever that's worth) that it should
be the ports tree.

>> OTOH, your solution would break the logic that portmaster (and I believe
>> portupgrade also) uses to detect and delete stale distfiles.
> 
> AFAICT portmaster's logic still misses the case when
> DIST_SUBDIR has changed for whatever reason.
> 
> portupgrade --distclean will not be broken, it deals with
> distfiles at the current DIST_SUBDIR
> 
> portsclean -D is actually broken now, and will be fixed if
> my proposal is implemented. It doesn't erase an old file if
> its path/name match those of a new file.

Actually portmaster and portupgrade share these characteristics. If the
subdir changes with a new version of the port, portmaster will not "see" the
old files.

> Oh, now that I've had another look at portmaster's logic it
> doesn't make sense at all.

It might not make sense to you, but it actually works in the vast majority
of cases, so it's not entirely without merit. :)

> What if distfiles of different
> ports have similar %[-_]* names?

Then the user is given a choice of whether or not to delete the file, unless
they've chosen to always or never delete distfiles. My design choice is to be
aggressive, and try to clean up more, not less. That said, the new method
that I use (as of version 1.6) creates significantly fewer false positives
than it did previously.
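
To make the "similar names" concern concrete, here is a rough Python sketch
of a prefix-based match. It is only an illustration, not portmaster's actual
code: it assumes the `%[-_]*` in the question refers to the shell-style
removal of the shortest suffix that starts with `-` or `_`, and the function
names (name_prefix, stale_candidates) are made up for this example.

```python
import re


def name_prefix(distfile):
    """Drop the shortest suffix starting at '-' or '_'.

    This mimics the shell expansion ${name%[-_]*}; a name without
    either character is returned unchanged.
    """
    return re.sub(r"[-_][^-_]*$", "", distfile)


def stale_candidates(current, distfiles):
    """Return files that share a prefix with `current` but are not it.

    Every hit is only a candidate: an unrelated port whose distfile
    happens to share the prefix would show up here too, which is why
    the user is asked before anything is deleted.
    """
    prefix = name_prefix(current)
    return [f for f in distfiles
            if f != current and name_prefix(f) == prefix]


if __name__ == "__main__":
    files = ["foo-1.0.tar.gz", "foo-1.1.tar.gz",
             "foo-docs-1.0.tar.gz", "bar_2.0.tgz"]
    print(stale_candidates("foo-1.1.tar.gz", files))
```

A match on the prefix alone cannot tell an older version of the same port
from a different port with a similar name, hence the confirmation prompt.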

> What if different ports require the same distfile of different versions?

That's an edge case, but it does happen. The user either needs to know this,
or run the risk of downloading the distfile again. Users who value network
bits more than disk bits can either use the -D option, or choose to
carefully monitor which files are deleted. Or, not use portmaster, which is
of course a valid option. :)

> What if distname changed radically?

Again, an edge case, but it does happen. See below.

> You can't make such broad
> assumptions about distfile patterns. You should probably
> do it the same way portsclean -D does - i.e. to check
> "dist_subdir/distfile" against distinfo files of all installed
> ports or all ports, whichever a user prefers.

IMO this wouldn't actually help with either of the cases that you describe,
unless you were to build a database of installed ports and distfiles. And
building "extra" databases is exactly what I'm trying to avoid doing. I
could also go into some detail about why even using the file name patterns
from the distinfo file to glob against really isn't any better than the way
I do it, but I won't because ...

The real solution to this is something that a few of us kicked around a
while back, but unfortunately it never gained traction. Namely, to record the
subdir (if any) and distfile/patchfile names in the +CONTENTS file at
install/package time. That would completely remove the ambiguity as to which
distfiles to remove for the _current_ (installed) port. It would still leave
the problem of how to deal with some of the edge cases that you described,
and of course you still have to use something similar to the way I do it in
order to find stale files that are older than the version that we're
deinstalling. But IMO we're bordering on a 95/5 rule here, and _my_ goal is
not antiseptic cleanliness in this area. With over 15,000 ports, any
solution that is "right" most of the time is way ahead of the game, and
adding this info to the +CONTENTS file would make it easier (and cheaper) to
get it right way more often than not.
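
As a sketch of what that could look like, assume the distfile names were
written to +CONTENTS as comment lines. The "@comment DISTFILE:" tag below is
purely hypothetical (nothing in the tree writes it today), as are the
function names and the default distfiles directory; only the general
"@comment KEY:value" shape follows existing +CONTENTS lines such as ORIGIN.

```python
import posixpath

# Hypothetical tag; no such line is written by the ports tree today.
DISTFILE_TAG = "@comment DISTFILE:"


def recorded_distfiles(contents_text):
    """Return the 'subdir/distfile' entries recorded in a +CONTENTS text."""
    entries = []
    for line in contents_text.splitlines():
        if line.startswith(DISTFILE_TAG):
            entries.append(line[len(DISTFILE_TAG):].strip())
    return entries


def distfile_paths(contents_text, distdir="/usr/ports/distfiles"):
    """Map the recorded entries to full paths under distdir (assumed)."""
    return [posixpath.join(distdir, rel)
            for rel in recorded_distfiles(contents_text)]


if __name__ == "__main__":
    sample = "\n".join([
        "@name foo-1.0",
        "@comment ORIGIN:devel/foo",
        "@comment DISTFILE:foo/foo-1.0.tar.gz",
        "@comment DISTFILE:patch-foo.diff",
        "bin/foo",
    ])
    for path in distfile_paths(sample):
        print(path)
```

With the names recorded this way, a tool deinstalling the port would know
exactly which files belong to the installed version, with no pattern
matching and no extra database.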

So meanwhile, back to your original proposal, I think you're asking to add a
lot of complexity and other costs to something that is fairly simple now,
without providing a corresponding benefit to even a significant minority of
our users. And I'll leave it at that for now, and let some other folks speak
up if they so desire.
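
For anyone who wants to gauge how often in-place rerolls actually happen,
the condition in the proposal (same subdir/name, different checksum) can be
checked by comparing two revisions of a port's distinfo. The following is a
minimal Python sketch under that assumption; the function names are
illustrative, and it only understands the usual "KIND (path) = value" lines.

```python
import re

# Matches distinfo lines such as: MD5 (subdir/file.tar.gz) = abc123
DISTINFO_LINE = re.compile(r"^(MD5|SHA256|SIZE) \((.+)\) = (\S+)$")


def parse_distinfo(text):
    """Map 'subdir/distfile' to a dict like {'MD5': ..., 'SIZE': ...}."""
    entries = {}
    for line in text.splitlines():
        match = DISTINFO_LINE.match(line.strip())
        if match:
            kind, path, value = match.groups()
            entries.setdefault(path, {})[kind] = value
    return entries


def rerolled_in_place(old_text, new_text):
    """Return paths present in both revisions whose recorded values differ."""
    old, new = parse_distinfo(old_text), parse_distinfo(new_text)
    changed = []
    for path in sorted(old.keys() & new.keys()):
        shared_kinds = old[path].keys() & new[path].keys()
        if any(old[path][k] != new[path][k] for k in shared_kinds):
            changed.append(path)
    return changed


if __name__ == "__main__":
    before = "MD5 (foo/bar-1.0.tar.gz) = aaa\nSIZE (foo/bar-1.0.tar.gz) = 10\n"
    after = "MD5 (foo/bar-1.0.tar.gz) = bbb\nSIZE (foo/bar-1.0.tar.gz) = 12\n"
    print(rerolled_in_place(before, after))
```

A path that appears in only one revision is a normal version bump or a
DIST_SUBDIR change, so it is deliberately not reported.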

Doug

-- 

    This .signature sanitized for your protection


