DPS Initial Ideas

Sat May 12 22:00:33 UTC 2007

On Sat, May 12, 2007 at 03:33:02PM -0400, Kris Kennaway wrote:
> On Sat, May 12, 2007 at 11:09:35AM +0200, Michel Talon wrote:
> 
> > Seriously, the FreeBSD package
> > system is in great need of a profound overhaul, pretending it works well
> > is complete denial of reality. I hope that young people working on 
> > summer code projects will infuse *new* ideas, and not spend their
> > vacations polishing inadequate tools.
> 
> I know that this is your belief, but please try to avoid grasping at
> straws: there are elements in your argument that are along the lines
> of "The FreeBSD package system is broken and needs to be fundamentally
> changed.  Rewriting it to use SQLite is a fundamental change.
> Therefore rewriting it to use SQLite will fix the problems."
> 

Really, I don't think this way at all. I think that *perhaps* SQLite
may be marginally better than a Berkeley database for solving part of the
problem, not much more. What I reacted to was the conservatism which
pervades the community as soon as someone floats the idea of using a new tool.

> First figure out what specific problems need to be solved, then figure
> out how to solve them, not the other way around.  So far I have seen
> little discussion of how SQLite is necessary and sufficient for fixing
> fundamental issues.  The argument in favour of SQL seems to boil down
> to "It's SQL!  You can do more complex queries...if you wanted to".

No, for me the main argument is that SQL is more familiar to many people than
running a perl script to connect to a Berkeley database. I have also heard
that SQLite performs better, but I would have to see it to believe it.

> 
> Without a clear demonstration of how this would solve a problem
> associated with package management, it is not very compelling and
> basically reduces to change for the sake of change.

I think that a lot of changes are necessary, and it seems they will happen. So 
*perhaps* it may be beneficial in this sea of changes to consider a minor 
change, moving from a more traditional Berkeley database to SQLite.

> 
> As I discussed in my email yesterday, there are serious issues to be
> solved.  

I think some of the issues have nothing to do with the database question.
Some of the issues are entirely trivial to solve. One of the worst causes
of misbehaviour in the package system is the constant change of port
origins and the poor standardisation of package names. Once it is
clear that these name changes bring nothing to the table but
introduce a lot of confusion, both for end users and for automated programs,
things will be easier.

It may be that borrowing from Debian the idea of "abstract" dependencies
which can be fulfilled by several concrete packages may also simplify
the dependency problem. For example tomcat may depend on "java", and java
may be fulfilled either by diablo-jdk15 or jdk15. This way when you change
from diablo-jdk15 to jdk15 you don't need to change anything in tomcat.
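
To illustrate the idea, here is a minimal Python sketch of an abstract
dependency. It is only an illustration, not how Debian or the ports tools
implement it; the PROVIDES and DEPENDS tables and both function names are
hypothetical.

```python
# Hypothetical data: which concrete packages "provide" an abstract name.
PROVIDES = {
    "diablo-jdk15": {"java"},
    "jdk15": {"java"},
}

# tomcat depends only on the abstract name, never on a concrete JDK.
DEPENDS = {
    "tomcat": ["java"],
}


def providers(name, installed):
    """Return the installed packages that satisfy dependency `name`."""
    return sorted(
        pkg for pkg in installed
        if pkg == name or name in PROVIDES.get(pkg, set())
    )


def unmet(pkg, installed):
    """Return the dependencies of `pkg` that no installed package satisfies."""
    return [dep for dep in DEPENDS.get(pkg, []) if not providers(dep, installed)]
```

With these tables, unmet("tomcat", {"diablo-jdk15"}) and
unmet("tomcat", {"jdk15"}) are both empty, so swapping one JDK for the other
requires no change to the tomcat entry.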

Another feature that Debian has, and which nicely complements the previous
one, is the specification of necessary dependencies with a version number
in a certain range (this obviously requires a reasonable standardisation of
version numbers, so that comparison of <some package>-0.99 to
<some package>-1.0-rc doesn't depend on arcane rules). This way you don't need
to change dependencies which are in the correct range, even if a more recent
version exists. This mechanism has been adopted by NetBSD pkgsrc.
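
A sketch of such a range check in Python, under one assumed convention:
versions look like X.Y[.Z][-tag], and a tagged version such as 1.0-rc is a
pre-release that sorts before the untagged 1.0. This convention is an
assumption for illustration only; it is not the actual rule set of Debian or
pkgsrc, and the function names are hypothetical.

```python
def version_key(version):
    """Turn 'X.Y[.Z][-tag]' into a sortable key; a tagged version is a pre-release."""
    numbers, _, tag = version.partition("-")
    release = tuple(int(part) for part in numbers.split("."))
    # A pre-release (non-empty tag) sorts before the untagged release.
    return (release, 0 if tag else 1, tag)


def in_range(version, minimum=None, maximum=None):
    """True if minimum <= version < maximum (either bound may be omitted)."""
    key = version_key(version)
    if minimum is not None and key < version_key(minimum):
        return False
    if maximum is not None and key >= version_key(maximum):
        return False
    return True
```

Under this convention 0.99 sorts before 1.0-rc, which sorts before 1.0, so a
dependency declared as "at least 0.99, below 1.0" accepts 1.0-rc and
rejects 1.0.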

Another feature which has proven useful in Debian is keeping track of which
packages were explicitly requested by the end user and which were
installed only as dependencies. This is the difference between apt-get and
aptitude. Apparently people are very happy to be able to remove not only
a package they have requested, but also all its dependencies (those which
are not required by another program) at one stroke. This also helps in case
some big package requires dependency A but, after an upgrade, its authors
have changed their mind and require alternative dependency B. With this
mechanism A disappears after the upgrade, while without it you end up with
both an upgraded A and B. I have observed on my machine that this is an
important cause of the package tree bloating steadily over time.
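
The bookkeeping this needs is small. As a hedged Python sketch (the function
name and data layout are hypothetical, not aptitude's implementation): given
the dependency lists of the installed packages and the set of packages the
user asked for, anything not reachable from that set can be removed.

```python
def orphans(depends, requested):
    """Return installed packages reachable from no user-requested package.

    depends   -- dict: installed package -> list of its direct dependencies
    requested -- set of packages the user explicitly asked for
    """
    needed = set()
    stack = list(requested)
    while stack:
        pkg = stack.pop()
        if pkg in needed:
            continue
        needed.add(pkg)
        stack.extend(depends.get(pkg, []))
    return set(depends) - needed
```

In the A/B scenario above, once "big" depends on B instead of A,
orphans({"big": ["B"], "A": [], "B": []}, {"big"}) returns {"A"}, so A is
flagged for removal instead of lingering.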

To answer the slowness problem in registering installed packages, one may
think about making use of the INDEX file. In fact all the information that
is necessary to fill the dependency entries is contained in INDEX, and
accessible here in milliseconds with any tool such as awk. It so happens that
the ports system doesn't make any use of the INDEX file and systematically
recomputes the dependencies through recursive make invocations which are very
time consuming. Of course this requires an up-to-date INDEX, or a mechanism to
keep INDEX continually up to date.
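
As a rough Python sketch of reading dependencies out of INDEX: this assumes
the usual layout of one port per line, fields separated by "|", the package
name in the first field and the space-separated run-time dependencies in the
ninth. Those field positions are an assumption to verify against your own
INDEX, which is why they are parameters; the function name is hypothetical.

```python
def parse_index(lines, name_field=0, run_deps_field=8):
    """Map each package name to the list of its run-time dependencies."""
    deps = {}
    for line in lines:
        fields = line.rstrip("\n").split("|")
        if len(fields) <= run_deps_field:
            continue  # skip malformed or truncated lines
        deps[fields[name_field]] = fields[run_deps_field].split()
    return deps


# Made-up sample lines in the assumed layout (not real ports).
SAMPLE = [
    "foo-1.0|/usr/ports/misc/foo|/usr/local|Foo|/usr/ports/misc/foo/pkg-descr"
    "|nobody@example.org|misc|bar-2.1|bar-2.1 baz-0.3|http://example.org/|||",
    "bar-2.1|/usr/ports/misc/bar|/usr/local|Bar|/usr/ports/misc/bar/pkg-descr"
    "|nobody@example.org|misc|||http://example.org/|||",
]
```

parse_index(SAMPLE) gives foo-1.0 the dependencies bar-2.1 and baz-0.3, and
bar-2.1 none. The same function works on an open INDEX file, since it only
iterates over lines.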

Part of the registration is also filling the +REQUIRED_BY files of the
dependencies of a package when one installs a package.  If this package has a
lot of dependencies this means opening, editing and closing a large number of
files. This is expensive. One may imagine using a database containing the
global dependency information; then +REQUIRED_BY files are no longer
necessary, since the information can be recomputed in very little time. In my
little python experiments, recomputing the complete set of +REQUIRED_BY files
for around 700 ports takes around one second. By the way, topologically
sorting the DAG of the whole port tree (> 15 000 ports) takes of the order
of 2 seconds, so it is clear that if major performance problems occur, they
cannot be ascribed to such DAG sorting.
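
Both computations are short. The following is a minimal Python sketch, not
the code behind the timings above: it inverts a package -> dependencies
mapping into the +REQUIRED_BY information, and sorts the DAG with the
standard library's graphlib (Python 3.9 or later). The function names and
the sample mapping are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def required_by(depends):
    """Invert package -> dependencies into package -> sorted dependents."""
    reverse = {pkg: [] for pkg in depends}
    for pkg, deps in depends.items():
        for dep in deps:
            reverse.setdefault(dep, []).append(pkg)
    return {pkg: sorted(users) for pkg, users in reverse.items()}


def install_order(depends):
    """Return the packages with every dependency listed before its dependents."""
    return list(TopologicalSorter(depends).static_order())


# Made-up example mapping.
DEPENDS = {"tomcat": ["jdk15"], "jdk15": ["libiconv"], "libiconv": []}
```

Here required_by(DEPENDS) reports that jdk15 is required by tomcat and
libiconv by jdk15, and install_order(DEPENDS) places libiconv before jdk15
before tomcat. Both functions visit each dependency edge a bounded number of
times, which is consistent with sorting not being the bottleneck.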

> Some of them can be solved by improving the storage backend
> of the package database to use a database; but this is in progress
> using existing tools.

Yes, and I don't buy the idea that using *existing* tools is better than
using the best tool for the job (assuming one can prove which tool is best,
considering power, familiarity, etc.).

> 
> Given that this work is happening (or at least will be happening, I am
> not sure when the SoC officially starts), the best thing is for
> interested people to work with Garrett to help him achieve the goals
> of his project.

Sure. I am convinced this is the reason why several people, including myself,
are presenting some ideas on the mailing list now, before Garrett begins
working on his project. Of course after that, he will be in charge, with his
mentor, and I hope they will do something wonderful. As you are well aware,
designing a very good ports system is unfortunately very difficult,
especially in the FreeBSD context where building from source is considered
fashionable, which makes designing an efficient upgrade system almost
impossible.

> 
> Kris

-- 

Michel TALON