portupgrade O(n^m)?

Thu Mar 1 07:12:45 UTC 2007

youshi10 at u.washington.edu wrote:
> On Thu, 15 Feb 2007, Michel Talon wrote:
> 
>>> Give me a few weeks and, if I can band together with a few people, I
>>> want to try to port sections of portupgrade and its related tools to
>>> C++ (and maybe do some code tweaks along the way). Most of the Ruby
>>> files are over 400 lines long and sparsely commented, and I don't know
>>> Ruby well enough to port them right now, but I've been making some
>>> headway lately, so I'll try porting some stuff soon.
>>
>> I think that porting portupgrade to C++ would be time spent in vain. In
>> my opinion, some of the basic ideas of portupgrade are deeply flawed,
>> and no matter how much one polishes the algorithms, little will be
>> gained. The idea of keeping state in databases is deeply flawed: the
>> databases are constantly broken, and they don't help speed at all.
>> Getting rid of database dependencies was one of the motivations for
>> portmaster. In my opinion, upgrading progressively, that is, port by
>> port, is also deeply flawed. There is a 90% chance that something will
>> go wrong in the middle and you will be stuck with a half-upgraded system.
>>
>> So in my opinion, what is needed is radically new thinking about the
>> problem: write a prototype in a scripting language to experiment with
>> the solutions, and then code it in C++. Personally, I have done that; I
>> have written a Python script, which can be found here:
>> http://www.lpthe.jussieu.fr/~talon/pkgupgrade
>> (it needs the companion
>> http://www.lpthe.jussieu.fr/~talon/save_pkg.py).
>> For the time being I still have bugs, which I am working on, but at
>> least these bugs show that the problem is vastly more complicated than
>> one can imagine at first.
>>
>> Why Python? Because it is much more readable than Perl or Ruby, and it
>> performs much better than Ruby. In my opinion Ruby is vastly overhyped;
>> it is much closer to rubbish than anything else.
>> What ideas? First, don't use any database or database connector; do
>> everything in memory and recompute needed information on the fly. It
>> works very well: one can count on something of the order of 1 to 2
>> minutes to perform the necessary analysis for 700 ports. Second,
>> download as many precompiled packages as possible, at full speed, that
>> is, reusing the same connection to the FTP server. This works very
>> well; with a good internet connection you have your packages in 15 to
>> 20 minutes.
>>
>> Why packages?
>> Because packages don't break when compiling. Compiling from source is
>> asking for problems. If you minimise the number of compilations, you
>> minimise the risk of breakage. Moreover, one can back up old packages
>> while downloading, and so gain time. By contrast, for every package,
>> portupgrade first does a dependency analysis that could be done once,
>> then does a backup, then fetches the binary package or compiles, then
>> installs it, then discards the backup. All this is a terrible loss of
>> time.
>>
>> Finally, my script produces a shell script able to do the upgrade, so
>> you can see in writing *exactly* what will be removed, what will be
>> installed from binary packages, and what will be compiled. All packages
>> needed for installation are already present on the machine. There is
>> absolutely no element of surprise; you can evaluate the risk soundly.
>> These are the ideas I have explored.
>>
>> Now, performance-wise, running the shell script takes around 2 hours.
>> This is entirely time spent by pkg_delete (roughly 15 minutes) and
>> pkg_add (roughly 1 hour 45 minutes) for around 500 ports replaced. This
>> is very long, sure, but it can be optimized only by working on
>> pkg_delete and pkg_add. No amount of work on portupgrade or a
>> replacement will help in any way.
>>
>> As for the remaining bugs I have, they are entirely due to the crappy
>> complexity that FreeBSD port developers introduce by constantly
>> modifying the origins of ports. So for a given program, I can have 3
>> different origins: one from when the port was previously installed on
>> the machine, another from when the last RELEASE was produced, and the
>> last one if I compile the port on the machine now, with the present
>> state of the ports tree. These 3 origins may all be different; I have
>> examples. These morons are *constantly* modifying the names, as an
>> exercise in bikeshed painting. For example pan -> pan2 -> pan, etc.
>> Cycles don't worry them at all!
>> Of course, for a given piece of software, you may have all combinations,
>> such as nonexistent or existing at the time the machine was installed,
>> at the time of the release, or at present.
>>
>> Compare that to the situation for Debian apt-get. The names are stable.
>> They have strict rules about package naming, they stick to them, and
>> they don't change names arbitrarily. All packages exist in compiled
>> form, so you don't have to worry about "prepackaged" versus "to be
>> compiled, so has a 50% chance to break". You have only 2 states to
>> consider instead of 3: the state on the machine and the state in the
>> repository. Things are vastly simpler. No wonder apt-get works and
>> portupgrade doesn't. This has nothing to do with the fact that apt-get
>> is written in C++.
> 
> (sorry to cross-post, but this thread is just as relevant to @ports as
> it is to @hackers)
> 
> Well, since you brought up Debian's apt-get system, I thought it'd be a
> good idea to take a look at the Gentoo Linux emerge / portage system
> (patterned after FreeBSD's):
> 
> =====
> Pros:
> =====
> -It's written in Python (portable).
> -It's a system which focuses on ports compilation from source, not 
> binary package installation.
> -Stores information for the entire system in a db format (not Berkeley
> DB, but something different) in a common file; stores installed leaf
> package information in another simple text file.
> -Has flags for stability reasons, since some packages are alpha or beta 
> and don't compile under certain architectures.
> -Portage files are fetched via rsync.
> -Has separate portage files which are phased out over time, in case the 
> portage maintainers move the files in one release. The maintainers then 
> create an informative message which describes what's going on while 
> emerging the package or going through the portage database. If possible 
> the outdated package is pruned and the newer, more recent dependency is 
> merged.
> 
> =====
> Cons:
> =====
> -It's written in Python (not fast).
> -Uses rsync.
> 
> ======
> Point:
> ======
> Apart from what's listed in the above paragraph, Gentoo's portage may 
> have several things that are better than FreeBSD's port system:
> 
> -Limited life cycle for versioning, which doesn't force server / desktop 
> owners to fix a number of machines all at once, but instead gives them a 
> heads up before a big change occurs and automatically unmerges old 
> dependencies and emerges new items, if possible.
> -One common interface for package / portage management--not 10 little 
> tools which do basically the same thing, or are specialized for specific 
> tasks.
> -One common file for all installed packages / ports, not a series of 
> directories and files.
> -Separate files for each version of a package, which doesn't break
> things nearly as much as one common ports Makefile covering every version.
> -A means to search for portage items and their descriptions, without 
> having to deal with a tool that doesn't really work reliably.
> 
> It's not so much that I'm trying to bash FreeBSD, but a revision
> definitely needs to be made to the way that ports / packages are done,
> because it seems that the committee in charge of ports planning and the
> overall roadmap has let things get a bit off track, just because of the
> sheer number of ports available. Something can be fixed and should be.
> I can only carry a portion of the load myself in so much time, since I'm
> working and going to school right now.
> 
> =======
> In light of previous statement:
> =======
> 
> I wasn't trying to port the pkg_* and port* utils to C++ thinking that I
> would magically get more optimized code. Sure, C++ optimizes much better
> than Ruby if done correctly, but C++ is also easier to screw up than
> Ruby or Perl or Python, because you have the power to shoot yourself in
> the foot more easily (not as much as with C or ASM, but close).
> 
> The point was that with C++ we could finally get a set of standardized 
> tools and a common interface for FreeBSD for managing ports / packages 
> which could be included in the base system, not a bunch of little 
> specialized tools and packages.
> 
> I'll have to approach this problem from a black box perspective and be
> careful in planning it out, but my goal is to be as backwards compatible
> as possible, or at least to provide migration tools to ease the move
> from the old system to the new one.
> 
> Again, if anyone is interested in helping me out, it would be more than
> welcome. That way we could ensure that the project gets done in a timely
> manner, reduce bugs and think of better solutions (the larger the group,
> the more people can help think outside the box).
> 
> Thanks,
> -Garrett
> 
> PS Please reply on the @hackers list, if possible.
> 
> _______________________________________________
> freebsd-hackers at freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
> To unsubscribe, send any mail to "freebsd-hackers-unsubscribe at freebsd.org"
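Talon's origin problem (pan -> pan2 -> pan) becomes deterministic if renames are replayed in date order rather than followed as a chain, since a naive chain lookup loops forever on a cycle. A minimal Python sketch, assuming MOVED-style input lines of the form old-origin|new-origin|date|reason (the layout of the ports tree's MOVED file, with an empty new origin for a removed port) and assuming the install date of each package is known. parse_moved and resolve_origin are hypothetical helpers, not part of portupgrade or Talon's script, and the dates below are made up:

```python
from datetime import date


def parse_moved(lines):
    """Parse MOVED-style lines into (old, new, date) tuples.

    Assumed line format: old-origin|new-origin|YYYY-MM-DD|reason, where an
    empty new-origin means the port was removed. Lines starting with '#'
    are treated as comments.
    """
    entries = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        old, new, when, _reason = line.split("|", 3)
        entries.append((old, new, date.fromisoformat(when)))
    return entries


def resolve_origin(origin, installed_on, entries):
    """Map an installed port's origin to its current origin.

    Renames are replayed in date order, skipping those older than the
    install date, so a pan -> pan2 -> pan cycle ends at 'pan' instead of
    looping. Returns None if the port was removed from the tree.
    """
    current = origin
    for old, new, when in sorted(entries, key=lambda entry: entry[2]):
        if when < installed_on or old != current:
            continue
        if not new:
            return None  # empty new origin: port deleted
        current = new
    return current


# Hypothetical entries for illustration (dates invented).
moved = [
    "# old|new|date|reason",
    "news/pan|news/pan2|2006-01-01|renamed",
    "news/pan2|news/pan|2007-01-01|renamed back",
]
print(resolve_origin("news/pan", date(2005, 6, 1), parse_moved(moved)))
# prints news/pan
```

This covers only the installed-versus-current case; the third origin Talon mentions (the one used for the last RELEASE packages) would need a similar replay that stops at the release date.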
Honestly, I'd be more interested in a package building system. Maybe the
project could be a little more liberal about which ports get packages
built by default. It doesn't need to build a package of every port, just
the common ones. That way it's easier to get up and running. Things like
xorg, GNOME and KDE take ages to build, and it would be awesome if there
were a decent package fetching system: something like apt-get, where you
could add some kind of repository, pull down a list of packages and
choose what you want. This can be emulated, in a way, by using
portupgrade -P and changing pkgtools.conf to add more mirrors to fetch
from. A pointyhat macro is there, but it probably shouldn't be abused:
pointyhat is there to look for problems, not to build packages for us
consumers; the packages are just a side effect, or at least this is how
it was explained to me. A neat thing might be a distributed package
building project, where packages are picked apart and the pieces are
built all over the place. Get enough places to donate CPU and slow
package building might be a thing of the past, but those are just pipe
dreams right now.
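Both the distributed-building idea and Talon's "do the dependency analysis once" approach come down to ordering the dependency graph up front. A minimal sketch using Python's graphlib module (Python 3.9+), assuming a made-up map from each port origin to the origins it depends on. Each "wave" contains ports whose dependencies all sit in earlier waves, so the ports in one wave could in principle be handed to different build machines at once. build_waves is a hypothetical helper:

```python
from graphlib import TopologicalSorter


def build_waves(deps):
    """Split a dependency graph into build waves.

    deps maps each port to the set of ports it depends on. Every port in a
    wave depends only on ports from earlier waves, so all ports within one
    wave could be built in parallel (e.g. on different machines).
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError on a dependency loop
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        waves.append(ready)
        sorter.done(*ready)  # treat the whole wave as finished building
    return waves


# Illustrative dependency map only, not real port dependencies.
deps = {
    "x11/gnome2": {"x11/xorg", "devel/glib20"},
    "x11/kde3": {"x11/xorg", "x11-toolkits/qt33"},
    "x11/xorg": {"x11/libX11"},
    "x11-toolkits/qt33": {"x11/libX11"},
    "devel/glib20": set(),
    "x11/libX11": set(),
}

for number, wave in enumerate(build_waves(deps), start=1):
    print(number, wave)
# 1 ['devel/glib20', 'x11/libX11']
# 2 ['x11-toolkits/qt33', 'x11/xorg']
# 3 ['x11/gnome2', 'x11/kde3']
```

A real tool would read the dependencies from the ports tree instead of a hard-coded map; if the graph contains a cycle, prepare() raises CycleError, which is where a dependency loop would surface.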

The slowness affects me only during a mass upgrade; after that I'm fine.
Maybe someone can look into profiling portupgrade to see whether the
problem lies in portupgrade itself or in the pkg_* tools.
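Splitting wall-clock time between portupgrade and the pkg_* tools it calls is a reasonable first step before real profiling. A rough Python sketch, assuming you list the commands to compare yourself. time_commands is a hypothetical helper, and it measures only elapsed time per command, not where inside a command the time goes:

```python
import subprocess
import sys
import time


def time_commands(commands):
    """Run each command in turn and return (command, seconds) pairs.

    Wall-clock timing only: it shows which step is slow, not why.
    check=True stops at the first command that exits with an error.
    """
    results = []
    for cmd in commands:
        start = time.perf_counter()
        subprocess.run(cmd, check=True)
        results.append((cmd, time.perf_counter() - start))
    return results


if __name__ == "__main__":
    # Placeholder command: replace it with the steps to compare, e.g. a
    # pkg_add of one package versus the portupgrade run that wraps it.
    steps = [[sys.executable, "-c", "pass"]]
    for cmd, seconds in time_commands(steps):
        print(f"{seconds:8.2f}s  {' '.join(cmd)}")
```

If the pkg_add steps dominate, that would match Talon's measurement above that pkg_delete and pkg_add account for nearly all of the upgrade time.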