Re: Modular fetch design proposal: Was: [HEADSUP] Deprecation of the ftp support in pkg

From: Baptiste Daroussin <>
Date: Mon, 24 Jan 2022 15:26:18 UTC
On Mon, Jan 24, 2022 at 03:46:39PM +0100, Jan Bramkamp wrote:
> On 24.01.22 09:12, Chris wrote:
> > On 2022-01-23 10:19, Patrick M. Hausen wrote:
> > > Hi all,
> > > 
> > > I did not really have an opinion on this, since we never used FTP,
> > > but I was a bit surprised by the suggestion to use SSH instead.
> > > 
> > > It never occurred to us that anything but HTTP(S) was possible.
> > > We simply run Nginx in a jail serving the packages that Poudriere
> > > produces for us. Setup time/effort: 5 minutes.
> > > 
> > > Now after this comment:
> > > 
> > > > Am 22.01.2022 um 09:35 schrieb Chris <>:
> > > > I find it's less "housekeeping" to use ftp(1) setup through
> > > > inetd(8) for pkg repos, than
> > > > via ssh.
> > > 
> > > I understand the appeal of FTP.
> > > Maybe this discussion is focusing on the wrong topic. Perhaps
> > > we should consider including a lightweight way to serve HTTP(S)
> > > in base? Like Lighttpd, which, as far as I know, comes with a
> > > BSD-3-clause-equivalent license.
> > > 
> > > But then the general tendency has been to remove network services
> > > from base rather than introduce them. Like e.g. BIND.
> > > 
> > > So I really have no idea what the general opinion is, just wanted
> > > to throw in that IMHO HTTPS is the best protocol to the task and
> > > if some way to serve that could be included in base, I for one would
> > > appreciate that.
> > > 
> > > OTOH Chris, what's keeping you from installing a web server just
> > > serving static files?
> > Different environments/ different requirements. But habit as much as
> > anything else.
> > Ftp is trivial and has always been available, so I never even need to
> > think about it.
> > I perform mass installs/upgrades in large networks. There is no overhead
> > to using ftp, whether through a one-shot start or inetd. The clients are
> > all started/used at will.
> > It seems to me that removing features also removes value. IMHO the
> > removal of transports as trivial as ftp(1) brings little to the table
> > for all concerned. But that's just me. :-)
> Have you ever looked into an FTP protocol parser and what's required to get
> different FTP configurations through the NAT-infested networks of today? FTP
> is an ugly protocol from the beginning of time that should have been put
> down decades ago. Even without pipelining, HTTP saves several network round
> trips, and poudriere already generates HTML and JSON status updates during
> builds as a read-only web UI.
> This thread has shown that users have deployed complex, fragile workarounds
> for the limited protocol selection offered by pkg. I recommend adding a
> clean and official extension interface that spawns fetch helper processes
> from a well-known location outside of $PATH, derived from the URI scheme
> (e.g. ${PREFIX}/libexec/pkg/fetch-${SCHEMA}). To keep helpers simple and
> small, they would be started in an execution environment (working directory,
> environment variables, minimal set of inherited file descriptors) prepared
> by pkg, expecting the repository URI as the first (and only?) argument.
> Each helper would read from standard input a stream of pairs, one per line,
> of file name (e.g. the package hash stored in the repository) and relative
> path to fetch into the inherited working directory, allowing users to add
> their own transport helpers, similar to git.
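The helper contract described above could be sketched as follows. This is a minimal illustration, not actual pkg code: the function names, the stubbed-out transfer, and the 1024-byte size are all assumptions made for the example.

```python
import sys

def parse_request(line):
    # Each stdin line from pkg: "<file-name> <relative-path>"
    name, relpath = line.split(None, 1)
    return name, relpath.strip()

def stub_fetch(name, relpath):
    # Placeholder transfer: a real helper would stream
    # ${REPO_URI}/${relpath} into the working directory pkg prepared.
    return 1024  # pretend 1024 bytes were transferred

def run(requests, fetch=stub_fetch):
    # Process fetch requests and emit one progress line per file;
    # BYTES_FETCHED == BYTES_TOTAL marks a completed transfer.
    out = []
    for line in requests:
        name, relpath = parse_request(line)
        n = fetch(name, relpath)
        out.append(f"{n} {n} {name}")
    return out

if __name__ == "__main__":
    repo_uri = sys.argv[1]  # the repository URI is the helper's only argument
    for progress in run(sys.stdin):
        print(progress, flush=True)
```

A user could then drop such a script at, say, ${PREFIX}/libexec/pkg/fetch-myproto to handle myproto:// repository URIs, much like git's remote helpers.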
> To support progress updates and allow pkg to start the installation of
> fetched packages as soon as possible, helpers could periodically write
> lines of the form "${BYTES_FETCHED} ${BYTES_TOTAL} ${FILE}" to standard
> output. A (permanent) transfer failure could be encoded by a negative
> $BYTES_FETCHED, and a successfully completed transfer as $BYTES_FETCHED ==
> $BYTES_TOTAL. If the helper doesn't know the file size, it should be
> allowed to use negative $BYTES_TOTAL values in all but the last progress
> update (per fetched file). All transfers not reported as successfully
> completed or permanently failed are implicitly confirmed by exiting with
> EX_OK; other exit codes implicitly fail all unconfirmed transfers. Pkg
> should clean up the working directory after the helper has exited to delete
> partially transferred files (and anything else the helper may have left,
> taking care not to follow symlinks). Pkg should apply resource limits and
> drop privileges (when running as root) before exec()ing into the helper.
> Well-written helpers can use capsicum to provide further defense in depth.
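On the consuming side, the progress encoding above could be interpreted roughly like this. A sketch assuming the negative-value conventions just described; the function name and status strings are hypothetical, not part of pkg.

```python
def classify(line):
    """Interpret one helper progress line "<fetched> <total> <file>":
    a negative BYTES_FETCHED encodes a permanent failure, fetched equal
    to a non-negative total encodes completion, and a negative
    BYTES_TOTAL means the size is still unknown (transfer ongoing)."""
    fetched_s, total_s, name = line.split(None, 2)
    fetched, total = int(fetched_s), int(total_s)
    name = name.strip()
    if fetched < 0:
        return name, "failed"
    if total >= 0 and fetched == total:
        return name, "done"
    return name, "in-progress"
```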
> The package repository already contains the expected package sizes. As an
> optimization for dealing with out-of-sync mirrors, the known file sizes
> can be matched against positive file sizes reported by helpers to fail
> quickly.
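That fast-fail check could be as simple as the following sketch (the function name and return convention are assumptions for illustration):

```python
def size_matches(reported_total, expected_size):
    """Compare a helper's reported BYTES_TOTAL with the package size
    recorded in the repository metadata. A negative reported size means
    the helper does not know it yet, so there is nothing to check; a
    positive mismatch suggests an out-of-sync mirror and lets pkg fail
    the transfer early instead of downloading the whole file."""
    if reported_total < 0:
        return True
    return reported_total == expected_size
```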
> Refactoring all supported protocols to use this interface would reduce the
> complexity of pkg itself.
> This design can be further extended with more features (and potential for
> bugs) until we end up with something similar to the git annex external
> special remote protocol
> (
> if there are enough relevant use cases justifying the additional complexity
> in pkg and its file transfer helpers.

Yes, I have something like that in mind. In fact, I have been slowly
refactoring the fetch code to become pluggable, which led me to the
deprecation of the ftp protocol. Once the code here is clean enough, we
could imagine adding external fetchers.

Best regards,