package distribution crisis - CDN needed

João Carlos Mendes Luís jonny at jonny.eng.br
Tue Apr 8 00:14:11 UTC 2008


Pav Lucistnik wrote:
> Okay the situation recently was that the mirrors had no chance keeping
> up with all the package sets I've been uploading to ftp-master.
>
> We clearly need to move beyond rsync/cvsup synced ftp mirrors. This does
> not scale.
>
> I do propose a creation of a CDN (Content Delivery Network), having
> these features:
>
> - no mirroring of a complete package set! (Also no directory listings.)
>   When client requests the file, and the file is not in the local cache,
>   the file is downloaded from the upstream server and while it's being
>   obtained, it's already being sent to the client. This is basically
>   squid.
>
> - if the file is present in the local cache, it's returned from local
>   cache.
>
> - local cache is invalidated when a new package set is available on
>   an upstream server. Invalidating mechanism:
>   option a) cronjob that polls upstream server every 5 minutes for a
>             file that gives current package set IDs (pull)
>   option b) master server sends notification to all mirrors to
>             invalidate a package set (push)
>   optimization: when package set was invalidated, don't delete old
>   files, instead on next hit, verify timestamp against upstream server
>
> - atomic package set uploads to master from pointyhat (probably having
>   two directories that are switched over on master)
>
> - everything runs over http
>
> - default source of files for "pkg_add -r" command
>
> The goal is to refresh a package set on a daily basis.
>
>
> I don't know if we can use some existing software for this (Squid?
> Apache mod_proxy?) or if we will need to put something new together.
> Ideas?
>   
I am not sure this would solve anything, but if we go further in this
direction, I'd like to see an architecture with prefetch capability.
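
To make the prefetch idea concrete, here is a minimal sketch of a
pull-through cache keyed on the package set ID, with a prefetch hook.
It is only an illustration: the class name, the two callbacks and the
in-memory store are assumptions, not part of any existing tool, and it
simply refetches stale entries instead of doing the timestamp check
suggested in the proposal.

```python
from typing import Callable, Dict, Iterable, Tuple


class PullThroughCache:
    """Hypothetical cache: serve locally, fall back to upstream on a miss."""

    def __init__(self,
                 fetch_upstream: Callable[[str], bytes],
                 current_set_id: Callable[[], str]) -> None:
        self._fetch = fetch_upstream      # assumed: downloads one file by name
        self._set_id = current_set_id     # assumed: returns current package set ID
        self._store: Dict[str, Tuple[str, bytes]] = {}
        self.upstream_hits = 0            # counts downloads from upstream

    def _load(self, name: str, set_id: str) -> bytes:
        data = self._fetch(name)
        self.upstream_hits += 1
        self._store[name] = (set_id, data)
        return data

    def get(self, name: str) -> bytes:
        """Return a file, using the cache only if it matches the current set."""
        set_id = self._set_id()
        cached = self._store.get(name)
        if cached is not None and cached[0] == set_id:
            return cached[1]
        return self._load(name, set_id)

    def prefetch(self, names: Iterable[str]) -> None:
        """Warm the cache before any client asks for these files."""
        set_id = self._set_id()
        for name in names:
            cached = self._store.get(name)
            if cached is None or cached[0] != set_id:
                self._load(name, set_id)
```

A mirror could call prefetch() with a list of popular packages right
after it notices a new package set ID, so that the first clients do not
all wait on the upstream server.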

Note also that a real CDN hides the actual data location from the end
user, and selects that location based on some sort of proximity and/or
load information.  Some CDNs do use a proxy cache in front of a central
server as a means of populating their own data, but proxy caching is
only a small part of the solution.
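
As a rough illustration of selection by proximity and load, the sketch
below scores each mirror and picks the cheapest one.  Everything in it
is an assumption made for the example: the Mirror record, the distance
table, the load scale and the weighting are hypothetical and do not
describe how any particular CDN works.

```python
from dataclasses import dataclass
from typing import Dict, Sequence, Tuple


@dataclass(frozen=True)
class Mirror:
    host: str
    region: str
    load: float  # assumed scale: 0.0 = idle, 1.0 = saturated


def choose_mirror(mirrors: Sequence[Mirror],
                  client_region: str,
                  distance: Dict[Tuple[str, str], float],
                  load_weight: float = 100.0,
                  unknown_distance: float = 1e9) -> Mirror:
    """Pick the mirror with the lowest combined distance and load cost."""
    if not mirrors:
        raise ValueError("no mirrors available")

    def score(m: Mirror) -> float:
        # Regions missing from the table are treated as very far away.
        d = distance.get((client_region, m.region), unknown_distance)
        return d + load_weight * m.load

    return min(mirrors, key=score)
```

The load_weight parameter controls the trade-off: a small value favours
the nearest mirror, a large value steers clients away from busy ones
even when they are close.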

I did not follow whatever situation happened recently, but I have had
trouble in the past with late announcements to mirror administrators.
Sometimes I received the announcement at the same time as any other
FreeBSD user.  And even in those cases, the packages had been
distributed long before the final release.

                                        Jonny

-- 
João Carlos Mendes Luís - Networking Engineer - jonny at jonny.eng.br


