Splitting up sets of files for archiving (e.g. to tape, optical media)
freebsd-lists at gromit.dlib.vt.edu
Fri Jan 19 14:44:16 UTC 2018
> On Jan 19, 2018, at 2:59 AM, freebsd-questions-request at freebsd.org wrote:
> Date: Thu, 18 Jan 2018 18:06:58 -0800
> From: "Ronald F. Guilmette" <rfg at tristatelogic.com>
> To: freebsd-questions at freebsd.org
> Subject: Splitting up sets of files for archiving (e.g. to tape,
> optical media)
> Message-ID: <68377.1516327618 at segfault.tristatelogic.com>
> This isn't really FreeBSD specific, but in my experience the folks on
> this list have a lot of knowledge about a lot of nice, useful free software
> tools, so I hope nobody will begrudge me for asking this question here.
> I'm looking for a pre-existing software tool, which may or may not already
> exist, and which will do the following job...
> Problem statement:
> Imagine that you have a big set of files that you would like to archive
> to some sort of archiving media, such as tapes, or optical media, where
> each unit of said archiving media has a capacity considerably less than
> the total aggregate size of all of the files you want to archive.
> Imagine further that you would like your set of input files to be spread
> across the units of the output (archive) media such that no single input
> file is ever split across more than one unit of the output media, in order
> to simplify recovery/restore of individual files.
> Lastly, assume that it is desired to minimize, as much as reasonably
> possible, the total number of output (archive) media units used to
> archive the entire set of input files. (And to further this goal,
> it is acceptable for files from any single input subdirectory to be
> scattered among the various output media units.)
> In my case, I want to archive several hundred gigabytes onto a set of
> blank BD-R disks.
> I plan to use ImgBurn to actually write the BD-R disks.
> So basically, I just need a tool to analyze the input file set, applying
> some sort of bin packing algorithm, and then spit out a list of which
> specific files should go into each specific archive volume, e.g. #01, #02,
> #03... etc. Each such set of files will then, in turn, be hard-linked
> into a temporary directory, and then, one by one, ImgBurn will be told
> to write each of these temp directories to a single output BD-R disk.
> I have written a small software tool to do the above "splitting" job,
> and I am currently improving upon it, but it occurred to me that I
> should at least ask if someone else has perhaps already perfected this
> exact wheel that I am busy re-inventing.
> P.S. It seems unlikely that I'm the first and only person to have ever
> written a tool to do this specific job, but on the off chance that I am,
> I am more than willing to contribute my little tool to the ever-expanding
> ports tree.
Have you looked at fpart (https://github.com/martymac/fpart)? It looks to me like it is applicable to your problem (which sounds to me like a variant of the bin packing problem). The fpart README even lists packing music files onto fixed-size DVD media as one of its examples, which sounds close to the archiving scenario you give above. Plus, it claims to be developed on FreeBSD.
Disclaimer: I have not used fpart myself.
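For illustration only, here is a minimal sketch of the kind of bin packing heuristic involved: first-fit decreasing, which sorts files from largest to smallest and places each one in the first volume that still has room. This is not how fpart or your own tool works (I have not checked either); the function name first_fit_decreasing and the sample sizes are made up for the example, and the capacity is whatever usable space you assume per disk.

```python
from typing import Dict, List


def first_fit_decreasing(sizes: Dict[str, int], capacity: int) -> List[List[str]]:
    """Assign whole files to volumes; no file is ever split across volumes.

    sizes maps a file name to its size in bytes (hypothetical input format).
    capacity is the assumed usable size of one volume, in the same units.
    Returns one list of file names per volume, in volume order.
    """
    free: List[int] = []            # remaining space on each volume
    volumes: List[List[str]] = []   # file names assigned to each volume

    # Largest files first; ties broken by name so the result is deterministic.
    for name, size in sorted(sizes.items(), key=lambda kv: (-kv[1], kv[0])):
        if size > capacity:
            raise ValueError(f"{name} ({size}) does not fit on a single volume")
        for i, room in enumerate(free):
            if size <= room:
                # First volume with enough room wins.
                free[i] -= size
                volumes[i].append(name)
                break
        else:
            # No existing volume has room, so start a new one.
            free.append(capacity - size)
            volumes.append([name])
    return volumes


if __name__ == "__main__":
    demo = {"a": 7, "b": 5, "c": 4, "d": 3, "e": 1}
    for number, files in enumerate(first_fit_decreasing(demo, 10), start=1):
        print(f"volume #{number:02d}: {files}")
```

In practice the sizes would come from walking the input tree (for example with os.walk and os.path.getsize), and the capacity should leave some margin for filesystem overhead on the disk. First-fit decreasing is only a heuristic, so it does not always reach the true minimum number of volumes.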