Dump Utility cache efficiency analysis
dillon at apollo.backplane.com
Wed Jun 24 01:07:53 UTC 2009
:This is regarding the dump utility cache efficiency analysis post made on
:February '07 by Peter Jeremy [
:and if this project is still open. I would be interested to begin exploring
:FreeBSD (and contributing) by starting this project.
:I do have some basic understanding of the problem at hand - to determine if
:a unified cache would appeal as a more efficient/elegant solution compared
:to the per-process-cache in the Dump utility implementation. I admit I am
:new to this list and FreeBSD so I wouldn't be able to determine what the
:current implementation is, until I get started.
I think the cache in the dump utility is still the one I worked up
a long time ago. It was a quick and dirty job at the time, and it
was never really designed for parallel operation which is probably
why it doesn't work so well in that regard.
In my opinion, a unified cache would be an excellent improvement.
Ultimately dump is an I/O bound process so I don't think we would
really need to worry about the minor increases in cpu overhead
from the additional locking needed.
There are a few issues you will have to consider:
* Dump uses a fork model for its children rather than pthreads. You
would either have to use the F_*LK fcntl() operations or use a
simpler flock() scheme to lock across the children. Alternatively
you could change dump over to a pthreads model and use pthreads
mutexes, but that would entail a lot more work. Dump was never
designed to be threaded.
* The general issue with any caching scheme for dump is how much to
actually cache per I/O vs the size of the cache. Caching larger
amounts of data hits diminishing returns as it also increases seek
times and waste (cached data never used). Caching smaller amounts
of data hits diminishing returns as it causes the disk to seek more.
Disk drives generally do have a track cache, but they also only typically
have 8-16M of cache ram (32M in newer drives, particularly the higher
capacity ones). A track is typically about 1-2M (maybe higher now) so
it doesn't take much seeking for the drive to blow out its internal
track cache. Caching that much data in a single read would probably
be detrimental anyway.
This also means you do not necessarily want to cache too much
linearly-read data, as the disk drive is already doing it for you.
Because of all of this it is going to be tough to find cache parameters
that work well generally, and the parameters are going to change
drastically based on the amount of cache you specify on the command
line and the size of the partition being dumped.
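As a rough illustration of how the per-I/O read size might be tied to the cache size, here is one possible heuristic. It is only a sketch under my own assumptions, not what dump does: pick_read_size() and all of its parameters are hypothetical, the partition size is not modelled at all, and the cap is meant to be set well below a track so that a single read never tries to swallow the drive's track cache.

```c
#include <stddef.h>

/*
 * Hypothetical heuristic: choose a per-I/O read size so that the cache
 * still holds at least min_entries separate reads, without going below
 * min_read or above max_read.  The result is min_read multiplied by a
 * power of two.  Returns 0 if the arguments make no sense.
 */
static size_t pick_read_size(size_t cache_bytes, size_t min_entries,
                             size_t min_read, size_t max_read)
{
    size_t budget, size;

    if (min_entries == 0 || min_read == 0 || max_read < min_read)
        return 0;

    budget = cache_bytes / min_entries;     /* bytes available per entry */
    size = min_read;

    /* Double while the doubled size still fits both the budget and the cap. */
    while (size <= budget / 2 && size <= max_read / 2)
        size *= 2;

    return size;
}
```

With a 32M cache, at least 512 entries, a 4K floor and a 256K cap this gives 64K reads; a 1G cache hits the 256K cap, and a 1M cache falls back to the 4K floor. Whether any of those numbers are good choices is exactly what would have to be measured.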