freebsd-performance Digest, Vol 3, Issue 1

Terry Lambert tlambert2 at mindspring.com
Mon May 5 19:57:42 PDT 2003


Artem Tepponen wrote:
> > Too bad it's not supported, and too bad that, if it was, the
> > overhead would be too high because there's no VOP to get the
> > FS block offsets, so you would have to go through the FS code to
> > swap, and it would be much, much slower.
> 
> Btw, do you have any fresh numbers on hand that can support this statement?
> The naive approach would be comparing CPU time taken and disk latencies
> that differ by an order of magnitude and conclude that few microseconds
> eaten by CPU would go unnoticed compared with milliseconds taken by disk.

The FS orders operations; raw disk I/O does not.  The FS lays
out blocks in files essentially at random; the layout of the
blocks in the swap partition is linear.  The FS must obey POSIX
semantics about access and modification times; raw disk I/O
does not.  The FS enforces read-before-write on whole-page
accesses that are not page-aligned.

We aren't talking about CPU time here; we are talking about
operational delay overhead, seek overhead, and the addition of
a write operation per read or write access to the file, which
potentially doubles the I/O, etc.

You can't tell me that twice the I/O... potentially twice the
I/O... is a CPU issue.
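
To put a rough number on "potentially twice the I/O", here is a toy
model in Python.  It is only a sketch under stated assumptions: the
function names and the worst-case figure of one timestamp write per
access are hypothetical, chosen to illustrate the argument, and are
not measurements of any real filesystem.

```python
def raw_io_ops(accesses: int) -> float:
    """Raw partition: one disk operation per page access."""
    return float(accesses)


def fs_io_ops(accesses: int,
              timestamp_writes_per_access: float = 1.0,
              unaligned_writes: int = 0) -> float:
    """File-backed swap, worst case (assumed, not measured).

    Every access also costs a metadata write for the access or
    modification time, and every unaligned write costs one extra
    read (read-before-write).
    """
    return (accesses
            + accesses * timestamp_writes_per_access
            + unaligned_writes)


if __name__ == "__main__":
    n = 1000
    print("raw ops:", raw_io_ops(n))
    print("fs ops :", fs_io_ops(n))
    print("ratio  :", fs_io_ops(n) / raw_io_ops(n))
```

With the default worst-case assumption the ratio is 2; lowering
timestamp_writes_per_access models a filesystem that batches or
defers the timestamp updates.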

Even with the optimization I suggested, of getting a physical
block list, and using that against the raw device (essentially
the same pig-trick that the FreeBSD NTFS uses to rewrite NTFS
file contents, so long as the size never changes), there's
still an additional indirection through a blocklist to convert
a physically discontiguous block array into a logically contiguous
one, and there's still the fact that it has to seek all over the
disk to access those blocks, and it can't use bulk transfer in
the driver or predictive read-ahead in the VM system.
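
The blocklist indirection can be sketched as follows.  This is an
illustrative Python model, not kernel code: the extent layout, the
function names, and the sample block numbers are all assumptions
made up for the example.

```python
from typing import List, Tuple

# One extent: (logical_start, physical_start, length), in blocks.
# Hypothetical representation of a file's physical block list.
Extent = Tuple[int, int, int]


def linear_lookup(partition_base: int, logical_block: int) -> int:
    """Swap partition: the physical block is base plus offset."""
    return partition_base + logical_block


def blocklist_lookup(extents: List[Extent], logical_block: int) -> int:
    """Swap file: search the block list for the covering extent."""
    for lstart, pstart, length in extents:
        if lstart <= logical_block < lstart + length:
            return pstart + (logical_block - lstart)
    raise IndexError("logical block not mapped")


def contiguous_run(extents: List[Extent], logical_block: int,
                   want: int) -> int:
    """Blocks transferable in one request starting at logical_block.

    Simplification: a run ends at the extent boundary, even if the
    next extent happens to be physically adjacent.
    """
    for lstart, _pstart, length in extents:
        if lstart <= logical_block < lstart + length:
            return min(want, lstart + length - logical_block)
    raise IndexError("logical block not mapped")


if __name__ == "__main__":
    # A 16-block "file" scattered across three places on the disk.
    extents = [(0, 5000, 4), (4, 120, 4), (8, 9000, 8)]
    print(linear_lookup(1000, 5))           # partition case
    print(blocklist_lookup(extents, 5))     # file case
    print(contiguous_run(extents, 2, 16))   # transfer cut short
```

On the partition a 16-block request is one transfer; in the
fragmented file the same request is cut at every extent boundary,
and each boundary is a seek.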

Add to this that you can't dump to a swap device created this way:
at crash time, you cannot risk extending the file, so it would
have to be pre-allocated large enough, and you could not trust
that the block conversion list was not corrupted by whatever caused
the panic, and where you write is not limited by a simple set of
block offsets for a region of the disk which is guaranteed to not
contain boot-critical or recovery-critical data...
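
The "simple set of block offsets" check for a dedicated partition
amounts to a single range comparison, roughly as below.  This is a
hypothetical Python sketch; the function and parameter names are
invented for illustration and do not correspond to any FreeBSD
interface.

```python
def dump_write_allowed(region_start: int, region_blocks: int,
                       block: int, count: int) -> bool:
    """True if [block, block + count) lies inside the dump region.

    The region is described by two integers, so the check does not
    depend on any in-memory block list that a panic might have
    corrupted.
    """
    return (count > 0
            and region_start <= block
            and block + count <= region_start + region_blocks)


if __name__ == "__main__":
    # Assumed region: 4096 blocks starting at block 100000.
    print(dump_write_allowed(100000, 4096, 100000, 4096))
    print(dump_write_allowed(100000, 4096, 99999, 2))
```

A file-backed swap area has no equivalent two-number bound: every
write would have to be validated against the block list itself.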

...and you have an overwhelming set of performance limitations
not related to CPU utilization.

-- Terry

