Turn off RAID read and write caching with ZFS?

Stefan Esser se at freebsd.org
Thu May 22 13:37:06 UTC 2014


On 22.05.2014 14:52, Karl Denninger wrote:
[...]
> Modern drives typically try to compensate for their
> variable-geometryness through their own read-ahead cache, but the exact
> details of their algorithm are typically not exposed.
> 
> What I would love to find is a "buffered" controller that recognizes all
> of this and works as follows:
> 
> 1. Writes, when committed, are committed and no return is made until
> storage has written the data and claims it's on the disk.  If the
> sector(s) written are in the buffer memory (from a previous read in 2
> below) then the write physically alters both the disk AND the buffer.
> 
> 2. Reads are always one full track in size and go into the buffer memory
> on a LRU basis.  A read for a sector already in the buffer memory
> results in no physical I/O taking place.  The controller does not store
> sectors per-se in the buffer, it stores tracks.  This requires that the
> adapter be able to discern the *actual* underlying geometry of the drive
> so it knows where track boundaries are.  Yes, I know drive caches
> themselves try to do this, but how well do they manage?  Evidence
> suggests that it's not particularly effective.

In the old times, controllers implemented read-ahead, either under
control of the host adapter or the host OS (e.g. based on the
detection of sequential access patterns).

This changed when large on-drive caches became practical. Drives
now do aggressive read-ahead caching, but without the penalty this
had in the old times. I do not know whether this applies to all
current drives, but since it is old technology, I assume so:

The sector layout is reversed on each track: higher-numbered
sectors come first. The drive starts reading data into its cache
as soon as the head receives stable data, and it stops only when
the whole requested range of sectors has been read.

E.g. if you request sectors 10 to 20, the drive may have positioned
the read head just as sector 30 comes along. Starting at that
sector, data is read from sectors 30, 29, ..., 10 and stored in the
drive's cache. Only after sector 10 has been read is the data
transferred to the requesting host adapter, while the drive seeks
to the next track it has to operate on. This scheme offers
opportunistic read-ahead that does not increase random-access seek
times.
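As a toy illustration of that example (the function name, sector
numbers, and behavior are my assumptions for the sketch, not actual
drive firmware):

```python
# Toy sketch of the reversed-layout read described above; the
# helper and its values are illustrative, not real drive behavior.

def serviced_order(first, last, arrival):
    """Order in which sectors pass under the head (and are cached)
    when the head settles while sector `arrival` goes by and the
    host requested sectors [first, last], with arrival >= last.
    Higher-numbered sectors come first, so the drive reads down
    from `arrival` to `first`."""
    assert first <= last <= arrival
    return list(range(arrival, first - 1, -1))

order = serviced_order(10, 20, 30)
print(order)                 # [30, 29, ..., 10]
print(sum(s > 20 for s in order), "sectors of free read-ahead")
```

Every sector from 30 down to 21 lands in the cache for free before
the requested range is even reached.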

The old method required the head to stay on the track for some
milliseconds to read the sectors following the requested block, on
the vague chance that this data might later be requested.

The new method just starts reading as soon as there is data under
the read head. This needs more cache on the drive, but does not add
latency for read-ahead. The disadvantage is that you never know
how much read-ahead there will be; it depends on the rotational
position of the disk when the seek ends. And if the first sector
read from the track falls in the middle of the requested range, the
drive needs to read the whole track to fulfil the request - but
that would happen with equal probability with the old sector
layout as well.
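The dependence on rotational position, including the whole-track
case, can be sketched as a toy model (TRACK and the helper are
illustrative assumptions, not firmware):

```python
# Toy model of the reversed-sector layout described above.
TRACK = 60  # assumed sectors per track (made-up value)

def sectors_read(first, last, arrival, track=TRACK):
    """Sectors the drive reads (and caches) until the request
    [first, last] is complete, when the head settles while sector
    `arrival` is passing.  Sectors pass in decreasing cyclic
    order: arrival, arrival-1, ..., 0, track-1, ..."""
    if first <= arrival < last:
        # Landed in the middle of the requested range: the sectors
        # above `arrival` only come around again after a full
        # revolution, so the whole track gets read.
        return track
    return (arrival - first) % track + 1

print(sectors_read(10, 20, 30))  # 21: request plus read-ahead
print(sectors_read(10, 20, 15))  # 60: whole track
```

The same request can thus cost anywhere from the bare range up to
a full track, purely depending on where the head lands.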

> Without this read cache is a crapshoot that gets difficult to tune and
> is very workload-dependent in terms of what delivers best performance. 
> All you can do is tune (if you're able with a given controller) and test.

The read-ahead of reversed sectors as described above does not have
any negative side effects. On average, you'll read half a track into
the drive's cache whenever you request a single sector.
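The half-a-track average is easy to check with a quick simulation
under the same toy assumptions (TRACK is a made-up sector count):

```python
# Sketch: average cache fill per single-sector read, assuming the
# reversed layout and a uniformly random rotational position.
import random

TRACK = 60  # assumed sectors per track (made-up value)

def sectors_read_single(sector, arrival, track=TRACK):
    # The head reads from `arrival` downward (cyclically) until
    # the requested sector has passed.
    return (arrival - sector) % track + 1

random.seed(1)
samples = [sectors_read_single(10, random.randrange(TRACK))
           for _ in range(100_000)]
avg = sum(samples) / len(samples)
print(f"average sectors cached per single-sector read: {avg:.1f}")
# analytically, the expected value is (TRACK - 1)/2 + 1 = 30.5,
# i.e. about half a track
```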

A controller that implements read-ahead does this by increasing the
amount of data requested from the drive. This leads to a higher
probability that a full track must be read to satisfy the request
and will thus increase latencies observed by the application.

Regards, Stefan


More information about the freebsd-fs mailing list