PATCH: Forcible delaying of UFS (soft)updates

Terry Lambert tlambert2 at mindspring.com
Thu Apr 17 09:55:42 PDT 2003


Marko Zec wrote:
> Ian Dowse wrote:
> > Note that the ATA "delayed write" mechanism only delays writes while
> > the disk is spun down; at other times there is no change in behaviour.
> > Since the disk only spins down after it has been idle for a time,
> > it is very unlikely that the disk is left in an inconsistent state
> > while it is stopped.

I'm wondering if the ATA "delayed write" actually does this, or if
it merely relaxes the cache restrictions, without retaining the
ordering enforcement.

I suspect that it does not retain the ordering enforcement, as
there is no way to disconnect on a tagged queue write, because
you must issue a request for status, and it can't be done as a
separate ATA operation (see the posts by the Maxtor employee, on
and around January 20th of this year to the -FS list for details).

You are much better off accumulating requests in kernel buffers,
and then using the normal write mechanism to push them out to the
drive in order (IMO).  This implies a barrier and new code above
the bwrite interface, to keep the buffers from getting locked and
stalling your applications in user space.
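
To make the "accumulate, then push out in order" idea concrete,
here is a minimal userland sketch.  It is not kernel code and not
part of any patch in this thread: delay_queue, dq_enqueue, dq_flush,
issue_log and log_issue are all hypothetical names, and the callback
merely stands in for whatever would sit above bwrite.

```c
#include <assert.h>
#include <stddef.h>

#define PENDING_MAX 64              /* assumed queue depth, illustration only */

/* One delayed write; `payload` stands in for the real buffer contents. */
struct pending_write {
    long blkno;
    int  payload;
};

/* Ring buffer holding writes in the order they were requested. */
struct delay_queue {
    struct pending_write slot[PENDING_MAX];
    size_t head;                    /* index of the oldest queued write */
    size_t count;                   /* number of queued writes */
};

/* Callback that hands one write to the (hypothetical) normal write path. */
typedef void (*issue_fn)(const struct pending_write *w, void *ctx);

/*
 * Queue a write without touching the disk.  The caller's data is copied,
 * so the caller is not left holding a locked buffer.
 * Returns 0 on success, -1 if the queue is full (flush first).
 */
int dq_enqueue(struct delay_queue *q, long blkno, int payload)
{
    if (q->count == PENDING_MAX)
        return -1;
    size_t tail = (q->head + q->count) % PENDING_MAX;
    q->slot[tail].blkno = blkno;
    q->slot[tail].payload = payload;
    q->count++;
    return 0;
}

/*
 * Issue every queued write strictly in arrival (FIFO) order; no sorting
 * by block number, so the ordering the file system asked for survives.
 * Returns the number of writes issued.
 */
size_t dq_flush(struct delay_queue *q, issue_fn issue, void *ctx)
{
    size_t issued = 0;

    assert(issue != NULL);
    while (q->count > 0) {
        issue(&q->slot[q->head], ctx);
        q->head = (q->head + 1) % PENDING_MAX;
        q->count--;
        issued++;
    }
    return issued;
}

/* Stand-in "disk" that only records the order in which blocks arrive. */
struct issue_log {
    long   blkno[PENDING_MAX];
    size_t n;
};

void log_issue(const struct pending_write *w, void *ctx)
{
    struct issue_log *log = ctx;

    if (log->n < PENDING_MAX)
        log->blkno[log->n++] = w->blkno;
}
```

Queueing blocks 30, 10, 20 and then calling dq_flush() with
log_issue hands them to the callback as 30, 10, 20.  The point of
the sketch is only that the queue, not the drive, decides the order.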

A problem I see here is that swap is on a totally different path,
and in a different area of the disk (practically guaranteeing a
seek, and a track buffer invalidation on the disk), even if you
could cause swapping to be delayed (I don't think you can; FreeBSD
aggressively uses memory, and so when you need to swap, you *need*
to swap).


> The OS _does_ know (approximately) when the disk is spinning and when not.
> For example, if the disk is configured to stop spinning immediately after
> the last I/O operation, the OS can safely assume 10 or more seconds
> afterwards the spinning will be stopped. The OS only has to keep record (in
> form of timestamp or something similar) when it has issued the last I/O
> request to the disk. In my patch this is accomplished using the stratcalls
> marker, which is increased every time the strategy routine of the ATA disk
> driver is invoked. Therefore the OS can also successfully coalesce the
> pending disk updates with other outstanding I/O disk operations, which are
> typically reads of uncached sectors or VM swapping.
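
The bookkeeping described in the quoted text can be sketched roughly
as follows.  This is an illustration only, not the code from Marko's
patch: apart from the stratcalls counter named above, every
identifier here (disk_activity, note_strategy_call,
disk_probably_spun_down, disk_io_since) and the use of whole
seconds are assumptions.

```c
#include <assert.h>
#include <stdbool.h>

/* Per-disk activity record kept by the (hypothetical) driver hook. */
struct disk_activity {
    unsigned long stratcalls;   /* bumped on every strategy call */
    long          last_io_sec;  /* time of the most recent strategy call */
};

/* Would be called from the disk driver's strategy routine. */
void note_strategy_call(struct disk_activity *a, long now_sec)
{
    a->stratcalls++;
    a->last_io_sec = now_sec;
}

/*
 * Approximate guess only: the disk is assumed to be spun down once
 * `spindown_sec` seconds have passed with no I/O issued to it.
 */
bool disk_probably_spun_down(const struct disk_activity *a,
                             long now_sec, long spindown_sec)
{
    assert(spindown_sec >= 0);
    return now_sec - a->last_io_sec >= spindown_sec;
}

/*
 * True if some other I/O has reached the disk since the caller last
 * sampled the counter, i.e. the disk is spinning anyway and pending
 * updates could be coalesced with that activity.
 */
bool disk_io_since(const struct disk_activity *a, unsigned long seen)
{
    return a->stratcalls != seen;
}
```

A delayed-update path would sample stratcalls when it starts holding
writes back, and release them as soon as disk_io_since() reports
that something else has touched the disk.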

This is useful, but not enough.  You need to actually communicate
the information above the block I/O layer, to the soft updates.  I
think, effectively, what you actually want to do is to stop the
soft updates clock, rather than trying to play stupid disk tricks
with timers, etc., above and beyond what you have to do.  I can see
it being useful on SCSI disks, as well, particularly where there are
temperature issues.  Though in that case, you probably are more
memory starved than anything, and it will end up doing you no good.
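
As a rough illustration of "stopping the clock", consider a
once-per-tick wheel of deferred work.  This is a toy model, not the
actual FreeBSD syncer or soft updates code; sync_clock,
sync_schedule, sync_tick and the 32-slot wheel are assumptions made
up for the example.

```c
#include <assert.h>
#include <stdbool.h>

#define WHEEL_SLOTS 32              /* assumed wheel size, illustration only */

struct sync_clock {
    int  pending[WHEEL_SLOTS];      /* work items due in each slot */
    int  hand;                      /* slot the next tick will process */
    bool frozen;                    /* set while the disk is assumed idle */
};

/*
 * Schedule one work item to come due on the `delay`-th tick from now
 * (1 <= delay <= WHEEL_SLOTS), counting only ticks that actually run.
 */
void sync_schedule(struct sync_clock *c, int delay)
{
    assert(delay >= 1 && delay <= WHEEL_SLOTS);
    c->pending[(c->hand + delay - 1) % WHEEL_SLOTS]++;
}

/*
 * One clock tick.  While frozen, the hand does not move, so nothing
 * ages and nothing is pushed to the disk.  Returns the number of work
 * items that came due on this tick.
 */
int sync_tick(struct sync_clock *c)
{
    int due;

    if (c->frozen)
        return 0;
    due = c->pending[c->hand];
    c->pending[c->hand] = 0;
    c->hand = (c->hand + 1) % WHEEL_SLOTS;
    return due;
}
```

With the clock frozen, an item scheduled two ticks out stays two
ticks out no matter how long the freeze lasts, and relative order
between scheduled items is unchanged once the clock resumes.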

> I agree the ATA delayed writes is a great functionality that can help save
> battery power.

I don't; only if the write order is maintained is it "great".

> I just want to point out that it can suffer from the same
> consistency problems as the model of OS controlled delayed synching combined
> with null fsync() processing. However, if the OS controls the delaying of
> updates, you can turn on or off normal fsync() semantics as desired. With
> delaying writes in ATA firmware, you simply do not have the choice :)

I think people are confusing fsync() with syncd at this point.  8-(.

-- Terry

