fsync: Linux vs FreeBSD

Ivan Voras ivoras at freebsd.org
Tue Oct 26 23:26:09 UTC 2010


On 10/26/10 21:17, Chuck Swiger wrote:
> On Oct 26, 2010, at 11:33 AM, Marc G. Fournier wrote:
>> Someone recently posted on one of the PostgreSQL Blogs concerning fsync on Linux/Windows/Mac OS X, but failed to make any comments on any of the BSDs ... the post has to do with how fsync works on the various OSs, and am curious as to whether or not this is something that also afflicts us:
>>
>> http://rhaas.blogspot.com/2010/10/wal-reliability.html
>>
>> From reading our man page, I see no warnings similar to what the other OSs
>> have, specifically:
>>
>> Mac OS X: For applications that require tighter guarantees about the
>>           integrity of their data, Mac OS X provides the F_FULLFSYNC fcntl
>>
>> Linux: If the underlying hard disk has write caching enabled, then the
>>        data may not really be on permanent storage when fsync() /
>>        fdatasync() return.
>>
>> So, do we hide the fact, or are, in fact, not afflicted by this?
>
>
> Whether the data actually gets written and the on-disk cache itself flushed seems to depend on a sysctl called hw.ata.wc for FreeBSD or the dkctl setting in NetBSD; write-caching seems to always default to on because otherwise people scream bloody murder about the factor of ten reduction in write performance with it off.  Further, by default (ie, FFSv2 with soft updates), data changes are synced out when you do an fsync(), but metadata changes are done asynchronously-- which is exactly what MacOS X does.
>
> In other words, if you have write-caching on, no effort is made to invoke ATA_FLUSHCACHE or SCSI "SYNCHRONIZE CACHE" to make sure that your disk has actually written the bits to permanent storage.

To clarify: all of this applies only when write caching is done by the
disk drives themselves or by the disk controllers.
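
For reference, the application side of the problem looks roughly like the
sketch below. It is only an illustration: durable_write() is a hypothetical
helper name, not something from PostgreSQL or libc. It writes a buffer, calls
fsync(), and, where the platform defines F_FULLFSYNC (Mac OS X, per the man
page quoted above), additionally asks for a full flush. On a system whose
drive has its write cache enabled and where nothing issues a cache flush, a
successful return still does not prove the data reached the platters.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write len bytes to path, then ask the OS to flush.
 * Returns 0 on success, -1 on error (errno is set by the failing call).
 * For brevity, EINTR is treated as an error rather than retried. */
int durable_write(const char *path, const void *buf, size_t len)
{
    assert(path != NULL && buf != NULL);

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    /* write() may be partial, so loop until everything is handed over. */
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            close(fd);
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }

    /* Push the data from the OS buffers towards the device. */
    int rc = fsync(fd);

#ifdef F_FULLFSYNC
    /* Only where the platform offers it: also request a drive-level flush. */
    if (rc == 0 && fcntl(fd, F_FULLFSYNC) == -1)
        rc = -1;
#endif

    if (close(fd) != 0)
        rc = -1;
    return rc;
}
```

A caller would use it as durable_write("wal.segment", data, data_len) and
treat any non-zero return as "not safely stored"; the file name here is just
an example.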

For a long time now, the common way to deploy servers has been to use a
disk controller with RAID capabilities and its own RAM cache, backed by
a battery or a capacitor. This controller in turn switches the on-drive
write caches off. All of the RAID controllers I've seen have a toggle
for this last part (on-drive write caches), and it was always turned off
by default (though it doesn't hurt to check).

To emulate this with desktop drives, as cswiger said, hw.ata.wc should
be turned off, with the expected hit to write performance.
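
A quick way to check the current setting from a program is sketched below.
This is a minimal sketch under some assumptions: read_int_sysctl() is a
hypothetical helper name, the value is assumed to be a plain int, and the
sysctlbyname() call is only compiled on FreeBSD (elsewhere the helper just
reports "unknown"). Whether hw.ata.wc exists at all depends on the FreeBSD
version and the ATA driver in use.

```c
#include <assert.h>
#include <stddef.h>
#ifdef __FreeBSD__
#include <sys/types.h>
#include <sys/sysctl.h>
#endif

/* Hypothetical helper: return the integer value of the named sysctl,
 * or -1 if it cannot be read (unknown name, wrong size, or not FreeBSD). */
int read_int_sysctl(const char *name)
{
    assert(name != NULL);
#ifdef __FreeBSD__
    int value = 0;
    size_t len = sizeof(value);

    /* Read-only query: no new value is passed in. */
    if (sysctlbyname(name, &value, &len, NULL, 0) == 0 &&
        len == sizeof(value))
        return value;
#else
    (void)name; /* nothing to query on other systems */
#endif
    return -1;
}
```

With this, read_int_sysctl("hw.ata.wc") returning 1 would mean on-drive write
caching is enabled and 0 that it is disabled. Changing the value is a
separate matter: to my knowledge hw.ata.wc is a boot-time tunable set from
the loader (hw.ata.wc=0 in /boot/loader.conf), so check ata(4) for your
release before relying on that.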

All of this applies to UFS. ZFS, on the other hand, *should* use BIO_FLUSH
where appropriate, so it should be safer with desktop drives. OTOH, ZFS
is so complex that when an error does occur, it's hard to say what caused it.




More information about the freebsd-questions mailing list