fsync and latest PostgreSQL

Fri Feb 15 12:41:51 UTC 2019

On Fri, Feb 15, 2019 at 01:09:08PM +0100, Palle Girgensohn wrote:
> Hi!
> 
> I'm packaging postgresql ports for FreeBSD. I need your advice about a change to the PostgreSQL backend that seems to be aimed at working around a problem in Linux where the OS "lies" about fsync.
> 
> There's a description here [1]:
> 
> 
> > data_sync_retry (boolean)
> > 
> > When set to false, which is the default, PostgreSQL will raise a PANIC-level error on failure to flush modified data files to the filesystem. This causes the database server to crash.
> > 
> > On some operating systems, the status of data in the kernel's page cache is unknown after a write-back failure. In some cases it might have been entirely forgotten, making it unsafe to retry; the second attempt may be reported as successful, when in fact the data has been lost. In these circumstances, the only way to avoid data loss is to recover from the WAL after any failure is reported, preferably after investigating the root cause of the failure and replacing any faulty hardware.
> > 
> > If set to true, PostgreSQL will instead report an error but continue to run so that the data flushing operation can be retried in a later checkpoint. Only set it to true after investigating the operating system's treatment of buffered data in case of write-back failure.
> 
> 
> 
> An email by the committer [2] indicates that it is safe to set data_sync_retry = true for "all file systems on FreeBSD" but makes no recommendations:
> 
> 
> > I personally believe it is safe to run with data_sync_retry = on on
> > any file system on FreeBSD, and ZFS on any operating system... but I
> > see no need to make recommendations about that in the documentation,
> > other than that you should investigate the behaviour of your operating
> > system if you really want to turn it on.
> 
> 
> I'm considering making this knob default to true in the FreeBSD ports. Any thoughts or comments on that?
> 
> Cheers,
> Palle
> 
> 
> 
> [1] https://www.postgresql.org/docs/11/runtime-config-error-handling.html#GUC-DATA-SYNC-RETRY
> 
> [2] https://www.postgresql.org/message-id/CAEepm%3D16aauN3LMHrVZ-uoqU8-k7aoSdGC3t7PghewVVsjUwtQ%40mail.gmail.com
> 

At least for UFS, fsync(2) and fdatasync(2) wait for the write to finish
and do not throw away dirty buffers that happen to get a write error.
We are also careful to re-dirty such buffers when an async write fails
with any error except ENXIO.  So an error from fsync(2) does not
invalidate unwritten data, and the next fsync(2) call will retry the
write.

In practice, this means that dirty buffers for a device with failing
writes accumulate in the system.
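
To illustrate what this means for a caller, here is a minimal sketch of
a retry loop around fsync(2).  The helper name sync_with_retry() and the
choice to retry only on EIO are my own assumptions for illustration;
this is not PostgreSQL code.  It is only meaningful under the behaviour
described above, where buffers that failed to write stay dirty.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper (illustration only): call fsync(2) up to
 * max_attempts times.  Returns 0 on success, or -1 with errno set to
 * the error from the last failed attempt.
 *
 * ASSUMPTION: the kernel keeps buffers dirty after a failed write, so
 * a later successful fsync(2) really means the data reached the
 * device.  Where that does not hold, retrying like this is unsafe.
 */
int
sync_with_retry(int fd, int max_attempts)
{
	int saved_errno = 0;

	assert(max_attempts > 0);

	for (int attempt = 1; attempt <= max_attempts; attempt++) {
		if (fsync(fd) == 0)
			return (0);

		/* Save errno before fprintf() has a chance to change it. */
		saved_errno = errno;
		fprintf(stderr, "fsync attempt %d failed: %s\n",
		    attempt, strerror(saved_errno));

		/* Only an I/O error is worth retrying; EBADF etc. are not. */
		if (saved_errno != EIO)
			break;
	}

	errno = saved_errno;
	return (-1);
}
```

A caller would invoke it as sync_with_retry(fd, 3) after its writes and
treat a -1 return as "data not known to be on disk".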

In principle, this is also true for other filesystems that correctly use
the buffer cache, e.g. msdosfs.  So it might be relevant for other
writeable filesystems, but I have not looked.

I cannot comment on ZFS.