Is a successful call to write(2) atomic?

Tue Jun 15 21:28:12 UTC 2021

In message <44e15917-0c92-08f2-462e-a1b3705f9afb at panix.com>, 
Kurt Hackenberg <kh at panix.com> wrote:

>This is an old, general problem: concurrent access to a shared resource. 

I guess.  For the moment I personally prefer to characterize it as an
"atomicity" problem.

(DIGRESSION:  My vague recollection is that certain of the message passing
primitives... most probably the old System V Release 4 ones, which are the
only ones I ever actually used... *do* provide some guarantees that
individual messages will be treated as indivisible units, and never
broken up into parts.)
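For reference, a minimal sketch of the System V message-queue calls
(msgget/msgsnd/msgrcv) that the digression seems to be recalling. It
assumes a system that provides <sys/msg.h>; the struct, the 256-byte
limit and the helper names mq_send_line/mq_recv_line are hypothetical.
Each msgsnd() enqueues one message and each msgrcv() removes exactly one
whole message, so two lines sent separately come back as two separate,
unsplit units.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>

#define LINE_MAX_BYTES 256   /* hypothetical per-message size limit */

/* System V messages must begin with a positive long "type" field. */
struct line_msg {
    long mtype;
    char mtext[LINE_MAX_BYTES];
};

/* Queue one line as one message. Returns 0 on success, -1 on error. */
int mq_send_line(int qid, const char *line)
{
    struct line_msg m = { .mtype = 1 };
    size_t len = strlen(line);
    if (len > sizeof m.mtext) {
        errno = EMSGSIZE;
        return -1;
    }
    memcpy(m.mtext, line, len);
    /* The size argument counts only mtext, not the mtype field. */
    return msgsnd(qid, &m, len, 0);
}

/* Dequeue exactly one whole message into out (NUL-terminated).
 * Returns the message length in bytes, or -1 on error. */
ssize_t mq_recv_line(int qid, char *out, size_t outsz)
{
    struct line_msg m;
    /* msgtyp 0 means "take the oldest message of any type". */
    ssize_t got = msgrcv(qid, &m, sizeof m.mtext, 0, 0);
    if (got < 0)
        return -1;
    if ((size_t)got >= outsz) {          /* leave room for the terminator */
        errno = ENOBUFS;
        return -1;
    }
    memcpy(out, m.mtext, (size_t)got);
    out[got] = '\0';
    return got;
}
```

A queue created with msgget(IPC_PRIVATE, IPC_CREAT | 0600) and removed
afterwards with msgctl(id, IPC_RMID, NULL) is enough to try it out.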

>There are two common solutions. Paul suggested one of them: serialize 
>access through a single process.

I stated in response that I was 100% sure that that would solve the problem.
Upon further reflection, however, I wish to withdraw that assertion.

In fact, it now appears to me that the notion of using a pipe to pass
(indivisible?) lines of data up to a parent/controller process from a
number of child processes, and then allowing that parent to "sequentialize"
those data lines onto disk, merely moves the problem around without
actually addressing it.
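One data point that bears on the pipe case: POSIX does promise that a
write() of no more than PIPE_BUF bytes to a pipe or FIFO is not
interleaved with data from other processes writing to the same pipe
(PIPE_BUF must be at least 512; Linux uses 4096). A minimal sketch of the
single-process serialization, assuming every line fits well under
PIPE_BUF; fan_in and write_all are hypothetical names. The parent's retry
loop for short writes is safe only because the parent is the sole writer
of the output file.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Copy all of buf to fd, retrying after short writes. Safe here only
 * because the parent is the sole writer of outfd. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n == -1) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical helper: fork nchild children that each send nlines lines
 * up one shared pipe; only the parent writes them to outfd.
 * Returns 0 on success, -1 if anything failed. */
int fan_in(int nchild, int nlines, int outfd)
{
    int p[2], rc = 0;
    if (pipe(p) == -1)
        return -1;

    for (int c = 0; c < nchild; c++) {
        pid_t pid = fork();
        if (pid == -1) {
            rc = -1;
            break;
        }
        if (pid == 0) {                      /* child: keeps the write end */
            close(p[0]);
            char line[128];                  /* far below PIPE_BUF (>= 512) */
            for (int i = 0; i < nlines; i++) {
                int len = snprintf(line, sizeof line, "child %d line %d\n", c, i);
                /* One write() of <= PIPE_BUF bytes: POSIX says it is not
                 * interleaved with data from other writers of this pipe. */
                if (write(p[1], line, (size_t)len) != len)
                    _exit(1);
            }
            _exit(0);                        /* _exit: skip stdio flushing */
        }
    }

    close(p[1]);                             /* parent: keeps the read end */
    char buf[4096];
    ssize_t n;
    while ((n = read(p[0], buf, sizeof buf)) > 0)
        if (write_all(outfd, buf, (size_t)n) == -1)
            rc = -1;
    close(p[0]);

    int status;
    while (wait(&status) > 0)
        if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
            rc = -1;
    return rc;
}
```

The parent's read() may return a chunk that ends mid-line, but since it
copies chunks to the output in order and nobody else writes there, lines
still land intact.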

Consider this:  If a single ("successful") call to write() (one which returns
a value indicating that all bytes were written) fails to guarantee that
the entire written buffer will be treated as an indivisible unit, then
it likely fails to provide that fundamental guarantee *regardless* of
whether the specific file descriptor being written to is associated
with (a) a file on the local hard disk or (b) a pipe being used
to communicate with another process.
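A related practical point about the "successful call" framing: the usual
retry loop for short writes (call write() again with the remainder)
turns one logical line into two separate write() calls, and another
writer can land between them. A sketch, assuming the caller would rather
see a short write reported as an error; write_line_once is a hypothetical
name and EIO is an arbitrary choice of errno.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: emit one line with exactly one write() call.
 * Returns 0 only if write() reports that every byte was accepted.
 * A short count is reported as an error rather than "completed" with a
 * second write(), because that second call would be a separate
 * operation that another writer could slip in front of. */
int write_line_once(int fd, const char *line)
{
    size_t len = strlen(line);
    ssize_t n;

    do {
        n = write(fd, line, len);
    } while (n == -1 && errno == EINTR);   /* -1 with EINTR: nothing was written */

    if (n == -1)
        return -1;
    if ((size_t)n != len) {
        errno = EIO;                       /* arbitrary errno for "short write" */
        return -1;
    }
    return 0;
}
```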

Thus, even if I were to implement the master/slave approach... which I
am familiar with and which I have implemented previously in other contexts...
it seems to me that the absence of an iron-clad guarantee of "atomicity"
with respect to the data written in any single call to write() is still
a potential problem.

Am I making sense?

>The other is to serialize it through some kind of lock...

Yes, but it isn't clear to me that even this would address the problem.

If the kernel may, at its sole discretion, take a single line of output
data that I pass to it in a single call to write(), break that line into
sub-parts, and then *physically* write those parts at different times of
its own choosing, then even if I use a "sequentializing" lock on the
output file, it seems to me that the kernel might still be free to write
the first half of any given line to the physical output file while the
lock is in place, and then write the second half of the line to the
physical medium sometime *after* the lock is released.  Result?  Garbled
output lines in the physical file... like what I am seeing already.
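A sketch of the lock-based alternative using POSIX fcntl() record locks,
which are advisory: they only serialize processes that all take the same
lock. To address the worry about the physical write happening after the
lock is released, it calls fsync() before unlocking, so the data has
been pushed toward the storage device before anyone else can write. It
may also be worth noting that POSIX requires that once write() to a
regular file returns, a later read() of those bytes returns the new
data, whether or not it has reached the disk yet. lock_whole_file and
locked_append are hypothetical names; the descriptor is assumed to be
opened with O_APPEND.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Take (F_WRLCK) or release (F_UNLCK) an advisory lock on the whole
 * file, waiting if another process holds it. */
static int lock_whole_file(int fd, int type)
{
    struct flock fl;
    memset(&fl, 0, sizeof fl);
    fl.l_type = (short)type;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;                      /* 0 means "through end of file" */
    while (fcntl(fd, F_SETLKW, &fl) == -1) {
        if (errno != EINTR)
            return -1;
    }
    return 0;
}

/* Hypothetical helper: append one line under the lock, fsync() it,
 * then release the lock. Returns 0 on success, -1 on error. */
int locked_append(int fd, const char *line)
{
    size_t len = strlen(line);
    if (lock_whole_file(fd, F_WRLCK) == -1)
        return -1;

    ssize_t n = write(fd, line, len);
    int rc = (n >= 0 && (size_t)n == len && fsync(fd) == 0) ? 0 : -1;

    int saved = errno;
    lock_whole_file(fd, F_UNLCK);      /* release even on failure */
    errno = saved;
    return rc;
}
```

One caveat with fcntl() locks: closing *any* descriptor for the file in
the locking process drops that process's locks on it.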

It all comes back to the issue of the "atomicity" of single (successful)
calls to write().  If that is indeed not a guaranteed aspect of the
semantics of write(), then I'm not sure that there is any actual solution
to the problem which would be "guaranteed" to work.

Sigh.  Given this context, for the moment I think that I'm going to
go back to assuming what I had always previously assumed, i.e. that
single successful calls to write() do indeed treat the written data
as an indivisible unit, and that actually, the garbled output I am
seeing in my output files is most likely just a product of my sloppy
programming and some misdirected pointer in my code that got away from
me somehow.

Occam's razor says that this is really still the most likely explanation,
and that I should probably be looking harder for mistakes in my code before
I go around questioning the permanence of the heavenly constellations, or
other things that, in practice, have always previously seemed to me to be
axiomatic features of the Universe, such as gravity and the atomicity of
write() calls.
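In the spirit of that Occam's-razor check, a hypothetical stress test can
take the application out of the picture: several processes append
fixed-format lines to one file through a shared O_APPEND descriptor, one
write() per line, and a checker counts every line that is not exactly
well formed. A clean run proves nothing for all file systems (the Linux
open(2) manual warns that O_APPEND can corrupt files on NFS when several
processes append at once), but if this test never garbles output while
the real program does, the program becomes the prime suspect.
append_stress and count_bad_lines are hypothetical names.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stress test: nproc processes each append nlines
 * fixed-format lines to path, one write() per line, all through one
 * O_APPEND descriptor. Returns 0 if every child succeeded. */
int append_stress(const char *path, int nproc, int nlines)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_APPEND, 0644);
    if (fd == -1)
        return -1;

    int rc = 0;
    for (int p = 0; p < nproc; p++) {
        pid_t pid = fork();
        if (pid == -1) {
            rc = -1;
            break;
        }
        if (pid == 0) {
            char line[64];
            for (int i = 0; i < nlines; i++) {
                int len = snprintf(line, sizeof line, "P%04d L%06d END\n", p, i);
                if (write(fd, line, (size_t)len) != len)
                    _exit(1);
            }
            _exit(0);
        }
    }
    close(fd);

    int status;
    while (wait(&status) > 0)
        if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
            rc = -1;
    return rc;
}

/* Count lines of path that are not exactly "P#### L###### END\n".
 * Stores the number of lines read in *total; returns the number of
 * malformed lines, or -1 if the file cannot be opened. */
long count_bad_lines(const char *path, long *total)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;

    char buf[256], want[64];
    long bad = 0, n = 0;
    while (fgets(buf, sizeof buf, f) != NULL) {
        int p, i;
        n++;
        /* Parse the two numbers, rebuild the line they imply, and
         * require a byte-for-byte match with what was read. */
        if (sscanf(buf, "P%4d L%6d", &p, &i) != 2) {
            bad++;
        } else {
            snprintf(want, sizeof want, "P%04d L%06d END\n", p, i);
            if (strcmp(buf, want) != 0)
                bad++;
        }
    }
    fclose(f);
    *total = n;
    return bad;
}
```

Calling append_stress(path, 8, 2000) and then count_bad_lines(path, &total)
on the file system that holds the real output files is the interesting
case; a nonzero bad-line count there would point at the kernel or file
system rather than at the application.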

Regards,
rfg