[Bug 295707] aio_write: O_APPEND write ordering guarantee is not enforced
Date: Fri, 29 May 2026 19:51:11 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=295707
Bug ID: 295707
Summary: aio_write: O_APPEND write ordering guarantee is not enforced
Product: Base System
Version: 15.0-STABLE
Hardware: Any
OS: Any
Status: New
Severity: Affects Some People
Priority: ---
Component: kern
Assignee: bugs@FreeBSD.org
Reporter: i.maximets@ovn.org
Attachment #271331 mime type: text/plain
Created attachment 271331
--> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=271331&action=edit
reproducer
The man page [1] says:
If O_APPEND is set for iocb->aio_fildes, write operations append to the
file in the same order as the calls were made.
[1] https://man.freebsd.org/cgi/man.cgi?query=aio_write
Open vSwitch uses aio for logging, and we see fairly frequent log
reordering or even interleaving in our tests on FreeBSD. The reason seems
to be that the kernel doesn't actually enforce the ordering for O_APPEND
writes on the same file descriptor. It appears that kernel workers just
pick up new requests whenever they can, so the writes end up out of order
on systems with more than one core.
AFAICT, POSIX technically has an exemption from the ordering rule for
multiprocessor systems. However, this is not mentioned in the man page, and
the spirit of the exemption seems to be an exceptional case, not a general
rule for how things should work. And aio in general would not be very
useful if we had to wait for every request to complete before submitting a
new one.
Attached is a relatively simple reproducer program that mimics the usage
pattern we have in OVS. It makes 50K writes of numbered lines with at
most 256 requests in flight at the same time. A ring buffer is used to
track the requests. On EAGAIN, it waits for one request to complete and
tries again. At the end, it checks the file for the order of the written
lines and the correctness of the written data.
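For readers without the attachment at hand, the submission pattern
described above can be sketched roughly as follows. This is not the
attached reproducer, only a minimal illustration under assumptions: the
names submit_lines, reap, struct slot and LINE_MAX_LEN are hypothetical,
and the line format ("<seq> line") is made up for the example.

```c
#include <aio.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define RING_SIZE    256  /* max requests in flight, as in the report */
#define LINE_MAX_LEN 64   /* hypothetical fixed buffer size per line */

struct slot {
    struct aiocb cb;
    char buf[LINE_MAX_LEN];
};

static struct slot ring[RING_SIZE];

/* Wait for one request to finish; return 0 if it wrote all its bytes. */
static int reap(struct slot *s)
{
    const struct aiocb *const list[1] = { &s->cb };

    while (aio_error(&s->cb) == EINPROGRESS) {
        if (aio_suspend(list, 1, NULL) != 0 && errno != EINTR)
            return -1;
    }
    return aio_return(&s->cb) == (ssize_t)s->cb.aio_nbytes ? 0 : -1;
}

/* Append `count` numbered lines to `path` using O_APPEND and aio_write. */
int submit_lines(const char *path, long count)
{
    long head = 0;  /* oldest request not yet reaped */
    long seq;
    int rc = 0;
    int fd;

    assert(path != NULL && count >= 0);
    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_APPEND, 0644);
    if (fd < 0)
        return -1;

    for (seq = 0; seq < count; seq++) {
        struct slot *s = &ring[seq % RING_SIZE];

        /* Ring is full: the oldest request occupies this very slot. */
        if (seq - head == RING_SIZE) {
            if (reap(&ring[head % RING_SIZE]) != 0)
                rc = -1;
            head++;
            if (rc != 0)
                break;
        }

        memset(&s->cb, 0, sizeof s->cb);
        s->cb.aio_fildes = fd;
        s->cb.aio_buf = s->buf;
        s->cb.aio_nbytes = (size_t)snprintf(s->buf, sizeof s->buf,
                                            "%ld line\n", seq);

        while (aio_write(&s->cb) != 0) {
            /* Only EAGAIN with something in flight is worth a retry. */
            if (errno != EAGAIN || head == seq) {
                rc = -1;
                break;
            }
            if (reap(&ring[head % RING_SIZE]) != 0)
                rc = -1;
            head++;
            if (rc != 0)
                break;
        }
        if (rc != 0)
            break;
    }

    /* Drain whatever was submitted and is still outstanding. */
    while (head < seq) {
        if (reap(&ring[head % RING_SIZE]) != 0)
            rc = -1;
        head++;
    }
    close(fd);
    return rc;
}
```

Note that the sketch never waits for request N before submitting N+1
unless the ring is full or aio_write returns EAGAIN, so it relies entirely
on the documented O_APPEND ordering.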
This test always passes on Linux, which has the same ordering claim in
its man page but, of course, a different implementation. On FreeBSD, the
test fails in our CI with ~25% of rows getting reordered:
$ clang -o aio-append aio-append.c
$ ./aio-append
REORDERED at line 2: expected seq 1, got 2
REORDERED at line 5: expected seq 5, got 1
REORDERED at line 6: expected seq 2, got 5
REORDERED at line 11: expected seq 10, got 11
REORDERED at line 12: expected seq 12, got 10
REORDERED at line 13: expected seq 11, got 12
REORDERED at line 27: expected seq 26, got 27
REORDERED at line 28: expected seq 28, got 29
REORDERED at line 31: expected seq 32, got 26
REORDERED at line 32: expected seq 27, got 28
50000 lines, 13445 reordered, 0 corrupted
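A checker that produces a report of this shape could look roughly like the
sketch below. It is an illustration, not the code from the attachment, and
it assumes that every line starts with a decimal sequence number counting
from 0 and is shorter than the read buffer. The names check_order and
struct order_stats are hypothetical. After each line, the expected value is
resynchronized to the number just seen plus one.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Counters collected in one pass over the written file. */
struct order_stats {
    long lines;      /* total lines read */
    long reordered;  /* lines whose sequence number was not expected */
    long corrupted;  /* lines that do not start with a decimal number */
};

/* Assumed line format: "<seq> <payload>\n", with seq starting at 0. */
struct order_stats check_order(FILE *fp)
{
    struct order_stats st = {0, 0, 0};
    char buf[512];
    long expected = 0;

    assert(fp != NULL);
    while (fgets(buf, sizeof buf, fp) != NULL) {
        char *end;
        long seq = strtol(buf, &end, 10);

        st.lines++;
        if (end == buf) {
            /* No digits at the start of the line. */
            st.corrupted++;
            continue;
        }
        if (seq != expected) {
            st.reordered++;
            printf("REORDERED at line %ld: expected seq %ld, got %ld\n",
                   st.lines, expected, seq);
        }
        /* Resynchronize on the sequence number just seen. */
        expected = seq + 1;
    }
    return st;
}
```

The real reproducer also validates the written data; the sketch only flags
lines that do not begin with a number.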
While I guess this could be fixed by updating the docs while still being
sort of POSIX compliant, it would be nice to have the kernel actually
enforce the currently documented behavior.
WDYT?