RFC: should the copy_file_range() syscall be atomic?

Rick Macklem rmacklem at uoguelph.ca
Thu Jun 13 21:44:05 UTC 2019


When I first wrote the copy_file_range() syscall, the updating of the file was atomic,
because I locked both vnodes while doing it.
kib@ quickly pointed out that this could introduce a LOR/deadlock because two
userland threads could concurrently do the syscall with the file arguments reversed.

It turns out that locking two VREG vnodes concurrently to do this isn't easy and
would require the implementation of non-blocking versions of:
  vn_rangelock_rlock() and vn_rangelock_wlock()
  - I am not sure how difficult doing this is, but I'll admit I'd rather not do this.

Also, having the vnodes locked precludes use of VOP_IOCTL(..FIOSEEKDATA/
FIOSEEKHOLE..) calls to find holes in the byte range of the file being copied from.

Without the vnodes locked, it is possible for other threads to write to either of the
files concurrently with the copy_file_range(), resulting in an indeterminate results.
(cp(1) has never guaranteed atomic copying of files, so is it needed in this syscall?)
In summary, doing the syscall non-atomically has the advantages of:
- vn_rdwr() takes care of the vnode locking issues, so any changes w.r.t. locking
  wouldn't require changes to this syscall's code.
- VOP_IOCTL() could be used to find holes.
- No new rangelock primitives need to be added to the system.
- If there were some combination of file system/storage unit where copying
  non-overlapping byte ranges concurrently could result in better performance,
  then that could be done. (An atomic syscall would serialize them.)

The advantage of an atomic syscall would be consistency with write(2) and read(2)
behaviour.

The following comments are copied from phabricator:
kib@ - So you relock range for each chunk ? This defeats the purpose of the range locking. Should copy_file_range() be atomic WRT other reads and writes ?

asomers@ - That has an unfortunate side effect: copy_file_range is no longer atomic if you drop the vnode locks and range locks in the middle. It would be possible for two copy_file_range operations to proceed concurrently, leaving the file in a state where each of the operations was partially successful. A better solution would be to add rangelock_trywlock and rangelock_tryrlock. I don't think it would be very hard (testing them, however, could be).

I don't see anything in the Linux man page w.r.t. atomicity, so I am now asking what
others think?
(I'll admit I'm biased towards non-atomic, since I have already coded it and can use
 the VOP_IOCTL() calls to find the holes in the input file, but...)

rick


More information about the freebsd-fs mailing list