Linus Torvalds on FreeBSD's Use of Copy-on-write
mckusick at chez.mckusick.com
Mon Apr 24 06:33:12 UTC 2006
Anyone working on zero-copy sockets care to respond to this?
Linus Torvalds made reference to some possible future extensions.
This included vmsplice(), a system call since implemented by Jens
Axboe "to basically do a 'write to the buffer', but using the
reference counting and VM traversal to actually fill the buffer."
Reviewing the implications of using such a system call led to a
comparison with FreeBSD's ZERO_COPY_SOCKET, which uses COW (copy on
write).
Linus explained that while this may look good on specific benchmarks,
it actually introduces extra overhead: "the thing is, the cost of
marking things COW is not just the cost of the initial page table
invalidate: it's also the cost of the fault eventually when you
_do_ write to the page, even if at that point you decide that the
page is no longer shared, and the fault can just mark the page
writable again." He went on to explain, "The COW approach does
generate some really nice benchmark numbers, because the way you
benchmark this thing is that you never actually write to the user
page in the first place, so you end up having a nice benchmark loop
that has to do the TLB invalidate just the _first_ time, and never
has to do any work ever again later on." Linus didn't pull any
punches when he summarized:
"I claim that Mach people (and apparently FreeBSD) are incompetent
idiots. Playing games with VM is bad. memory copies are _also_
bad, but quite frankly, memory copies often have _less_ downside
than VM games, and bigger caches will only continue to drive
that point home."
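
The cost argument quoted above can be sketched as a toy model. This is
only an illustration, not a measurement: the Costs fields, the function
names, and every constant are hypothetical placeholders chosen for the
example, and which approach comes out ahead depends entirely on the
values assumed. It shows why a loop that never writes to the user page
pays for the TLB invalidate once, while a program that rewrites the
page between sends pays for the invalidate and the write fault each
time.

```python
from dataclasses import dataclass


@dataclass
class Costs:
    # Hypothetical, unit-less per-page costs (assumptions, not measurements).
    copy: float = 4.0            # copying the page into a kernel buffer
    tlb_invalidate: float = 3.0  # marking the page COW (page table invalidate)
    write_fault: float = 5.0     # fault taken on the next write to the page


def cow_total(sends: int, writes_between_sends: bool, c: Costs) -> float:
    """Total cost of sending one page `sends` times using COW."""
    if not writes_between_sends:
        # Benchmark-style loop: the page is marked COW on the first send
        # and never written, so no further work is needed.
        return c.tlb_invalidate
    # The application rewrites the page after every send: each write takes
    # a fault that makes the page writable again, and each following send
    # has to mark it COW again.
    return sends * (c.tlb_invalidate + c.write_fault)


def copy_total(sends: int, c: Costs) -> float:
    """Total cost of sending one page `sends` times with a plain copy."""
    return sends * c.copy


if __name__ == "__main__":
    c = Costs()
    n = 1000
    print("COW, never written:     ", cow_total(n, False, c))
    print("COW, rewritten each send:", cow_total(n, True, c))
    print("plain copy:              ", copy_total(n, c))
```

With these assumed constants the never-written case looks far cheaper
than copying, while the rewritten case is more expensive than copying,
which is the gap between benchmark and real use that the quote
describes.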