Linus Torvalds on FreeBSD's Use of Copy-on-write

Wed Apr 26 22:18:56 UTC 2006

On Mon, Apr 24, 2006 at 19:24:14 -0400, Alexander Kabaev wrote:
> On Sun, 23 Apr 2006 23:55:23 -0700
> Julian Elischer <julian at elischer.org> wrote:
> 
> > John-Mark Gurney wrote:
> > 
> > >Kirk McKusick wrote this message on Sun, Apr 23, 2006 at 23:33 -0700:
> > >
> > >>Linus explained that while this may look good on specific
> > >>benchmarks, it actually introduces extra overhead, "the thing is,
> > >>the cost of
> > >
> > >Has he benchmarked this to prove his point?  And has he done it over
> > >realworld work loads, like Apache or another "standard" program
> > >instead of a microbenchmark designed especially to make COW look bad?
> > >
> > 
> > Well, no one has even confirmed that FreeBSD does all this
> > "page flipping" etc.  I doubt that Linus has looked inside the BSD
> > kernels. He's probably just repeating what he's been told, and that's
> > so accurate, right?
> > 
> > I know that FreeBSD developers have over the last few years also
> > acknowledged that the speed of modern CPUs vs. memory speeds often
> > makes certain optimisations less efficient than they used to be.
> > 
> > >As w/ all theories, w/o numbers, they are only theories till backed
> > >up w/ benchmarks.
> > >
> > >This isn't supposed to defend COW, but it is designed to ensure that
> > >people don't stop exploring just because someone says something...
> > >
>
> The original zero-copy code was used to stream network data directly to
> onboard SDRAM on a PCI extension card. Linus' notes about the relative
> cost of memory-to-memory copies vs. TLB shootdowns and possible page
> access traps had little relevance there.

If you're talking about the code I wrote at Pluto/Avid, yeah, it was
primarily to get around very poor CPU access speed to the external RAID
cache we had on that box.

The zero copy send model used async I/O semantics -- you'd get the
completed buffers back when they were freed, so you knew you could reuse
them in userland.
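
To make the ownership rule concrete, here is a rough userland simulation of
that send model. It is only a sketch under my own assumptions, not the
Pluto/Avid code: the names (buf_pool, send_async, kernel_complete,
reap_completions) and the fixed pool size are hypothetical, and the "kernel"
side is just a function call. The point it illustrates is that a buffer
handed to the stack cannot be reused until its completion has been reaped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NBUFS 4  /* hypothetical pool size, chosen for illustration */

/* Hypothetical ownership states for one userland send buffer. */
enum buf_state {
    BUF_FREE,       /* owned by the application, safe to fill and send */
    BUF_IN_FLIGHT,  /* handed to the stack, must not be touched */
    BUF_DONE        /* freed by the stack, waiting to be reaped */
};

struct buf_pool {
    enum buf_state state[NBUFS];
};

/* Start with every buffer owned by the application. */
void pool_init(struct buf_pool *p)
{
    for (size_t i = 0; i < NBUFS; i++)
        p->state[i] = BUF_FREE;
}

/* Queue buffer i for sending. Fails if the application does not own it. */
bool send_async(struct buf_pool *p, size_t i)
{
    if (i >= NBUFS || p->state[i] != BUF_FREE)
        return false;
    p->state[i] = BUF_IN_FLIGHT;
    return true;
}

/* Simulated stack side: the data has gone out and the buffer was freed. */
void kernel_complete(struct buf_pool *p, size_t i)
{
    assert(i < NBUFS);
    if (p->state[i] == BUF_IN_FLIGHT)
        p->state[i] = BUF_DONE;
}

/* Collect completions; returns how many buffers came back to the app. */
size_t reap_completions(struct buf_pool *p)
{
    size_t n = 0;
    for (size_t i = 0; i < NBUFS; i++) {
        if (p->state[i] == BUF_DONE) {
            p->state[i] = BUF_FREE;
            n++;
        }
    }
    return n;
}
```

In this sketch a second send_async() on the same buffer fails until
kernel_complete() has run and reap_completions() has handed the buffer back,
which is the guarantee the explicit notification gives the application.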

The zero copy receive model relied on extensive modifications to the Alteon
Tigon II firmware -- a whole lot more than the header-splitting firmware
that's in the FreeBSD tree now.  I added 16 extra receive rings to the
firmware to handle up to 16 incoming FTP connections.  Packet headers were
DMAed into normal kernel memory, and the payloads were DMAed into the
PCI-attached RAID cache.

Since neither implementation was generally useful, Drew Gallatin and I
cleaned up his zero copy sockets implementation for inclusion in FreeBSD.
The receive semantics with respect to userland were similar to what I did
at Pluto/Avid, but the send semantics are somewhat different, since they
rely on COW instead of a separate notification that the buffer is done.

Is any of that old codebase still in use?  I figured it probably went away
with the old hardware...

Ken
-- 
Kenneth Merry
ken at FreeBSD.ORG