Reason for doing malloc / bzero over calloc (performance)?
Matthew Dillon
dillon at apollo.backplane.com
Fri Jun 15 01:04:34 UTC 2007
I'm going to throw a wrench in the works, because it all gets turned
around the moment you find yourself in an SMP environment where several
threads are running on different cpus at the same time, using the
same shared VM space.
The moment you have a situation like that where you are futzing with
the page tables, i.e. using mmap() for demand-zero and munmap() to
free, the operation becomes extremely expensive versus anything
else because any update to the page table (specifically any removal
of page table entries from the page table) requires an SMP synchronization
to occur between all the cpus actively sharing that VM space, and
that's on top of the overhead of taking the page fault(s).
This is true of any memory mapping the kernel has to do in kernel
virtual memory (must be synchronized with ALL cpus) and any mapping
the kernel does on behalf of userland for user memory (must be
synchronized with any cpus actively using that VM space, i.e. threaded
user programs). The synchronization is required to properly invalidate
stale mappings on other cpus and it must be done synchronously due
to Intel/AMD hardware bugs related to changing page table entries on one
cpu when instructions are executing using that memory on another cpu.
There is no way to avoid it without tripping up on the Intel/AMD hardware
bugs.
From this point of view it is much, much better to bzero() memory that
is already mapped than it is to map/unmap new memory. I recently
audited DragonFly and found an insane number of IPIs flying about due
to PAGE_SIZE'd kernel mallocs using the VM trick via kernel_map &
kmem_alloc(). They all went away when I made the kernel malloc use
the slab cache for allocations up to and including PAGE_SIZE*2 bytes.
Fun, eh?
-Matt
Matthew Dillon
<dillon at backplane.com>