cvs commit: src/sys/i386/isa prof_machdep.c src/sys/amd64/amd64 prof_machdep.c

From: Bruce Evans <bde_at_FreeBSD.org>
Date: Thu, 29 Nov 2007 02:01:21 +0000 (UTC)
bde         2007-11-29 02:01:21 UTC

  FreeBSD src repository

  Modified files:
    sys/i386/isa         prof_machdep.c 
    sys/amd64/amd64      prof_machdep.c 
  Log:
  Don't use plain "ret" instructions at targets of jump instructions,
  since the branch caches on at least Athlon XP through Athlon 64 CPUs
  don't understand such instructions and guarantee a cache miss taking
  at least 10 cycles.  Use the documented workaround "ret $0" instead
  ("nop; ret" also works, but "ret $0" is probably faster on old CPUs).
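
The pattern described above can be sketched as follows. This is a hypothetical illustration, not the code in prof_machdep.c: the name hook_demo and its "enabled" argument are invented here, and the asm assumes GNU-style top-level asm on x86-64 Linux (other targets fall back to plain C). The disabled path is three instructions (test, jump, return), and the jump lands directly on the return, which is written as "ret $0" rather than plain "ret".

```c
/*
 * Hypothetical sketch of a hook whose fast path jumps straight to the
 * return instruction.  hook_demo() returns 0 when "enabled" is zero and
 * 1 otherwise.  It is NOT the real __mcount/.mcount entry point.
 */
int hook_demo(int enabled);

#if defined(__GNUC__) && defined(__x86_64__) && defined(__linux__)
__asm__(
    ".pushsection .text\n"
    ".globl hook_demo\n"
    ".type hook_demo, @function\n"
    "hook_demo:\n"
    "    xorl  %eax, %eax\n"   /* default return value: 0 */
    "    testl %edi, %edi\n"   /* first argument: is the hook enabled? */
    "    je    1f\n"           /* disabled: branch directly to the return */
    "    movl  $1, %eax\n"     /* enabled: stand-in for the real work */
    "1:\n"
    "    ret   $0\n"           /* jump target: 'ret $0', not plain 'ret' */
    ".size hook_demo, .-hook_demo\n"
    ".popsection\n"
);
#else
/* Portable fallback with the same behavior, for other targets. */
int hook_demo(int enabled)
{
    return enabled != 0;
}
#endif
```

Both encodings return to the caller in the same way; only the instruction bytes at the branch target differ.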
  
  Normal code (even asm code) doesn't branch to "ret", since there is
  usually some cleanup to do, but the __mcount, .mcount and .mexitcount
  entry points were optimized too well, down to the minimum number of
  instructions (3 instructions each if profiling is not enabled), and
  they do branch to it.  I didn't see a significant number of cache misses for
  .mexitcount, but for the shared "ret" for __mcount and .mcount I
  observed cache misses costing 26 cycles each.  For a send(2) syscall
  that makes about 70 function calls, the cost of these cache misses
  alone increased the syscall time from about 4000 cycles to about 7000
  cycles.  4000 is for a profiling (GUPROF) kernel with profiling disabled;
  after this fix, configuring profiling only costs about 600 cycles in the
  4000, which is consistent with almost perfect branch prediction in the
  mcounting calls.
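
As a rough check on the figures above, the sketch below divides the quoted cycle counts by the quoted number of function calls. All inputs are the approximate values from the log ("about 4000", "about 7000", "about 70", "about 600"), so the results are order-of-magnitude estimates only; the function names are invented for this example.

```c
/* Approximate figures quoted in the log message. */
#define CYCLES_WITH_MISSES   7000.0  /* send(2) before the fix */
#define CYCLES_AFTER_FIX     4000.0  /* send(2) after the fix */
#define CALLS_PER_SYSCALL      70.0  /* function calls made by send(2) */
#define PROFILING_OVERHEAD    600.0  /* cost of configuring profiling */

/* Extra cycles per function call attributable to the cache misses. */
double miss_cycles_per_call(void)
{
    return (CYCLES_WITH_MISSES - CYCLES_AFTER_FIX) / CALLS_PER_SYSCALL;
}

/* Remaining profiling-hook overhead per function call after the fix. */
double hook_cycles_per_call(void)
{
    return PROFILING_OVERHEAD / CALLS_PER_SYSCALL;
}
```

With these inputs the misses work out to roughly 43 cycles per function call, against roughly 9 cycles per call for the hooks once the fix is in place.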
  
  Revision  Changes    Path
  1.31      +2 -2      src/sys/amd64/amd64/prof_machdep.c
  1.32      +2 -2      src/sys/i386/isa/prof_machdep.c
Received on Thu Nov 29 2007 - 02:01:21 UTC