cvs commit: src/lib/libc/stdio _flock_stub.c local.h

Bruce Evans bde at zeta.org.au
Wed Mar 10 10:20:10 PST 2004


On Thu, 11 Mar 2004, Tim Robbins wrote:

> On Tue, Mar 09, 2004 at 07:59:12PM -0800, Alfred Perlstein wrote:
>
> > * Bruce Evans <bde at zeta.org.au> [040309 07:50] wrote:
> > > This would pessimize even getc_unlocked() and putc_unlocked().  getc()
> > > and putc() are now extern functions, but the old macro/inline versions
> > > are still available as getc_unlocked() and putc_unlocked().  Simple
> > > benchmarks for reading a 100MB file on an Athlon XP1600 overclocked
> > > show that the function versions are up to 9 times slower:
> > > ...
> >
> > Hmm, can't we use macros that do this:
> >
> > #define getc()	(__isthreaded ? old_unlocked_code : getc_unlocked())
> >
> > Where __isthreaded is a global that's set by threading libraries
> > to 1 and 0 by non-threaded libc, this should get rid of a lot of
> > the function call overhead.
>
> Sounds like a good idea to me. In my testing, this approach was about 5%
> slower than calling getc_unlocked() directly (due to the conditional jump),
> but roughly 3 times faster than a call to the getc() function.

You must not have tested the dynamic linkage case :-).

> If there aren't any objections, I think we should implement getc()/putc()
> this way (and all the other stdio functions that have traditionally had
> macro equivalents) before 5-stable to try to recoup some of the performance
> losses caused by the removal of the macros.

Is __isthreaded always set early enough?  What about if the application is
dynamically linked and loads thread support later (is this supported)?

The 5% cost of checking on every call can be avoided by pushing the check
into a function.  E.g., for getc():

% #define	__sgetc(p) (--(p)->_r < 0 ? __srget(p) : (int)(*(p)->_p++))

It can be arranged that --(p)->_r < 0 is always true for the threaded case
(by keeping only a flag in it and keeping the real count elsewhere).  The
threaded case would become slightly slower since it would always have the
dummy count check plus a dummy count fixup.  Actually it shouldn't need
the fixup.  The above is hand-optimized for PDP11's and would probably
be no slower on current hardware with current compilers written as:

#define	getc(p) ((p)->_r <= 0 ? __srget(p) : (--(p)->_r, (int)(*(p)->_p++)))
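
A minimal sketch of the flag-in-the-count idea, assuming a hypothetical toy
stream rather than the real FILE.  Only _r and _p mirror the stdio fields;
toy_file, toy_srget(), toy_open(), real_count, threaded and slow_calls are
invented for illustration, and the slow path does no locking or refilling:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical miniature stream; only _r and _p mirror real FILE fields. */
struct toy_file {
	int _r;				/* fast-path count; pinned at 0 when "threaded" */
	const unsigned char *_p;	/* next byte to read */
	int real_count;			/* true remaining count, used when threaded */
	int threaded;			/* stand-in for a stream that needs locking */
	int slow_calls;			/* how often the slow path ran */
};

/* Slow path standing in for __srget(); a real one would lock and refill. */
static int
toy_srget(struct toy_file *fp)
{
	fp->slow_calls++;
	if (fp->threaded) {
		/* The threaded case keeps its own count, so _r needs no fixup. */
		if (fp->real_count <= 0)
			return (EOF);
		fp->real_count--;
		return ((int)(*fp->_p++));
	}
	return (EOF);	/* unthreaded: buffer exhausted, nothing to refill here */
}

/* Same shape as the getc() above: test the count first, then decrement. */
#define	toy_getc(p) \
	((p)->_r <= 0 ? toy_srget(p) : (--(p)->_r, (int)(*(p)->_p++)))

/* Point the toy stream at a string; threaded streams start with _r == 0. */
static void
toy_open(struct toy_file *fp, const char *s, int threaded)
{
	int n = (int)strlen(s);

	fp->_p = (const unsigned char *)s;
	fp->threaded = threaded;
	fp->_r = threaded ? 0 : n;
	fp->real_count = threaded ? n : 0;
	fp->slow_calls = 0;
}
```

With threaded == 0 every byte in the buffer is read inline and toy_srget()
runs only at end of buffer; with threaded == 1 every toy_getc() goes
through toy_srget(), which is where the locking would live.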

I once wrote a version of stdio that optimized the usual case of putc()
similarly by arranging that the write count is always 0 except for the
fully buffered case, so that other cases get handled by a function
(this pessimizes the line buffered case relative to the FreeBSD
putc_unlocked()).
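
A rough sketch of that write-side arrangement, under the assumption of a
hypothetical toy output stream (toy_out, toy_wbuf(), toy_flush(),
toy_out_init() and the 8-byte buffer are all invented here; this is not
the stdio version described above, and the flush only resets the buffer
instead of writing anything):

```c
#include <stdio.h>

enum toy_mode { TOY_FULL, TOY_LINE, TOY_NONE };

/* Hypothetical output stream; _w and _p play the roles of the stdio fields. */
struct toy_out {
	int _w;			/* fast-path space; nonzero only if fully buffered */
	unsigned char *_p;	/* next free byte in buf */
	unsigned char buf[8];
	enum toy_mode mode;
	int flushes;		/* how often the buffer was flushed */
	int slow_calls;		/* how often the function path ran */
};

/* A real stdio would write the buffer out here; the toy just resets it. */
static void
toy_flush(struct toy_out *fp)
{
	fp->_p = fp->buf;
	fp->flushes++;
	fp->_w = fp->mode == TOY_FULL ? (int)sizeof(fp->buf) : 0;
}

/* Function path: handles a full buffer plus all line/unbuffered output. */
static int
toy_wbuf(int c, struct toy_out *fp)
{
	fp->slow_calls++;
	if (fp->_p == fp->buf + sizeof(fp->buf))
		toy_flush(fp);
	*fp->_p++ = (unsigned char)c;
	if (fp->mode == TOY_FULL)
		fp->_w--;
	if (fp->mode == TOY_NONE || (fp->mode == TOY_LINE && c == '\n'))
		toy_flush(fp);
	return ((unsigned char)c);
}

/* Inline path is taken only while a fully buffered stream has space left. */
#define	toy_putc(c, p) \
	((p)->_w <= 0 ? toy_wbuf((c), (p)) \
	    : (--(p)->_w, (int)(*(p)->_p++ = (unsigned char)(c))))

/* Start with an empty buffer; _w stays 0 unless the stream is fully buffered. */
static void
toy_out_init(struct toy_out *fp, enum toy_mode mode)
{
	fp->mode = mode;
	fp->_p = fp->buf;
	fp->_w = mode == TOY_FULL ? (int)sizeof(fp->buf) : 0;
	fp->flushes = 0;
	fp->slow_calls = 0;
}
```

In this toy the fully buffered stream calls toy_wbuf() once per buffer,
while the line buffered stream calls it for every character, which is the
pessimization mentioned above.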

Bruce

