svn commit: r183896 - head/sys/dev/bge
Scott Long
scottl at samsco.org
Wed Oct 15 10:08:29 UTC 2008
Bruce Evans wrote:
> On Tue, 14 Oct 2008, Scott Long wrote:
>
>> Marius Strobl wrote:
>>> Author: marius
>>> Date: Tue Oct 14 20:28:42 2008
>>> New Revision: 183896
>>> URL: http://svn.freebsd.org/changeset/base/183896
>>>
>>> Log:
>>> Use bus_{read,write}_4(9) instead of bus_space_{read,write}_4(9)
>>> in order to get rid of the bus space handle and tag in the softc.
>>
>> Has anyone looked at the generated code from this interface switch and
>> compared it what was getting generated previously? Way back when,
>
> I just looked at the source code. This seems to be only a small
> pessimization since it involves just one extra indirection and
> related loss of optimization possibilities: (sc->sc_bt, sc->sc_bh)
> becomes r = sc->sc_br; (r->r_bustag, r->r_bushandle). In theory,
> the compiler could optimize by caching the tag and handle in registers
> in either case so that only 1 extra indirection is needed per function,
> but I've never seen that being done enough to make much difference.
>
> However, some drivers (e.g., ata) were too stupid to cache sc_bt and
> sc_bh, and converting these to use bus_nonspace*() was a relatively large
> optimization: r = sc->sc_br; (rman_get_bustag(r), rman_get_bushandle(r))
> became r = sc->sc_br; (r->r_bustag, r->r_bushandle). Since
> rman_get_bustag()
> and rman_get_bushandle() have never been inline, calling both of them
> on every bus space access gave enormous code space and instruction
> count bloat and corresponding (relatively tiny) runtime bloat. The
> instructions normally run so much faster than the i/o that you can do
> hundreds or thousands of them per i/o before noticing the runtime
> bloat, and the instruction count bloat here is only about a factor of 20.
It'll be a long time before disk i/o is fast enough to saturate the CPU
to the point of this issue being important. Even the fastest SAS
hardware that I've written a driver for struggles to get over 100,000
transactions/sec before saturating the attached disks, leaving plenty
of CPU remaining.
>
>> including <machine/bus_memio.h> made bus_space_read|write_4() compile
>> into a direct memory access on machines that supported it. The dubious
>> removal of bus_memio.h and bus_pio.h took away that benefit, and I'm
>> afraid that it's only getting worse now. Bus writes to card memory are
>> still very important to high-performance devices and shouldn't be
>> pessimized in the name of simpler-looking C code.
>
> I hate the bloat too, but it rarely matters (see above). I mainly noticed
> it using ddb. Where nice drivers are optimized to use a single instruction
> per i/o (or perhaps 2 with minimal bus space bloat), the ones that use
> rman_get*() on every i/o take 20+, and with a primitive debugger like ddb
> it is painful to even skip over all the function calls.
It's really the opportunity for gratuitous D-cache line misses that
worries me, and to a lesser extend the I-cache bloat, though that's
probably minimal. In the past I've been able to demonstrate
considerably better micro-performance, and modestly better
macro-performance, in the bge and em drivers from optimizing cache
efficiency in busdma.
>
> Which devices have fast enough i/o for the extra indirection to matter?
> bge tries hard to avoid all PCI access in time-critical code. I think
> we reduced the PCI accesses to 1 PCI write per interrupt. Most device
> accesses for bge use host memory which is DMAed to/from by the hardware,
> so bus space doesn't apply. This type of access seems to be the best
> or only way for a PCI device to go fast enough (though it is still too
> slow on at least i386 since the DMA always causes cache misses).
>
> Bruce
I'm thinking of 10Gb ethernet drivers, which have already shown to fully
saturate the host CPU long before wire bandwidth is filled. I'm only
familiar with cxgb and mxge; cxgb does a single bus write per
transaction, while mxge cheats and aliases the card memory registers via
rman_get_virtual(), making the single-instruction-access automatic =-)
Scott
More information about the svn-src-head
mailing list