bge dropping packets issue

Alexander Sack pisymbol at gmail.com
Wed Apr 16 16:58:20 UTC 2008


Hello:

Sorry for cross-posting, but this seems to be both a driver and a
network/kernel issue, so I figured all of these lists were
appropriate.

I'm investigating an issue we are seeing with 6.1-RELEASE and the bge
driver dropping packets sporadically at 100MBps speed.  The machine is
a 2-way Intel dual-core running 64-bit FreeBSD-6.1 Release with
SMP/8GB RAM.  I would post dmesg, but I'm currently running a test and
the output has a lot of instrumentation in it.  Anyway, what I'm seeing with a
SmartBit traffic generator connected to 4 bge cards (all
BCOM_DEVICEID_BCM5704C) is sporadic packet drops as recorded by the
firmware in its statistics structure (as pulled out by bge_tick()),
i.e. this isn't starvation from failing to allocate mbuf clusters, etc.
The firmware seems to just drop packets occasionally (depending on
workload).  It is mainly aggravated when heavy disk I/O occurs from
generating a network report, which entails gzip'ing a very large
dumpfile in /tmp and then anonymously ftp'ing it via another interface
(em).

DEVICE_POLLING is being used:

# sysctl -a | grep kern.polling
kern.polling.idlepoll_sleeping: 1
kern.polling.stalled: 3
kern.polling.suspect: 1023
kern.polling.phase: 0
kern.polling.enable: 1
kern.polling.handlers: 6
kern.polling.residual_burst: 0
kern.polling.pending_polls: 0
kern.polling.lost_polls: 24436
kern.polling.short_ticks: 592
kern.polling.reg_frac: 20
kern.polling.user_frac: 50
kern.polling.idle_poll: 0
kern.polling.each_burst: 32
kern.polling.burst_max: 1000
kern.polling.burst: 1000
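
As a rough aid for reading the lost_polls counter above, here is a small
sketch of how many consecutive missed polls a receive ring could absorb.
It is a toy model, not driver code: the function names are hypothetical,
and the frame rate and HZ passed in are assumptions (the post states
neither the frame size nor the kernel's HZ).

```c
#include <stdint.h>

/* Frames arriving during one polling tick, rounded up.
 * fps and hz are assumptions supplied by the caller. */
static uint32_t
frames_per_tick(uint64_t fps, uint32_t hz)
{
	return (uint32_t)((fps + hz - 1) / hz);
}

/*
 * Consecutive lost polls a ring of `slots` buffers can absorb before it
 * overflows, assuming each successful poll drains the ring completely.
 * Returns 0 both when no lost poll is tolerated and when the ring cannot
 * even hold one tick's worth of frames.
 */
static uint32_t
lost_polls_tolerated(uint32_t slots, uint32_t per_tick)
{
	if (per_tick == 0 || slots < per_tick)
		return 0;
	/* The ring must hold (lost + 1) ticks' worth of frames. */
	return slots / per_tick - 1;
}
```

For example, assuming minimum-size frames on a 100 Mbit/s link (about
148809 frames/s) and HZ=1000, frames_per_tick() gives 149; a 256-slot
ring then tolerates 0 lost polls and a 512-slot ring tolerates 2.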

After looking at the driver for a bit, I believe the issue may be
RX chain starvation, which causes the firmware to drop packets when
bge_rxeof() cannot keep up.  The driver uses a global locking scheme
which may contribute to some of these robustness issues (this is a
generalization on my part without hard facts, so take it with a grain
of salt; I just notice things like bge_tick() being called every cycle
and competing with the ISR when it may not have to, or may not have
to for its entire duration, e.g. when updating stats).
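
The starvation hypothesis above can be illustrated with a toy model. This
is only a sketch under simple assumptions, not the bge driver's logic: one
frame arrives per time step, the consumer (standing in for bge_rxeof())
is stalled for a fixed number of steps and then drains the ring
completely, and the function name is hypothetical.

```c
#include <stdint.h>

/*
 * Toy model of a receive ring with `slots` buffers.  In each cycle the
 * consumer is stalled for `stall_steps` steps while one frame arrives
 * per step; frames arriving while the ring is full are dropped.  At the
 * end of the cycle the consumer drains the whole ring.
 * Returns the total number of dropped frames over `cycles` cycles.
 */
static uint64_t
simulate_rx_drops(uint32_t slots, uint32_t stall_steps, uint32_t cycles)
{
	uint64_t dropped = 0;
	uint32_t c, s;

	for (c = 0; c < cycles; c++) {
		uint32_t used = 0;	/* ring is empty after each drain */

		for (s = 0; s < stall_steps; s++) {
			if (used < slots)
				used++;		/* frame takes a free slot */
			else
				dropped++;	/* ring full: frame is lost */
		}
	}
	return dropped;
}
```

In this model a stall of 300 steps repeated 10 times drops 440 frames
with 256 slots and none with 512 slots; drops only occur once the stall
outlasts the ring.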

My main question: the number of RX chain slots is currently set by a
global define, BGE_SSLOTS (if_bgereg.h), which is 256.  Technically, I
believe this card can do up to 512 slots, and the comment above the
define says these values are tunable, yet they are not exposed via sysctl.

Does anyone know why it's not 512 by default?  Is there any harm in
setting it to 512 instead of 256?  Why not make it tunable (512 as
max)?
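
To put the 256 versus 512 question in numbers, the following sketch
estimates how long a ring of a given size can absorb traffic at line rate
when no buffers are being replenished.  The function names are
hypothetical and the traffic profile (minimum-size 64-byte frames) is an
assumption; the post does not say what frame sizes the SmartBit sends.

```c
#include <stdint.h>

/* Per-frame wire overhead: 8 bytes preamble/SFD + 12 bytes inter-frame gap. */
#define ETH_OVERHEAD_BYTES	20

/* Frames per second at link_bps for frames of frame_bytes (FCS included). */
static uint64_t
frames_per_second(uint64_t link_bps, uint32_t frame_bytes)
{
	return link_bps / (8ULL * (frame_bytes + ETH_OVERHEAD_BYTES));
}

/* Microseconds until `slots` buffers are used up at `fps` frames/s. */
static uint64_t
ring_absorb_usec(uint32_t slots, uint64_t fps)
{
	return (uint64_t)slots * 1000000ULL / fps;
}
```

With 64-byte frames at 100 Mbit/s this gives 148809 frames/s, so 256 slots
last about 1720 microseconds and 512 slots about 3440.  At 1 Gbit/s the
same 512 slots last only about 344 microseconds.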

I've increased BGE_SSLOTS to 512 so there are more RX chain slots
available (as I currently understand it; I don't have the specs) and
raised kern.polling.each_burst to 150 (max) in an effort to keep the
bge driver in bge_rxeof() longer, and to experiment a bit!  This is my
first exposure to this code, so be gentle! :D

Has anyone seen this problem before with bge?  Am I barking up the
wrong tree with my initial investigation?  Does anyone know if it's
even possible to achieve 100% packet capture with bge at its supported
speeds (10/100/1000)?

Thanks!

-aps

