dummynet dropping too many packets
rihad at mail.ru
Sun Oct 18 06:30:41 UTC 2009
Peter Jeremy wrote:
> On 2009-Oct-04 18:47:23 +0500, rihad <rihad at mail.ru> wrote:
>> Hi, we have around 500-600 mbit/s traffic flowing through a 7.1R
>> Dell PowerEdge w/ 2 GigE bce cards. There are currently around 4
>> thousand ISP users online limited by dummynet pipes of various
>> speeds. According to netstat -s output around 500-1000 packets are
>> being dropped every second (this accounts for wasting around 7-12
>> mbit/s worth of traffic according to systat -ifstat):
> This has been a most interesting thread. A couple of comments:
> Traffic shaping only works cleanly on TCP flows - UDP has no feedback
> mechanism and so will not automatically throttle to fit into the
> available bandwidth, potentially leading to high packet drops within
> dummynet. Is it possible that some of your customers are heavily
> using UDP? Have you tried allowing just UDP traffic to bypass the
> pipes to see if this has any effect on drop rate?
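Peter's UDP-bypass test could be sketched as a single rule inserted ahead of the pipe rules (the rule number 1050 is hypothetical; the interface names follow the setup described above):

```shell
# Hypothetical sketch, untested: let UDP bypass the dummynet pipes so
# only TCP is shaped. Rule 1050 is an arbitrary number chosen to land
# before the pipe rules; requires root on the FreeBSD router.
ipfw add 1050 allow udp from any to any out recv bce0 xmit bce1
```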
We only process inbound traffic, and in any case this can't be related:
net.inet.ip.dummynet.io_pkt_drop normally doesn't keep pace with netstat
-s's "output packets dropped" counter (e.g. right now the former is only
1048, while the latter is as high as 1272587).
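For reference, the two counters being compared here can be read side by side on the router with FreeBSD base-system commands (a sketch; the grep pattern matches netstat -s's wording):

```shell
# Read dummynet's own drop counter and the IP-layer output-drop counter
# side by side (FreeBSD base system; sketch, to be run on the router).
sysctl net.inet.ip.dummynet.io_pkt_drop
netstat -s -p ip | grep 'output packets dropped'
```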
> The pipe lists you posted showed that virtually all the packet drops
> are associated with one or two IP addresses. If this is really true,
Not really. There were only a few hundred of the several thousand online
users in the list. Besides, those drops are within sane limits (as
measured by the io_pkt_drop sysctl); it's netstat -s's "output packets
dropped" count that matters.
> Also, if you monitor the pipe lists following a
> cold start, do those addresses appear early and just not show any
> packet loss until the total number of users builds up or do they not
> appear until later and immediately show packet loss?
io_pkt_drop may rise at certain well-defined moments, like when turning
dummynet on (by deleting the "allow ip from any to any" rule before the
pipes), and it may rise for certain heavy downloaders, but the value
stays small compared to netstat -s's "output packets dropped" counter.
> Looking at how 'output packets dropped due to no bufs, etc.' is
> counted (ipstat.ips_odropped), if you run 'netstat -id', do you see a
> large number of drops on bce1 (consistent with the "output packets
> dropped" counts) or not? This will help narrow down the codepath
> being followed by dropped packets.
Yup, it's comparable:
Name  Mtu Network   Address                 Ipkts Ierrs      Opkts Oerrs Coll    Drop
bce0 1500 <Link#1>  00:1d:09:2a:06:7f  5518562854     0   14327023     0    0       0
bce1 1500 <Link#2>  00:1d:09:xx:xx:xx      144918     0 5498628928     0    0 1135438
1272587 output packets dropped due to no bufs, etc.
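For scale: taking the figures above, bce1's interface drop count is a small fraction of its output packets; a quick back-of-the-envelope check:

```shell
# 1135438 drops vs 5498628928 output packets on bce1
# (figures from the netstat -id output above).
awk 'BEGIN { printf "%.4f%%\n", 1135438 / 5498628928 * 100 }'
# prints 0.0206%
```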
> Since the problem only appears to manifest when table(0) exceeds 2000
> entries, have you considered splitting (at least temporarily) that
> table (and possibly table(2)) into two (eg table(0) and table(4))?
> This would help rule out an (unlikely) problem with table sizes. Doing
> so would require the application to split the users across both
> tables (eg round-robin or based on one of the bits in the IP address)
> and then duplicating the relevant ipfw rules - eg:
> 01060 pipe tablearg ip from any to table(0) out recv bce0 xmit bce1
> 01061 pipe tablearg ip from any to table(4) out recv bce0 xmit bce1
> 01070 allow ip from any to table(0) out recv bce0 xmit bce1
> 01071 allow ip from any to table(4) out recv bce0 xmit bce1
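The split Peter suggests could be keyed on, for example, the low bit of the address's last octet; a minimal sh sketch (the pick_table helper is illustrative; the table numbers 0 and 4 match the quoted rules):

```shell
#!/bin/sh
# Illustrative sketch: choose ipfw table 0 or 4 for a user based on
# the low bit of the last octet of a dotted-quad IPv4 address.
pick_table() {
    last=${1##*.}                     # last octet, e.g. 10.0.0.7 -> 7
    if [ $((last % 2)) -eq 0 ]; then
        echo 0                        # even octet -> table 0
    else
        echo 4                        # odd octet  -> table 4
    fi
}

# The provisioning script would then run something like (root, FreeBSD):
#   ipfw table "$(pick_table "$ip")" add "$ip" "$pipe_number"
pick_table 10.0.0.7    # prints 4
```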
Around 3000 entries now (and around 480-500 mbit/s), as I've set the
queue length in bce to 1024 and rebuilt the kernel; I'm going to
increase that a bit more. I really think it's dummynet's burstiness, not
table size per se, that causes the drops, and the degree of burstiness
depends on the number of "online" users. A command as simple as "ipfw
table 0 flush" stops all drops instantly while still letting that
traffic pass through as-is (thank God). It's quite easy for me to
simulate the split in two with some shell scripting, without touching
any code, but I don't think it's the table sizes. I'll try it if
increasing the bce maxlen value doesn't help, though, so thank you.
> (And I agree that re-arranging rules to reduce the number of repeated
> tests should improve ipfw efficiency).
> The symptoms keep making me think "lock contention" - but I'm not
> sure how to measure that cheaply (AFAIK, LOCK_PROFILING is
> comparatively expensive).
> Finally, are you running i386 or amd64?
More information about the freebsd-net mailing list