kern/144311: massive ICMP storm on lo0 occurs when using pf(4) 'reply-to'

Yoshiaki Kasahara kasahara at nc.kyushu-u.ac.jp
Fri Feb 26 06:00:12 UTC 2010


>Number:         144311
>Category:       kern
>Synopsis:       massive ICMP storm on lo0 occurs when using pf(4) 'reply-to'
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Fri Feb 26 06:00:09 UTC 2010
>Closed-Date:
>Last-Modified:
>Originator:     Yoshiaki Kasahara
>Release:        FreeBSD 8.0-STABLE amd64
>Organization:
Kyushu University
>Environment:
System: FreeBSD elvenbow.cc.kyushu-u.ac.jp 8.0-STABLE FreeBSD 8.0-STABLE #1: Fri Feb 19 16:44:40 JST 2010 root at elf2.nc.kyushu-u.ac.jp:/usr/obj/usr/src/sys/GENERIC amd64


	
>Description:

A massive number of 'ICMP unreachable - fragmentation needed' packets
is observed on lo0 when pf(4) 'reply-to' is used for policy routing,
which severely degrades the overall performance of the system.

I have a web server with two NICs connected to different upstream
networks.  Each network has a spoof filter, so replies must go out
through the interface on which the connection arrived.

+-----+
|    em0(IP1.IP1.IP1.IP1) -- ISP1(GW1.GW1.GW1.GW1)
|     |
|    em1(IP2.IP2.IP2.IP2) -- ISP2(GW2.GW2.GW2.GW2)
+-----+

So I use pf(4) 'reply-to' rules, and with them I noticed the symptom.

A simplified pf.conf that shows the symptom is as follows (IP
addresses are masked):

-------------
if_isp1="em0"
isp1_router="GW1.GW1.GW1.GW1"
if_isp2="em1"
isp2_router="GW2.GW2.GW2.GW2"

pass in all
pass in reply-to ( $if_isp1 $isp1_router ) from any to $if_isp1
pass in reply-to ( $if_isp2 $isp2_router ) from any to $if_isp2
pass out all
-------------

Then access the web server on IP1 from a client (SIP.SIP.SIP.SIP) and
retrieve a large file such as a picture. While the transfer is running,
tcpdump -n -i lo0 shows a massive number of ICMP packets like this:

# tcpdump -n -i lo0
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on lo0, link-type NULL (BSD loopback), capture size 96 bytes
12:53:59.784441 IP 127.0.0.1 > IP1.IP1.IP1.IP1: ICMP SIP.SIP.SIP.SIP unreachable - need to frag (mtu 1500), length 48
12:53:59.784772 IP 127.0.0.1 > IP1.IP1.IP1.IP1: ICMP SIP.SIP.SIP.SIP unreachable - need to frag (mtu 1500), length 48
12:53:59.785001 IP 127.0.0.1 > IP1.IP1.IP1.IP1: ICMP SIP.SIP.SIP.SIP unreachable - need to frag (mtu 1500), length 48
12:53:59.785288 IP 127.0.0.1 > IP1.IP1.IP1.IP1: ICMP SIP.SIP.SIP.SIP unreachable - need to frag (mtu 1500), length 48
12:53:59.785482 IP 127.0.0.1 > IP1.IP1.IP1.IP1: ICMP SIP.SIP.SIP.SIP unreachable - need to frag (mtu 1500), length 48
..... (remaining lines omitted)

The SIP host can retrieve the file, but the throughput is very
poor.
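
To put a number on the storm, the tcpdump text output can be counted
per second.  The following is only a rough sketch, not part of the
original report: the function name count_need_frag is hypothetical, and
it assumes the "need to frag" wording and HH:MM:SS.usec timestamps shown
in the capture above.

```shell
# Count ICMP "need to frag" lines per one-second bucket in the text
# output of tcpdump -n.  Reads stdin, prints "HH:MM:SS count" per second.
# count_need_frag is a hypothetical helper name used for illustration.
count_need_frag() {
    awk '/unreachable - need to frag/ {
             sec = substr($1, 1, 8)   # keep only HH:MM:SS of the timestamp
             n[sec]++
         }
         END { for (s in n) print s, n[s] }' | sort
}

# Possible use on a live capture (needs root).  -c bounds the capture so
# that awk reaches its END block and prints the totals:
#   tcpdump -n -i lo0 -c 10000 icmp | count_need_frag
```

A count in the thousands per second on lo0 while only a single HTTP
transfer is running would match the symptom described here.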

netstat(1) also shows abnormally high packet counts (irrelevant
lines removed).

% netstat -ni
Name    Mtu Network       Address              Ipkts Ierrs Idrop    Opkts Oerrs  Coll
em0    1500 <Link#1>      00:1c:c0:fa:c4:6a    79142     0     0    80093     0     0
em0    1500 IP1.IP1.IP1.9 IP1.IP1.IP1.IP1   2090652887     -     -      472     -     -
em1    1500 <Link#2>      00:1b:21:52:52:60   141017     0     0    59392     0     0
em1    1500 IP2.IP2.IP2.0 IP2.IP2.IP2.IP2      83355     -     -    58112     -     -
lo0   16384 <Link#6>                        2090617974     0     0 2090617950     0     0
lo0   16384 127.0.0.0/8   127.0.0.1            35119     -     - 2090610857     -     -

Some hardware combinations do not seem to exhibit the symptom.
In fact, the problem suddenly started to occur right after I replaced
the server.  I then examined the old server and found that I could
reproduce the symptom there as well when I changed the default
route.  The old system runs FreeBSD 8.0-RELEASE-p1 amd64.

FreeBSD elf2.nc.kyushu-u.ac.jp 8.0-RELEASE-p1 FreeBSD 8.0-RELEASE-p1 #4: Wed Dec 16 15:49:14 JST 2009     root at elvenbow.cc.kyushu-u.ac.jp:/usr/obj/usr/src/sys/GENERIC  amd64

On the old system, msk(4) and vge(4) are used for the ISP connections.
With the default route going out through msk(4) everything is fine,
but changing it to go out through vge(4) triggers the problem.
Swapping the NICs between ISP1 and ISP2 makes no difference, so I guess
the problem is related more to the hardware (driver?) than to the
network configuration.

>How-To-Repeat:

Explained in the Description section.

>Fix:

Unknown.  I do not understand where these ICMP packets come from or
why they are generated.
>Release-Note:
>Audit-Trail:
>Unformatted:

