[FIX] dummynet breaks IP reassembly

Andre Oppermann andre at freebsd.org
Wed Feb 15 12:27:36 PST 2006


Ruslan Ermilov wrote:
> 
> Hi Andre,
> 
> If I'm not mistaken, *you* converted ipfw to use pfil(9).
> During the conversion, the following bug was introduced.
> 
> When forwarding fragmented packets through a dummynet pipe
> (ip_input -> ip_forward -> ip_output -> pipe -> ip_output)
> the last ip_output() in the chain that does the actual IP
> delivery sets ip_id of all fragments to different values,
> making it impossible to reassemble the packet at receive
> side.
> 
> Example of forwarding a fragmented IP datagram from em0
> to xl0:
> 
> # tcpdump -nvi em0 net 192.168.2
> tcpdump: listening on em0, link-type EN10MB (Ethernet), capture size 96 bytes
> 00:04:35.170994 IP (tos 0x0, ttl  64, id 2186, offset 0, flags [+], proto: ICMP (1), length: 1500) 192.168.1.3 > 192.168.2.1: ICMP echo request, id 2819, seq 0, length 1480
> 00:04:35.171242 IP (tos 0x0, ttl  64, id 2186, offset 1480, flags [none], proto: ICMP (1), length: 548) 192.168.1.3 > 192.168.2.1: icmp
> 
> # tcpdump -nvi xl0 net 192.168.2
> tcpdump: listening on xl0, link-type EN10MB (Ethernet), capture size 96 bytes
> 00:04:35.171028 IP (tos 0x0, ttl  63, id 10560, offset 0, flags [+], proto: ICMP (1), length: 1500) 192.168.1.3 > 192.168.2.1: ICMP echo request, id 2819, seq 0, length 1480
> 00:04:35.171266 IP (tos 0x0, ttl  63, id 10561, offset 1480, flags [none], proto: ICMP (1), length: 548) 192.168.1.3 > 192.168.2.1: icmp
> 
> Note that IDs were rewritten from the original 2186 and
> worse, they are different.
> 
> In 4,x, the "flags" field (see the patch below) was used
> to keep the IP_FORWARDING bit passed to ip_output() by
> ip_forward().  This bit was kept in the dummynet packet
> tag (in the "flags" field) and later passed to the second
> ip_output() call, causing the latter to NOT touch the IP
> header again.  This feature was lost, resulting in a bug.
> 
> Since dummynet never calls ip_output() with unfilled
> header, it's safe to always call ip_output() from dummynet
> with the IP_FORWARDING bit set, to indicate it's forwarded
> from another ip_output() and so it shouldn't attempt to
> modify the header.

That is correct.

> Surprisingly, it "seemed" to work for packets exceeding
> MTU and originating from the dummynetting host, mainly
> because the packet wasn't fragmented while in dummynet.
> Yet, the ip_id field was always incremented in my tests
> (pipe with no bandwidth limitation), comparing it before
> and after the dummynet processing.  Now, the ip_id is
> always preserved.

Packets that originated on the dummynetting host don't get
fragmented until they actually hit the if_output() in ip_output().
The entire over-sized packet stays intact while in dummynet
and only get fragmented later.

> # tcpdump -nvi em0 net 192.168.2
> tcpdump: listening on em0, link-type EN10MB (Ethernet), capture size 96 bytes
> 00:12:04.654669 IP (tos 0x0, ttl  64, id 2192, offset 0, flags [+], proto: ICMP (1), length: 1500) 192.168.1.3 > 192.168.2.1: ICMP echo request, id 7939, seq 0, length 1480
> 00:12:04.654917 IP (tos 0x0, ttl  64, id 2192, offset 1480, flags [none], proto: ICMP (1), length: 548) 192.168.1.3 > 192.168.2.1: icmp
> 
> # tcpdump -nvi xl0 net 192.168.2
> tcpdump: listening on xl0, link-type EN10MB (Ethernet), capture size 96 bytes
> 00:12:04.654703 IP (tos 0x0, ttl  63, id 2192, offset 0, flags [+], proto: ICMP (1), length: 1500) 192.168.1.3 > 192.168.2.1: ICMP echo request, id 7939, seq 0, length 1480
> 00:12:04.654939 IP (tos 0x0, ttl  63, id 2192, offset 1480, flags [none], proto: ICMP (1), length: 548) 192.168.1.3 > 192.168.2.1: icmp
> 
> (Note the ip_id is preserved when forwarding.)
> 
> I'm not sure about the IPv6 portion but it's consistent
> with the normal forwarding path so I believe it's correct.
> Comments?

Looks correct.  Dunno about the effect on IPv6.  IIRC IPv6 doesn't
support fragmenting packets and always has to do PMTU discovery.
Thanks for tracking down and fixing this bug.

-- 
Andre


> %%%
> Index: ip_dummynet.c
> ===================================================================
> RCS file: /home/ncvs/src/sys/netinet/ip_dummynet.c,v
> retrieving revision 1.98
> diff -u -p -r1.98 ip_dummynet.c
> --- ip_dummynet.c       3 Feb 2006 11:38:19 -0000       1.98
> +++ ip_dummynet.c       10 Feb 2006 21:20:59 -0000
> @@ -769,7 +769,7 @@ dummynet_send(struct mbuf *m)
>                 pkt = dn_tag_get(m);
>                 switch (pkt->dn_dir) {
>                 case DN_TO_IP_OUT:
> -                       ip_output(m, NULL, NULL, pkt->flags, NULL, NULL);
> +                       ip_output(m, NULL, NULL, IP_FORWARDING, NULL, NULL);
>                         break ;
>                   case DN_TO_IP_IN :
>                         ip = mtod(m, struct ip *);
> @@ -783,7 +783,7 @@ dummynet_send(struct mbuf *m)
>                         break;
> 
>                 case DN_TO_IP6_OUT:
> -                       ip6_output(m, NULL, NULL, pkt->flags, NULL, NULL, NULL);
> +                       ip6_output(m, NULL, NULL, IPV6_FORWARDING, NULL, NULL, NULL);
>                         break;
>  #endif
>                 case DN_TO_IFB_FWD:
> @@ -1129,7 +1129,6 @@ locate_pipe(int pipe_nr)
>   * ifp         the 'ifp' parameter from the caller.
>   *             NULL in ip_input, destination interface in ip_output,
>   * rule                matching rule, in case of multiple passes
> - * flags       flags from the caller, only used in ip_output
>   *
>   */
>  static int
> @@ -1213,8 +1212,6 @@ dummynet_io(struct mbuf *m, int dir, str
>      pkt->dn_dir = dir ;
> 
>      pkt->ifp = fwa->oif;
> -    if (dir == DN_TO_IP_OUT || dir == DN_TO_IP6_OUT)
> -       pkt->flags = fwa->flags;
> 
>      if (q->head == NULL)
>         q->head = m;
> Index: ip_dummynet.h
> ===================================================================
> RCS file: /home/ncvs/src/sys/netinet/ip_dummynet.h,v
> retrieving revision 1.38
> diff -u -p -r1.38 ip_dummynet.h
> --- ip_dummynet.h       29 Nov 2005 00:11:01 -0000      1.38
> +++ ip_dummynet.h       10 Feb 2006 21:13:45 -0000
> @@ -130,7 +130,6 @@ struct dn_pkt_tag {
> 
>      dn_key output_time;                /* when the pkt is due for delivery     */
>      struct ifnet *ifp;         /* interface, for ip_output             */
> -    int flags ;                        /* flags, for ip_output (IPv6 ?)        */
>      struct _ip6dn_args ip6opt; /* XXX ipv6 options                     */
>  };
>  #endif /* _KERNEL */
> Index: ip_fw.h
> ===================================================================
> RCS file: /home/ncvs/src/sys/netinet/ip_fw.h,v
> retrieving revision 1.103
> diff -u -p -r1.103 ip_fw.h
> --- ip_fw.h     13 Dec 2005 12:16:02 -0000      1.103
> +++ ip_fw.h     10 Feb 2006 21:15:41 -0000
> @@ -510,8 +510,6 @@ struct ip_fw_args {
>         struct ip_fw    *rule;          /* matching rule                */
>         struct ether_header *eh;        /* for bridged packets          */
> 
> -       int flags;                      /* for dummynet                 */
> -
>         struct ipfw_flow_id f_id;       /* grabbed from IP header       */
>         u_int32_t       cookie;         /* a cookie depending on rule action */
>         struct inpcb    *inp;
> %%%
> 
> Cheers,
> --
> Ruslan Ermilov
> ru at FreeBSD.org
> FreeBSD committer
> 
>   --------------------------------------------------------------------------------
>    Part 1.2Type: application/pgp-signature


More information about the freebsd-net mailing list