routing bug?

Andre Oppermann andre at freebsd.org
Fri May 7 14:12:57 PDT 2004


Mike Tancsa wrote:
> A follow up to the post below on stable.  We tried the same program on 
> OpenBSD, and it does the same thing as RELENG_4.  However, HEAD as of 
> yesterday and 5.2.1 do not show the same behaviour as RELENG_4.  Does 
> this ring a bell with anyone ?  Ideally, we would like to see the same 
> behaviour on RELENG_4 as it causes major grief for us with racoon when 
> dynamic interfaces come and go.

In -current protocol cloning is gone and pointers to an rtentry are no
longer stored in the inpcb.  This causes a route lookup to be done for
every packet that goes out.

In RELENG_4 the same happens only if you use a UDP socket that is not
bound to a specific address.  It does a route lookup every time a UDP
packet is sent, to determine the source address, and thus picks up any
changes that happen in the routing table.  However, if you bind the UDP
socket to a specific address, a pointer to the rtentry valid at the
time of the bind is stored.  It is only revisited when the rtentry it
points to is deleted.  Then the rtentry pointer is set to NULL, and with
the next packet sent the route is looked up afresh and stored again.

The problem you have is that it will never switch from an existing,
working route to a more specific one, because it never tries to find
one.  The only fix with the code in RELENG_4 is to delete the default
route, send a packet and reinsert the default route.  Reexamining all
inpcb rtentry pointers when a new route is installed is a very expensive
operation because it requires many table walks through all inpcbs.

In -current this rtentry caching has been removed because of the
complexity it added to the network stack and to the fine-grained
locking needed for efficient SMP.  While a route lookup for every packet
sent certainly has some cost, this is nothing extraordinary: it happens
every time the machine forwards a packet as a router, and it is rather
efficient.  That is not to say it is perfect, but it sucks a lot less
than before.  Work is underway to allow for much better optimized,
pluggable routing tables in the network stack, especially for IPv4.
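
To make the difference concrete, here is a minimal user-space sketch in
C of the two behaviours described above.  It is a toy model and not
kernel code: every toy_* name is hypothetical and invented for this
illustration, the "routing table" is a two-entry array searched with a
linear longest-prefix match, and the addresses and interface names are
borrowed from the report quoted further down.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical toy types; none of these names exist in the kernel. */
struct toy_route {
    uint32_t dst;        /* destination prefix, host byte order */
    uint32_t mask;       /* contiguous netmask; 0 means default route */
    const char *ifname;  /* outgoing interface */
    int in_use;          /* 0 while the route is deleted */
};

struct toy_pcb {
    struct toy_route *cached;  /* stands in for the rtentry pointer in the inpcb */
};

/* Longest-prefix match; returns NULL when nothing matches. */
static struct toy_route *
toy_lookup(struct toy_route *tbl, size_t n, uint32_t addr)
{
    struct toy_route *best = NULL;

    for (size_t i = 0; i < n; i++) {
        if (!tbl[i].in_use || (addr & tbl[i].mask) != tbl[i].dst)
            continue;
        if (best == NULL || tbl[i].mask > best->mask)
            best = &tbl[i];
    }
    return best;
}

/* Bound socket as described for RELENG_4: look up only when nothing is cached. */
static const char *
toy_send_cached(struct toy_pcb *pcb, struct toy_route *tbl, size_t n, uint32_t dst)
{
    if (pcb->cached == NULL)
        pcb->cached = toy_lookup(tbl, n, dst);
    return pcb->cached ? pcb->cached->ifname : "none";
}

/* Behaviour as described for -current: a fresh lookup for every packet. */
static const char *
toy_send_uncached(struct toy_route *tbl, size_t n, uint32_t dst)
{
    struct toy_route *r = toy_lookup(tbl, n, dst);

    return r ? r->ifname : "none";
}

/* Deleting a route clears a cached pointer that refers to it. */
static void
toy_delete(struct toy_route *r, struct toy_pcb *pcb)
{
    r->in_use = 0;
    if (pcb->cached == r)
        pcb->cached = NULL;
}

/* Replays the delete/re-add sequence; returns 1 if the stale route shows up. */
static int
toy_demo(void)
{
    const uint32_t dst = 0xAC1E0101;            /* 172.30.1.1 */
    struct toy_route tbl[2] = {
        { 0x00000000, 0x00000000, "fxp0", 1 },  /* default route */
        { 0xAC1E0101, 0xFFFFFFFF, "rl0", 1 },   /* host route */
    };
    struct toy_pcb pcb = { NULL };
    const char *stale, *fresh;

    printf("initial:      %s\n", toy_send_cached(&pcb, tbl, 2, dst));
    toy_delete(&tbl[1], &pcb);                  /* route delete */
    printf("after delete: %s\n", toy_send_cached(&pcb, tbl, 2, dst));
    tbl[1].in_use = 1;                          /* route add: cache is not revisited */
    stale = toy_send_cached(&pcb, tbl, 2, dst);
    fresh = toy_send_uncached(tbl, 2, dst);
    printf("after re-add: %s (cached pointer)\n", stale);
    printf("after re-add: %s (lookup per packet)\n", fresh);
    assert(pcb.cached == &tbl[0]);              /* still pinned to the default route */
    return stale == tbl[0].ifname && fresh == tbl[1].ifname;
}
```

Calling toy_demo() from a main() shows the cached socket going rl0,
fxp0, fxp0 while the per-packet lookup returns to rl0 after the re-add.
In this model, deleting tbl[0] (the default route), sending once and
re-enabling it corresponds to the workaround mentioned above.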

I hope this explains the effect you are seeing.  A backport or MFC of
the protocol cloning removal is not possible, as it is a rather
extensive change to the network stack which is outside of the RELENG_4
scope.
There is one easy hack you can do for UDP sockets: simply RTFREE() the
rtentry pointed to by the inpcb early on in udp_output() so that a new
one is reacquired later on.  If the pointer is NULL the code will simply
redo the routing table lookup and store a new pointer to the rtentry.
With this you'll get the correct route every time you send a UDP packet.
A slightly more extensive hack would avoid storing the rtentry pointer
in the inpcb at all.  I once did this for a RELENG_4 machine without any
ill effects (if done correctly).  The same applies to OpenBSD and
NetBSD.
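
The idea behind the first hack can be sketched in user space as follows.
This is not a kernel patch: all sk_* names are hypothetical,
sk_release() merely stands in for dropping the reference with RTFREE(),
and the drop_first flag is an invention of this sketch to compare the
unpatched and patched paths.  One assumption worth checking against the
real source: as far as I know RTFREE() only drops the reference, so the
pointer in the inpcb would have to be set to NULL explicitly, which is
what sk_release() does here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical toy types; none of these names exist in the kernel. */
struct sk_route {
    uint32_t dst, mask;   /* prefix and contiguous netmask, host byte order */
    const char *ifname;   /* outgoing interface */
    int in_use;           /* 0 while the route is absent from the table */
    int refs;             /* references held by sockets */
};

struct sk_pcb {
    struct sk_route *cached;  /* stands in for the rtentry pointer in the inpcb */
};

/* Longest-prefix match; returns NULL when nothing matches. */
static struct sk_route *
sk_lookup(struct sk_route *tbl, size_t n, uint32_t addr)
{
    struct sk_route *best = NULL;

    for (size_t i = 0; i < n; i++) {
        if (!tbl[i].in_use || (addr & tbl[i].mask) != tbl[i].dst)
            continue;
        if (best == NULL || tbl[i].mask > best->mask)
            best = &tbl[i];
    }
    return best;
}

/* Stand-in for RTFREE() plus clearing the pointer. */
static void
sk_release(struct sk_pcb *pcb)
{
    if (pcb->cached != NULL) {
        pcb->cached->refs--;
        pcb->cached = NULL;
    }
}

/* Toy output path; drop_first != 0 models the hack of forgetting the route early. */
static const char *
sk_udp_output(struct sk_pcb *pcb, struct sk_route *tbl, size_t n,
    uint32_t dst, int drop_first)
{
    if (drop_first)
        sk_release(pcb);
    if (pcb->cached == NULL) {
        pcb->cached = sk_lookup(tbl, n, dst);
        if (pcb->cached != NULL)
            pcb->cached->refs++;
    }
    return pcb->cached ? pcb->cached->ifname : "none";
}

/* Returns 1 if the patched path picks up a route added after the first send. */
static int
sk_demo(void)
{
    const uint32_t dst = 0xAC1E0101;               /* 172.30.1.1 */
    struct sk_route tbl[2] = {
        { 0x00000000, 0x00000000, "fxp0", 1, 0 },  /* default route */
        { 0xAC1E0101, 0xFFFFFFFF, "rl0", 0, 0 },   /* host route, not yet added */
    };
    struct sk_pcb pcb = { NULL };
    const char *before, *stale, *fixed;

    before = sk_udp_output(&pcb, tbl, 2, dst, 0);  /* caches the default route */
    tbl[1].in_use = 1;                             /* route add 172.30.1.1 */
    stale = sk_udp_output(&pcb, tbl, 2, dst, 0);   /* unpatched: still default */
    fixed = sk_udp_output(&pcb, tbl, 2, dst, 1);   /* patched: fresh lookup */
    printf("%s -> %s -> %s\n", before, stale, fixed);
    assert(tbl[0].refs == 0 && tbl[1].refs == 1);  /* old reference was dropped */
    return stale == tbl[0].ifname && fixed == tbl[1].ifname;
}
```

The second, more extensive hack corresponds to never assigning
pcb->cached at all and returning the result of sk_lookup() directly.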

-- 
Andre


>         ---Mike
> 
> At 09:59 AM 07/05/2004, Gabor wrote:
> 
>> I am experiencing some weird routing phenomena.
>> When I open a UDP socket and send datagrams to an address (172.30.1.1)
>> and then remove that route (route delete 172.30.1.1), my packets
>> switch from going out the route-specific interface (rl0) to going out
>> the default interface (fxp0).  This is as expected.  Then I add back
>> the route (route add 172.30.1.1 10.0.2.2) and the packets swing back
>> to the route-specific interface (rl0).  However, if I bind my socket to
>> a source address (172.16.24.33), when I remove the route and then add
>> it back, the packets continue to go out the default interface (fxp0)
>> instead of going out the route-specific interface (rl0).
>>
>> This is on 4.9 STABLE.  I was unable to reproduce this on 5.2.1.  What
>> changes have there been that haven't been MFC'ed?
>>
>> =0= udp-test # netstat -nr
>> Routing tables
>>
>> Internet:
>> Destination        Gateway            Flags    Refs      Use  Netif Expire
>> default            192.168.43.1       UGSc        2     2440   fxp0
>> 10.0.2/24          link#1             UC          2        0    rl0
>> 10.0.2.1           00:50:fc:32:52:a7  UHLW        0        2    lo0
>> 10.0.2.2           00:0e:0c:05:09:19  UHLW        1        3    rl0     69
>> 172.0.0.1          172.0.0.1          UH          0        0    lo0
>> 172.30.1.1         10.0.2.2           UGHS        0        0    rl0
>> 192.168.7          link#1             UC          0        0    rl0
>> 192.168.43         link#2             UC          4        0   fxp0
>> 192.168.43.1       00:50:bf:33:63:70  UHLW        4    56111   fxp0   1079
>> 192.168.43.31      link#2             UHLW        1      249   fxp0
>> 192.168.43.157     00:a0:c9:4b:a5:f4  UHLW        6  2168880   fxp0    771
>> 192.168.43.242     link#2             UHLW        1   718878   fxp0
>>
>> =0= udp-test # ifconfig
>> rl0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
>>         inet 192.168.7.2 netmask 0xffffff00 broadcast 192.168.7.255
>>         inet 10.0.2.1 netmask 0xffffff00 broadcast 10.0.2.255
>>         ether 00:50:fc:32:52:a7
>>         media: Ethernet 10baseT/UTP
>>         status: active
>> fxp0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
>>         inet 192.168.43.26 netmask 0xffffff00 broadcast 192.168.43.255
>>         ether 00:01:80:3d:b4:4f
>>         media: Ethernet autoselect (100baseTX <full-duplex>)
>>         status: active
>> ppp0: flags=8010<POINTOPOINT,MULTICAST> mtu 1500
>> faith0: flags=8002<BROADCAST,MULTICAST> mtu 1500
>> ds0: flags=8008<LOOPBACK,MULTICAST> mtu 65532
>> lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> mtu 16384
>>         inet 172.16.24.33 netmask 0xffff0000
>>         inet 172.0.0.1 netmask 0xffff0000
>> tun0: flags=8010<POINTOPOINT,MULTICAST> mtu 1500
>>
>> =0= udp-test # cat udp-test.c
>> #include <sys/types.h>
>> #include <sys/time.h>
>> #include <sys/socket.h>
>> #include <netinet/in.h>
>> #include <netdb.h>
>> #include <unistd.h>
>> #include <stdio.h>
>> #include <stdlib.h>
>> #include <string.h>
>> #include <ctype.h>
>>
>> int send_pkt(unsigned char *src, unsigned char *dest, unsigned short port);
>>
>> int
>> main(int argc, char **argv)
>> {
>>     unsigned a, b, c, d, a2, b2, c2, d2;
>>     unsigned port;
>>     unsigned char src[4], dest[4];
>>
>>
>>     if (argc != 3) {
>>         fprintf(stderr,
>>                 "Usage: %s <source> <dest>:<port>\n",
>>                 argv[0]);
>>         return 1;
>>     }
>>
>>     if (sscanf(argv[1], "%u.%u.%u.%u", &a, &b, &c, &d) == 4
>>             && a < 256 && b < 256 && c < 256 && d < 256
>>             && sscanf(argv[2], "%u.%u.%u.%u:%u",
>>                       &a2, &b2, &c2, &d2, &port) == 5
>>             && a2 < 256 && b2 < 256 && c2 < 256 && d2 < 256
>>             && port < 65536) {
>>         /* OK */
>>         src[0]  = a;
>>         src[1]  = b;
>>         src[2]  = c;
>>         src[3]  = d;
>>         dest[0] = a2;
>>         dest[1] = b2;
>>         dest[2] = c2;
>>         dest[3] = d2;
>>         send_pkt(src, dest, port);
>>     }
>>     else {
>>         fprintf(stderr,
>>                 "Usage: %s <source> <dest>:<port>\n",
>>                 argv[0]);
>>         return 1;
>>     }
>>
>>     return 0;
>> }
>>
>> int
>> send_pkt(unsigned char *src, unsigned char *dest, unsigned short port)
>> {
>>     int s, len, cnt, rc, on;
>>     struct protoent *proto;
>>     struct sockaddr_in to, from;
>>     char data[1024];
>>
>>     if (!(proto = getprotobyname("udp"))) {
>>         perror("getprotobyname");
>>         return -1;
>>     }
>>
>>     if ((s = socket(PF_INET, SOCK_DGRAM, proto->p_proto)) < 0) {
>>         perror("socket");
>>         return -1;
>>     }
>>     on = 1;
>>
>>     memset(&from, 0, sizeof from);
>>     from.sin_family = AF_INET;
>>     from.sin_port = htons(0);
>>     memcpy(&from.sin_addr.s_addr, src, 4);
>>     fprintf(stderr,
>>             "bind:%d\n",
>>             bind(s, (struct sockaddr*)&from, sizeof from));
>>
>>     memset(&to, 0, sizeof to);
>>     to.sin_family = AF_INET;
>>     to.sin_port = htons(port);
>>     memcpy(&to.sin_addr.s_addr, dest, 4);
>>     len = 58;
>>     cnt = 0;
>>     while (1) {
>>         memset(data, cnt, len);
>>         rc = sendto(s, data, len, 0, (struct sockaddr*)&to, sizeof to);
>>         if (rc < 0)
>>             perror("sendto");
>>         fprintf(stderr, "%d %d\n", rc, cnt);
>>         sleep(5);
>>         ++cnt;
>>     }
>>     close(s);
>>
>>     return 0;
>> }
>>
>> _______________________________________________
>> freebsd-stable at freebsd.org mailing list
>> http://lists.freebsd.org/mailman/listinfo/freebsd-stable
>> To unsubscribe, send any mail to "freebsd-stable-unsubscribe at freebsd.org"
> 
> 
> _______________________________________________
> freebsd-current at freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-current
> To unsubscribe, send any mail to "freebsd-current-unsubscribe at freebsd.org"
> 
> 


