When `drill` works but `nc` doesn't

Niklaas Baudet von Gersdorff stdin at niklaas.eu
Sat Sep 17 13:42:03 UTC 2016


I encountered a strange behaviour when trying to establish
a connection to my jails. When DNS is involved, I get a 2 seconds
delay. At the same time, there is no DNS resolving issue when
testing with `drill`.

As it turn out the problem is related to a specific IP address.
However, I don't know why exactly and I don't know to further
debug this. Maybe someone has an idea. In the following you can
read what I have done.

In one of my jails I have unbound running:

  $ sudo jexec unbound1 sockstat -46l | grep ':53'

  unbound  unbound    2474  3  udp4   *:53                  *:*
  unbound  unbound    2474  4  tcp4   *:53                  *:*
  unbound  unbound    2474  5  udp6   *:53                  *:*
  unbound  unbound    2474  6  tcp6   *:53                  *:*

This jail has some IP addresses assigned:

  $ sudo jexec unbound1 sockstat -46l | grep ':53'

  lo1: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
    options=600003<RXCSUM,TXCSUM,RXCSUM_IPV6,TXCSUM_IPV6>
    inet 10.3.3.1 netmask 0xffff0000 
    inet 10.77.3.1 netmask 0xffff0000 
    inet6 fd16:dcc0:f4cc:3::3:1 prefixlen 64 
    inet6 fd16:dcc0:f4cc:77::3:1 prefixlen 64 
    nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>

On my jails' host and in the other jails I have the following
/etc/resolv.conf:

  options timeout:1 attempts:1
  nameserver 10.77.3.1
  nameserver fd16:dcc0:f4cc:77::3:1

I realised that connecting from one jail to another one always
has a 2 seconds delay when DNS is involved

  $ sudo jexec www1 time nc -z mysql2.box-hlm-03.klaas 3306

  Connection to mysql2.box-hlm-03.klaas 3306 port [tcp/mysql] succeeded!
          2.02 real         0.00 user         0.00 sys

while the same attempt without DNS is instant:

  $ sudo jexec www1 time nc -z 10.3.5.3 3306

  Connection to 10.3.5.3 3306 port [tcp/mysql] succeeded!
          0.00 real         0.00 user         0.00 sys

So, I thought that this must be an DNS issue. And I checked
whether my unbound jail works fine:

  $ sudo jexec www1 drill mysql2.box-hlm-03.klaas @10.77.3.1

  ;; ->>HEADER<<- opcode: QUERY, rcode: NOERROR, id: 21945
  ;; flags: qr rd ra ; QUERY: 1, ANSWER: 1, AUTHORITY: 2, ADDITIONAL: 4 
  ;; QUESTION SECTION:
  ;; mysql2.box-hlm-03.klaas.	IN	A

  ;; ANSWER SECTION:
  mysql2.box-hlm-03.klaas.	3339	IN	A	10.3.5.3

  ;; AUTHORITY SECTION:
  klaas.	2583	IN	NS	nsd1.box-hlm-01.klaas.
  klaas.	2583	IN	NS	nsd1.box-hlm-03.klaas.

  ;; ADDITIONAL SECTION:
  nsd1.box-hlm-01.klaas.	2583	IN	A	10.1.3.2
  nsd1.box-hlm-03.klaas.	2583	IN	A	10.3.3.2
  nsd1.box-hlm-01.klaas.	2583	IN	AAAA	fd16:dcc0:f4cc:1::3:2
  nsd1.box-hlm-03.klaas.	2583	IN	AAAA	fd16:dcc0:f4cc:3::3:2

  ;; Query time: 0 msec
  ;; SERVER: 10.77.3.1
  ;; WHEN: Sat Sep 17 15:09:33 2016
  ;; MSG SIZE  rcvd: 194

As you can see, there is no issue with unbound and I get an
answer using the same IP (10.77.3.1) instantly. When I kept on
wondering, I thought about investigating this a bit more. I ran
the following two commands

	$ sudo jexec www1 truss -D -o /tmp/truss-hostname nc -z mysql2.box-hlm-03.klaas 3306

	$ sudo jexec www1 truss -D -o /tmp/truss-IP nc -z 10.3.5.3 3306

and compared their output. The following extract is from
/tmp/truss-hostname. This part is not present in /tmp/truss-IP
and discloses the issue:

  0.000045537 open("/etc/hosts",O_CLOEXEC,0666)	 = 3 (0x3)
  0.000031010 fstat(3,{ mode=-rw-r--r-- ,inode=3455134,size=1090,blksize=32768 }) = 0 (0x0)
  0.000020953 read(3,"# $FreeBSD: releng/10.2/etc/host"...,32768) = 1090 (0x442)
  0.000013688 read(3,0x801433000,32768)		 = 0 (0x0)
  0.000015085 close(3)				 = 0 (0x0)
  0.000013409 madvise(0x801433000,0x8000,0x5)	 = 0 (0x0)
  0.000020952 gettimeofday({ 1474106763.341818 },0x0) = 0 (0x0)
  0.000012851 getpid()				 = 33788 (0x83fc)
  0.000016483 gettimeofday({ 1474106763.341916 },0x0) = 0 (0x0)
  0.000012292 issetugid()				 = 0 (0x0)
  0.000026540 open("/etc/resolv.conf",O_CLOEXEC,0666) = 3 (0x3)
  0.000013689 fstat(3,{ mode=-rw-r--r-- ,inode=3455168,size=84,blksize=32768 }) = 0 (0x0)
  0.000014806 clock_gettime(12,{ 1443270.700167332 }) = 0 (0x0)
  0.000018438 fstat(3,{ mode=-rw-r--r-- ,inode=3455168,size=84,blksize=32768 }) = 0 (0x0)
  0.000018717 read(3,"options timeout:1 attempts:1\nna"...,32768) = 84 (0x54)
  0.000018718 read(3,0x801433000,32768)		 = 0 (0x0)
  0.000015644 close(3)				 = 0 (0x0)
  0.000014248 __sysctl(0x7fffffff9578,0x2,0x7fffffff9840,0x7fffffff9580,0x0,0x0) = 0 (0x0)
  0.000058387 issetugid()				 = 0 (0x0)
  0.000030731 gettimeofday({ 1474106763.342896 },0x0) = 0 (0x0)
  0.000015645 kqueue()				 = 3 (0x3)
  0.000016762 socket(PF_INET,SOCK_DGRAM|SOCK_CLOEXEC,0) = 4 (0x4)
  0.000023187 connect(4,{ AF_INET 10.77.3.1:53 },16) = 0 (0x0)
  0.000032965 sendto(4,"g\M^F\^A\0\0\^A\0\0\0\0\0\0\^Dww"...,39,0x0,NULL,0x0) = 39 (0x27)
  0.000028775 clock_gettime(0,{ 1474106763.343553951 }) = 0 (0x0)
X 1.003734264 kevent(3,{ 4,EVFILT_READ,EV_ADD|EV_ONESHOT,0x0,0x0,0x0 },1,{ },1,{ 1.000000000 }) = 0 (0x0)
  0.000032685 socket(PF_INET6,SOCK_DGRAM|SOCK_CLOEXEC,0) = 5 (0x5)
  0.000059505 connect(5,{ AF_INET6 [fd16:dcc0:f4cc:77::3:1]:53 },28) = 0 (0x0)
  0.000097778 sendto(5,"g\M^F\^A\0\0\^A\0\0\0\0\0\0\^Dww"...,39,0x0,NULL,0x0) = 39 (0x27)
  0.000025143 clock_gettime(0,{ 1474106764.348156482 }) = 0 (0x0)
  0.000191924 kevent(3,{ 5,EVFILT_READ,EV_ADD|EV_ONESHOT,0x0,0x0,0x0 },1,{ 5,EVFILT_READ,EV_ONESHOT,0x0,0xc0,0x0 },1,{ 1.000000000 }) = 1 (0x1)
  0.000029333 recvfrom(5,"g\M^F\M^A\M^@\0\^A\0\^A\0\^B\0"...,65536,0x0,{ AF_INET6 [fd16:dcc0:f4cc:77::3:1]:53 },0x7fffffff90bc) = 192 (0xc0)
  0.000033245 close(4)				 = 0 (0x0)
  0.000028495 close(5)				 = 0 (0x0)
  0.000026819 close(3)				 = 0 (0x0)
  0.000031010 gettimeofday({ 1474106764.349346 },0x0) = 0 (0x0)
  0.000023746 kqueue()				 = 3 (0x3)
  0.000027098 socket(PF_INET,SOCK_DGRAM|SOCK_CLOEXEC,0) = 4 (0x4)
  0.000039949 connect(4,{ AF_INET 10.77.3.1:53 },16) = 0 (0x0)
  0.000042463 sendto(4,"\M^G\M^?\^A\0\0\^A\0\0\0\0\0\0"...,39,0x0,NULL,0x0) = 39 (0x27)
  0.000026540 clock_gettime(0,{ 1474106764.350025156 }) = 0 (0x0)
X 1.000184371 kevent(3,{ 4,EVFILT_READ,EV_ADD|EV_ONESHOT,0x0,0x0,0x0 },1,{ },1,{ 1.000000000 }) = 0 (0x0)
  0.000051683 socket(PF_INET6,SOCK_DGRAM|SOCK_CLOEXEC,0) = 5 (0x5)
  0.000061460 connect(5,{ AF_INET6 [fd16:dcc0:f4cc:77::3:1]:53 },28) = 0 (0x0)
  0.000091353 sendto(5,"\M^G\M^?\^A\0\0\^A\0\0\0\0\0\0"...,39,0x0,NULL,0x0) = 39 (0x27)
  0.000084089 clock_gettime(0,{ 1474106765.351039520 }) = 0 (0x0)
  0.000051962 kevent(3,{ 5,EVFILT_READ,EV_ADD|EV_ONESHOT,0x0,0x0,0x0 },1,{ 5,EVFILT_READ,EV_ONESHOT,0x0,0xcc,0x0 },1,{ 1.000000000 }) = 1 (0x1)
  0.000051962 recvfrom(5,"\M^G\M^?\M^A\M^@\0\^A\0\^A\0\^B"...,65536,0x0,{ AF_INET6 [fd16:dcc0:f4cc:77::3:1]:53 },0x7fffffff90bc) = 204 (0xcc)
  0.000053358 close(4)				 = 0 (0x0)
  0.000046933 close(5)				 = 0 (0x0)
  0.000047213 close(3)				 = 0 (0x0)
  0.000048330 madvise(0x801442000,0x1000,0x5)	 = 0 (0x0)
  0.000067047 madvise(0x80147c000,0x10000,0x5)	 = 0 (0x0)
  0.000044977 madvise(0x801433000,0x8000,0x5)	 = 0 (0x0)
  0.000048330 madvise(0x80144c000,0x10000,0x5)	 = 0 (0x0)
  0.000039391 madvise(0x80145c000,0x10000,0x5)	 = 0 (0x0)
  0.000039391 __sysctl(0x7fffffffa6d0,0x4,0x0,0x7fffffffa750,0x0,0x0) = 0 (0x0)
  0.000040508 __sysctl(0x7fffffffa6d0,0x4,0x80144c300,0x7fffffffa750,0x0,0x0) = 0 (0x0)
  0.000046934 socket(PF_INET6,SOCK_DGRAM|SOCK_CLOEXEC,17) = 3 (0x3)
  0.000055035 connect(3,{ AF_INET6 [fd16:dcc0:f4cc:1::4:1]:1 },28) = 0 (0x0)
  0.000044139 getsockname(3,{ AF_INET6 [fd16:dcc0:f4cc:3::2:1]:50722 },0x7fffffffa6cc) = 0 (0x0)
  0.000056432 ioctl(3,0xc1206949 { IORW 0x69('i'), 73, 288 },0xffffa5a8) ERR#6 'Device not configured'
  0.000044419 close(3)				 = 0 (0x0)
  0.000043302 socket(PF_INET,SOCK_DGRAM|SOCK_CLOEXEC,17) = 3 (0x3)
  0.000046934 connect(3,{ AF_INET 10.1.4.1:1 },16) = 0 (0x0)
  0.000041346 getsockname(3,{ AF_INET 10.3.2.1:33788 },0x7fffffffa6cc) = 0 (0x0)
  0.000042185 close(3)				 = 0 (0x0)
  0.000055874 socket(PF_INET,SOCK_STREAM,6)	 = 3 (0x3)
  0.000591137 connect(3,{ AF_INET 10.1.4.1:88 },16) = 0 (0x0)

I marked the most important two lines with "X". As you can see,
there are two delays with 1 second each, which explains the
2 seconds delay of the command above. (The rest of the output
I mostly don't understand.)

From that output I guessed that the delays are related to
10.77.3.1. As you can see, on the contrary to the attempt to
connect to the corresponding IPv6 at the end, connection to
the IPv4 seem to fail. I commented out the corresponding line
in /etc/resolv.conf of my jail(s):

  options timeout:1 attempts:1
  #nameserver 10.77.3.1
  nameserver fd16:dcc0:f4cc:77::3:1

Since then, I get an instant connection:

  $ sudo jexec www1 time nc -z mysql2.box-hlm-03.klaas 3306

  Connection to mysql2.box-hlm-03.klaas 3306 port [tcp/mysql] succeeded!
          0.00 real         0.00 user         0.00 sys

So, on the one hand it seems that there is an issue with
connecting to 10.77.3.1 and resolving the domain name in
question. On the other hand, as shown above, there is no issue
when `drill`ing the unbound jail directly. Why is that? This is
the first thing I don't understand.

Funny thing: As you can see at the beginning of my mail, the
unbound jail also has another IP: 10.3.3.1. If I use this IP
(instead of 10.77.3.1) in /etc/resolv.conf everything works as
expected. Why is that? This is the second thing that I don't
understand.

Luckily, the connection via IPv6 works. That's why a connection
could be established after 2 seconds eventually. Do you have any
idea why IPv6 works but IPv4 doesn't?

I would very much appreciate ideas for further debugging this. To
me it doesn't make sense that `drill` works but `nc` doesn't.

    Niklaas
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 819 bytes
Desc: not available
URL: <http://lists.freebsd.org/pipermail/freebsd-questions/attachments/20160917/e43baef9/attachment.sig>


More information about the freebsd-questions mailing list