intermittent network failures with drill and icinga2

Sat Aug 17 04:06:14 UTC 2019

FreeBSD 12.0-RELEASE-p9, icinga2 2.10.5_1, drill 1.7.0

Do drill and ping use different system calls to resolve hostnames to IP 
addresses?

Asking because roughly 5 to 10 times per day, icinga2 reports an error 
because this system can't resolve a hostname to an IP address.

However, the system is reachable by ssh during these error periods, and 
it _can_ resolve hostnames when using ping.

Here's an example where drill doesn't work and ping does:

[dnewman@hood ~]$ drill mail.networktest.com @puck.nether.net
Error: error sending query: Could not send or receive, because of network error

[dnewman@hood ~]$ ping puck.nether.net
PING puck.nether.net (204.42.254.5): 56 data bytes
64 bytes from 204.42.254.5: icmp_seq=0 ttl=51 time=76.332 ms

[dnewman@hood ~]$ drill mail.networktest.com @puck.nether.net
Error: error sending query: Could not send or receive, because of network error
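To make the question concrete, here is a minimal Python sketch of the two paths as I understand them (an assumption, not taken from the drill or ping sources): ping looks up its target through the C library resolver, which follows /etc/resolv.conf, while `drill name @server` builds its own DNS packet and sends it over UDP port 53 to the server given after `@`. The helper names `system_resolve`, `build_query` and `direct_query` are hypothetical.

```python
from __future__ import annotations

import random
import socket
import struct


def system_resolve(hostname: str) -> list[str]:
    """Resolve through the C library resolver (the ping-style path)."""
    infos = socket.getaddrinfo(hostname, None, socket.AF_INET, socket.SOCK_DGRAM)
    return sorted({info[4][0] for info in infos})


def build_query(name: str, qid: int = 0x1234, qtype: int = 1) -> bytes:
    """Build a minimal RFC 1035 query packet; qtype 1 asks for an A record."""
    # Header: ID, flags (only RD, "recursion desired", set), QDCOUNT=1,
    # ANCOUNT=NSCOUNT=ARCOUNT=0.
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)  # QTYPE, QCLASS=IN


def direct_query(name: str, server_ip: str, timeout: float = 5.0) -> bytes:
    """Send one UDP query straight to server_ip:53 (the drill @server path).

    Raises OSError (socket.timeout is a subclass) if the packet cannot be
    sent or no reply arrives within `timeout` seconds.
    """
    query = build_query(name, qid=random.randrange(1 << 16))
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(query, (server_ip, 53))
        reply, _ = sock.recvfrom(4096)
    if reply[:2] != query[:2]:
        raise OSError("reply ID does not match query ID")
    return reply
```

On the affected host, `direct_query("mail.networktest.com", "204.42.254.5")` would exercise the UDP exchange that drill presumably attempted, and `system_resolve("puck.nether.net")` the lookup that ping performed. If the first raises while the second succeeds, that would point at UDP traffic to that one server rather than at name resolution itself.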

The /etc/resolv.conf file points to two internal nameservers, both 
reachable:

[dnewman@hood ~]$ cat /etc/resolv.conf
search inf.networktest.com networktest.com
nameserver 172.31.53.12
nameserver 172.31.53.13
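For reference, the parts of this file that matter here can be pulled out with a short Python sketch. It assumes only the `search` and `nameserver` keywords are relevant; resolv.conf(5) also defines `domain`, `options` and others, which this hypothetical `parse_resolv_conf` helper ignores. These two nameservers are what the system resolver (and so ping) uses, whereas `drill name @server` sends its query to the named server instead.

```python
from __future__ import annotations


def parse_resolv_conf(text: str) -> tuple[list[str], list[str]]:
    """Return (search_domains, nameservers) from resolv.conf-style text.

    Only 'search' and 'nameserver' are handled; text after '#' or ';'
    is treated as a comment, and all other keywords are ignored.
    """
    search: list[str] = []
    nameservers: list[str] = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].split(";", 1)[0].strip()
        if not line:
            continue
        keyword, *args = line.split()
        if keyword == "search":
            search = args  # if 'search' appears more than once, the last wins
        elif keyword == "nameserver" and args:
            nameservers.append(args[0])
    return search, nameservers
```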

Also, icinga2 resolves hundreds of hostnames, but this problem occurs 
almost exclusively when checking puck.nether.net. I don't think 
there's anything wrong with puck.nether.net's DNS or reachability; even 
this system can ping it, and I can resolve it from any other host.

Other host checks, and networking on this system in general, work fine.
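Because the failures are intermittent and cluster on one target, it may help to log exact failure times on this host and line them up with icinga2's error timestamps. A minimal sketch, assuming each check is a Python callable that raises OSError on failure; `probe` is a hypothetical helper, not part of icinga2.

```python
from __future__ import annotations

import time
from datetime import datetime, timezone
from typing import Callable


def probe(
    checks: dict[str, Callable[[], object]],
    rounds: int,
    interval: float,
) -> list[tuple[str, str, str]]:
    """Run every check `rounds` times, `interval` seconds apart.

    Returns (UTC timestamp, check name, error) for each check that raised
    OSError, so failure times can be compared against icinga2's log.
    """
    failures: list[tuple[str, str, str]] = []
    for i in range(rounds):
        for name, check in checks.items():
            try:
                check()
            except OSError as exc:
                stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
                failures.append((stamp, name, repr(exc)))
        if i < rounds - 1:
            time.sleep(interval)  # no sleep after the last round
    return failures
```

For example, `probe({"getaddrinfo": lambda: socket.getaddrinfo("puck.nether.net", None)}, rounds=1440, interval=60)` would sample the system-resolver path about once a minute for a day (after `import socket`); `socket.gaierror` is a subclass of OSError, so lookup failures are recorded.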

Thanks in advance for any clues on what might cause these intermittent 
failures in drill and icinga2, and how to fix them.

dn

P.S. This system is a VMware VM. I don't believe it's a VMware issue, 
however: aside from the periodic inability to reach this one host, its 
networking works fine, and none of the other server VMs on the same 
VMware host with similar network configurations have this issue.