[Bug 214643] net-mgmt/telegraf: service (re)start hangs when scripted: missing option -f with daemon(8)

Fri Nov 18 23:27:14 UTC 2016

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=214643

            Bug ID: 214643
           Summary: net-mgmt/telegraf: service (re)start hangs when
                    scripted: missing option -f with daemon(8)
           Product: Ports & Packages
           Version: Latest
          Hardware: Any
                OS: Any
            Status: New
          Severity: Affects Some People
          Priority: ---
         Component: Individual Port(s)
          Assignee: girgen at FreeBSD.org
          Reporter: Mark.Martinec at ijs.si
             Flags: maintainer-feedback?(girgen at FreeBSD.org)

Created attachment 177161
  --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=177161&action=edit
A patch to add the -f daemon(8) flag to net-mgmt/telegraf/files/telegraf.in

I came across this issue when trying to manage a service 'telegraf'
with SaltStack (i.e. sysutils/py-salt). I would expect the same
problem can occur under other management tools like Puppet/Chef/Ansible.

What happens is that when a salt-minion tries to start or restart
a telegraf process by spawning a command: 'service telegraf restart',
the process just hangs, waiting for a restart to happen, even though
the telegraf process does restart successfully and is again running
underneath.

Investigation reduces the issue to this simple, repeatable test
case, which mimics the environment under salt-minion (a Python
process) by connecting stdout to a pipe:

  # service telegraf restart | cat

  Stopping telegraf.
  Waiting for PIDS: 60067.
  Starting telegraf.
    (and hangs, even though the telegraf daemon has indeed
     successfully restarted meanwhile)

With rc_debug="YES" the same thing looks like:  (wrapped for clarity)

  # service telegraf restart | cat

  /usr/local/etc/rc.d/telegraf: DEBUG: checkyesno: telegraf_enable is
    set to YES.
  Stopping telegraf.
  /usr/local/etc/rc.d/telegraf: DEBUG: run_rc_command: doit:
    kill -TERM 919 Waiting for PIDS: 919.
  /usr/local/etc/rc.d/telegraf: DEBUG: pid file (/var/run/telegraf.pid):
    not readable.
  /usr/local/etc/rc.d/telegraf: DEBUG: checkyesno: telegraf_enable is
    set to YES.
  /usr/local/etc/rc.d/telegraf: DEBUG: run_rc_command: start_precmd:
   telegraf_prestart
  Starting telegraf.
  /usr/local/etc/rc.d/telegraf: DEBUG: run_rc_command: doit:
    /usr/sbin/daemon  -crP /var/run/telegraf.pid /usr/local/bin/telegraf
      -config=/usr/local/etc/telegraf.conf 2>> /var/log/telegraf.log
(and hangs)

The problem is that the rc.d script of the telegraf package uses
the daemon(8) utility to run the telegraf process, but omits
the '-f' option to /usr/sbin/daemon.

$ man daemon
   daemon — run detached from the controlling terminal
   -f      Redirect standard input, standard output and
           standard error to /dev/null.
   [...] If the -p, -P or -r option is specified the program
   is executed in a spawned child process.

So what happens is:
- The process executing the daemon(8) program (as started by
  the 'service' command) has its stdout directed to a pipe;
- The daemon(8) program disassociates from the controlling terminal
  but DOES NOT close its stdout, and then forks;
- The forked daemon(8) subprocess inherits the open fd to the pipe;
- The parent daemon(8) then exits, but the child daemon(8) still
  has the pipe open, connected to cat(1) (or to salt-minion);
- The child daemon(8) process then execs the telegraf program,
  which again inherits the pipe as its stdout.

So even though the parent daemon(8) process no longer exists and
the 'service' command could now be expected to finish, the sending
side of the pipe is still connected to a (re)started telegraf process,
and the 'cat' keeps hanging indefinitely, waiting for the pipe on
its stdin to close, which never happens.
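
The mechanism can be reproduced without telegraf or daemon(8).
The following is a minimal sketch, assuming a POSIX sh with sleep(1),
cat(1) and date(1); sleep stands in for the daemonized process, and
the variable names are illustrative only:

```shell
#!/bin/sh
# Case 1: the background child inherits the pipe as its stdout, so
# cat sees no EOF until the child exits, although the subshell that
# started it has already finished.
start=$(date +%s)
( sleep 3 & ) | cat
mid=$(date +%s)

# Case 2: the child's stdout and stderr point at /dev/null, so the
# last writer on the pipe is gone as soon as the subshell exits.
( sleep 3 >/dev/null 2>&1 & ) | cat
end=$(date +%s)

held=$((mid - start))       # roughly the lifetime of the child
released=$((end - mid))     # close to zero
echo "inherited pipe: ${held}s, redirected: ${released}s"
```

The second pipeline corresponds to what the -f option is expected
to achieve for the process started by daemon(8).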

The fix is trivial: add the -f option to the daemon(8) command
in the rc.d script, so that it redirects stdin, stdout and stderr
to /dev/null before forking:

--- /usr/local/etc/rc.d/telegraf~       2016-11-18 23:53:57.046298000 +0100
+++ /usr/local/etc/rc.d/telegraf        2016-11-18 23:54:17.675132000 +0100
@@ -31,3 +31,3 @@
 command=/usr/sbin/daemon
-command_args="-crP ${pidfile} /usr/local/bin/${name} ${telegraf_flags} -config=${telegraf_conf} 2>> /var/log/telegraf.log"
+command_args="-f -crP ${pidfile} /usr/local/bin/${name} ${telegraf_flags} -config=${telegraf_conf} 2>> /var/log/telegraf.log"

(the patch is attached)


Considering that at least a dozen ports use the daemon(8) utility
without specifying the -f option, I wonder whether it would be wiser
to change the daemon(8) utility so that it closes stdin/stdout/stderr
by default after disassociating from the controlling terminal but
before forking, as this is something one expects as an essential
step in daemonizing a process.  IMO this may be preferable to filing
a dozen similar bug reports against various other ports, even at the
expense of incompatibility.
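
To get a rough list of candidate rc.d scripts, something like the
following could be used.  This is only a heuristic sketch: scan_rcd
is a hypothetical helper name, and the patterns assume that scripts
set command=/usr/sbin/daemon and put the daemon(8) options at the
start of a double-quoted command_args, as the telegraf script does.
Scripts written differently will be missed or misreported, so each
hit needs to be checked by hand.

```shell
#!/bin/sh
# Print rc.d scripts in the given directory that run daemon(8)
# without an -f among the leading option clusters of command_args.
# Usage: scan_rcd /usr/local/etc/rc.d
scan_rcd() {
    dir=$1
    for script in "$dir"/*; do
        [ -f "$script" ] || continue
        # Only consider scripts that use daemon(8) as the command.
        grep -q '^command=.*/usr/sbin/daemon' "$script" || continue
        # Skip scripts that already pass -f, alone or combined with
        # other single-letter options (e.g. -fcrP).  Options that
        # take an argument before -f are not recognized (assumption).
        if grep -Eq '^command_args="(-[A-Za-z]+ +)*-[A-Za-z]*f[A-Za-z]*[ "]' "$script"; then
            continue
        fi
        echo "$script"
    done
    return 0
}
```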
