[Bug 214643] net-mgmt/telegraf: service (re)start hangs when scripted: missing option -f with daemon(8)
bugzilla-noreply at freebsd.org
Fri Nov 18 23:27:14 UTC 2016
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=214643
Bug ID: 214643
Summary: net-mgmt/telegraf: service (re)start hangs when
scripted: missing option -f with daemon(8)
Product: Ports & Packages
Version: Latest
Hardware: Any
OS: Any
Status: New
Severity: Affects Some People
Priority: ---
Component: Individual Port(s)
Assignee: girgen at FreeBSD.org
Reporter: Mark.Martinec at ijs.si
Flags: maintainer-feedback?(girgen at FreeBSD.org)
Created attachment 177161
--> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=177161&action=edit
A patch to add the -f daemon(8) flag to net-mgmt/telegraf/files/telegraf.in
I came across this issue when trying to manage the 'telegraf' service
with SaltStack (i.e. sysutils/py-salt). I would expect the same problem
to occur under other management tools such as Puppet, Chef or Ansible.
What happens is that when a salt-minion tries to start or restart
a telegraf process by spawning a command: 'service telegraf restart',
the process just hangs, waiting for a restart to happen, even though
the telegraf process does restart successfully and is again running
underneath.
Investigation reduces the issue to this simple, repeatable test case,
which mimics the environment under salt-minion (a Python process)
by feeding stdout to a pipe:
# service telegraf restart | cat
Stopping telegraf.
Waiting for PIDS: 60067.
Starting telegraf.
(and hangs, even though the telegraf daemon has indeed
successfully restarted meanwhile)
With rc_debug="YES" the same run looks like this (wrapped for clarity):
# service telegraf restart | cat
/usr/local/etc/rc.d/telegraf: DEBUG: checkyesno: telegraf_enable is
set to YES.
Stopping telegraf.
/usr/local/etc/rc.d/telegraf: DEBUG: run_rc_command: doit:
kill -TERM 919 Waiting for PIDS: 919.
/usr/local/etc/rc.d/telegraf: DEBUG: pid file (/var/run/telegraf.pid):
not readable.
/usr/local/etc/rc.d/telegraf: DEBUG: checkyesno: telegraf_enable is
set to YES.
/usr/local/etc/rc.d/telegraf: DEBUG: run_rc_command: start_precmd:
telegraf_prestart
Starting telegraf.
/usr/local/etc/rc.d/telegraf: DEBUG: run_rc_command: doit:
/usr/sbin/daemon -crP /var/run/telegraf.pid /usr/local/bin/telegraf
-config=/usr/local/etc/telegraf.conf 2>> /var/log/telegraf.log
(and hangs)
The problem is that the rc.d script of the telegraf package uses
the daemon(8) utility to run the telegraf process, but omits
the '-f' option to /usr/sbin/daemon.
$ man daemon
daemon — run detached from the controlling terminal
-f Redirect standard input, standard output and
standard error to /dev/null.
[...] If the -p, -P or -r option is specified the program
is executed in a spawned child process.
So what happens is:
- The process executing the daemon(8) program (as started by
the 'service' command) has its stdout directed to a pipe;
- The daemon(8) program disassociates from a controlling terminal
but DOES NOT close its stdout, and then forks;
- The forked daemon(8) subprocess inherits the open fd to a pipe;
- The parent daemon(8) then exits, but the child daemon(8) still
has a pipe open, connected to cat(1) (or to salt-minion);
- The child daemon(8) process then execs the telegraf program,
which in turn inherits the pipe as its stdout.
So even though the parent daemon(8) process no longer exists and
the 'service' command could now be expected to finish, the write end
of the pipe is still held open by the (re)started telegraf process,
and 'cat' keeps hanging indefinitely, waiting for an EOF on its stdin
that never arrives.
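The mechanism can be reproduced without telegraf or daemon(8) at all.
The following is only a minimal sketch under stated assumptions: a
backgrounded sleep(1) stands in for the long-running daemon, and a plain
shell redirection to /dev/null stands in for what the -f option does.
The variable names are made up for this illustration.

```shell
# Inherited case: the background child keeps the pipe as its stdout, so
# cat(1) does not see EOF until the child exits (here after 3 seconds).
start=$(date +%s)
sh -c 'sleep 3 & echo "started (stdout inherited)"' | cat
leaky_elapsed=$(( $(date +%s) - start ))

# Redirected case: the child's stdin/stdout/stderr point at /dev/null
# (a stand-in for daemon -f), so cat(1) sees EOF as soon as sh exits.
start=$(date +%s)
sh -c 'sleep 3 </dev/null >/dev/null 2>&1 & echo "started (redirected)"' | cat
fixed_elapsed=$(( $(date +%s) - start ))

echo "inherited pipe: ${leaky_elapsed}s, redirected: ${fixed_elapsed}s"
```

In the first case the pipeline returns only once the background child
is gone; with a real daemon that never exits, this is the hang seen
above.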
The fix is trivial: add the -f option to the daemon(8) command
in the rc.d script, so that it redirects stdin, stdout and stderr
to /dev/null before forking:
--- /usr/local/etc/rc.d/telegraf~ 2016-11-18 23:53:57.046298000 +0100
+++ /usr/local/etc/rc.d/telegraf 2016-11-18 23:54:17.675132000 +0100
@@ -31,3 +31,3 @@
command=/usr/sbin/daemon
-command_args="-crP ${pidfile} /usr/local/bin/${name} ${telegraf_flags} -config=${telegraf_conf} 2>> /var/log/telegraf.log"
+command_args="-f -crP ${pidfile} /usr/local/bin/${name} ${telegraf_flags} -config=${telegraf_conf} 2>> /var/log/telegraf.log"
(the patch is attached)
Considering that at least a dozen ports use the daemon(8) utility
without specifying the -f option, I wonder whether it would be wiser
to change daemon(8) itself to close stdin/stdout/stderr by default,
after disassociating from the controlling terminal but before forking,
as this is something one expects as an essential step in daemonizing
a process. IMO this may be preferable to filing a dozen bug reports
similar to this one against various other ports, even at the expense
of incompatibility.
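To get a rough list of other affected rc.d scripts, something along
these lines might help. It is only a heuristic sketch: the function name
list_daemon_without_f is made up for this illustration, it assumes
command_args is assigned on a single line starting with the daemon(8)
flags, and it inspects only the leading flag clusters, so a -f placed
after an option argument would be missed. The demo runs against two
throwaway files rather than a real rc.d directory.

```shell
# Hypothetical helper: print scripts in a directory that mention
# /usr/sbin/daemon but whose command_args line has no leading flag
# cluster containing "f" (such as -f or -cf). Heuristic only.
list_daemon_without_f() {
    dir=$1
    for script in "$dir"/*; do
        [ -f "$script" ] || continue
        grep -q '/usr/sbin/daemon' "$script" || continue
        if ! grep -Eq '^command_args="(-[A-Za-z]+ +)*-[A-Za-z]*f' "$script"; then
            printf '%s\n' "$script"
        fi
    done
}

# Demo on two sample scripts modelled on the telegraf command_args above.
demo_dir=$(mktemp -d)
printf '%s\n' 'command=/usr/sbin/daemon' \
    'command_args="-crP ${pidfile} /usr/local/bin/${name}"' > "$demo_dir/without_f"
printf '%s\n' 'command=/usr/sbin/daemon' \
    'command_args="-f -crP ${pidfile} /usr/local/bin/${name}"' > "$demo_dir/with_f"
audit_result=$(list_daemon_without_f "$demo_dir")
printf '%s\n' "$audit_result"
rm -rf "$demo_dir"
```

On an installed system the function would presumably be pointed at
/usr/local/etc/rc.d, and each hit would still need a manual check.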
--
You are receiving this mail because:
You are the assignee for the bug.