r316958: booting a server takes >10 minutes!

Ian Lepore ian at freebsd.org
Mon Apr 17 15:29:46 UTC 2017


On Sun, 2017-04-16 at 21:53 -0700, Maxim Sobolev wrote:
> Well, all this suggests to me that there must be some issue with the client
> syslog code in libc, such that if the syslog daemon hangs or has some
> internal issue, the system is rendered mostly unusable. I think
> that might be an interesting project for somebody with some spare time
> on their hands: take syslogd as of (r317033 - 1) and see what can be done to
> improve the resilience of the system against such a failure mode.
> 
> -Max
> 

On the sending side, the libc code tries very hard to deliver messages
to the unprivileged /var/run/log socket; if the datagram send fails due
to lack of buffer space (i.e., due to syslogd not keeping up on the read
side), it loops indefinitely, sleeping for 1us and then retrying, until
the send succeeds.
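
A rough model of that loop, for illustration only: this is a sketch of
the behaviour described above, not the actual libc source.  The send_fn
hook, fake_send() and send_unpriv_model() are hypothetical names made
up here so the loop can be exercised without a real socket.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical hook standing in for send(2) on the log socket. */
typedef int (*send_fn)(const char *msg, size_t len);

/* Hypothetical stub: fails with ENOBUFS fake_failures times, then succeeds. */
static int fake_failures;

static int
fake_send(const char *msg, size_t len)
{
	(void)msg;
	if (fake_failures > 0) {
		fake_failures--;
		errno = ENOBUFS;
		return (-1);
	}
	return ((int)len);
}

/*
 * Model of the unprivileged path: keep retrying for as long as the
 * failure is ENOBUFS, sleeping 1us between tries.  There is no upper
 * bound, so a stuck reader stalls the caller forever.  Returns the
 * number of send attempts made.
 */
static unsigned long
send_unpriv_model(send_fn try_send, const char *msg, size_t len)
{
	const struct timespec one_us = { .tv_sec = 0, .tv_nsec = 1000L };
	unsigned long attempts = 1;

	if (try_send(msg, len) >= 0)
		return (attempts);
	while (errno == ENOBUFS) {
		nanosleep(&one_us, NULL);
		attempts++;
		if (try_send(msg, len) >= 0)
			break;
	}
	return (attempts);
}
```

With a stub that never stops returning ENOBUFS, send_unpriv_model()
never returns, which is the hang being discussed in this thread.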

On the other hand, for /var/run/logpriv apparently the theory is that
hanging a process with enough privs to use that connection would be
bad.  So it retries just once for errors that are not related to buffer
space, and doesn't retry at all if the error was buffer space (which is
a case of the code not quite matching the nearby comments).  It then
gives up on syslogd and writes the message directly to the console
before returning.
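
The same kind of sketch for the privileged path.  Again, this only
models the behaviour as described above and is not the libc code; the
send_fn hook, the fake_* names and the outcome enum are hypothetical,
and the console write is reduced to a return value.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical hook standing in for send(2) on the log socket. */
typedef int (*send_fn)(const char *msg, size_t len);

enum outcome { SENT_TO_SYSLOGD, WRITTEN_TO_CONSOLE };

/* Hypothetical stub: fails fake_failures times with fake_errno. */
static int fake_failures;
static int fake_errno;
static int fake_calls;

static int
fake_send(const char *msg, size_t len)
{
	(void)msg;
	fake_calls++;
	if (fake_failures > 0) {
		fake_failures--;
		errno = fake_errno;
		return (-1);
	}
	return ((int)len);
}

/*
 * Model of the privileged path: one retry for errors other than
 * ENOBUFS, no retry at all for ENOBUFS, then fall back to the console.
 */
static enum outcome
send_priv_model(send_fn try_send, const char *msg, size_t len)
{
	if (try_send(msg, len) >= 0)
		return (SENT_TO_SYSLOGD);
	/* Not a buffer-space error: retry exactly once. */
	if (errno != ENOBUFS && try_send(msg, len) >= 0)
		return (SENT_TO_SYSLOGD);
	/* Give up on syslogd; the caller would write to the console here. */
	return (WRITTEN_TO_CONSOLE);
}
```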

So yeah, there may be some room for improvement in that logic. :)  I
think it could eventually give up in the non-priv case and maybe try an
extra time or two in the privileged case.

When we ran into this at $work years ago we just wrote our own work-
alike function to use instead of syslog(3); it retries any kind of
failure no more than 3 times, with a millisecond sleep between each
try.  (Losing logging is bad, but losing the functionality of our app
that's trying to do the logging is even worse.)
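
A minimal sketch of that bounded-retry idea; this is not our actual
code.  It assumes "no more than 3 times" means one initial attempt plus
up to three retries, and the send_fn hook, fake_send() and
send_bounded() are hypothetical names chosen for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <time.h>

/* Hypothetical hook standing in for the real delivery call. */
typedef int (*send_fn)(const char *msg, size_t len);

enum { MAX_RETRIES = 3 };	/* assumption: retries after the first try */

/* Hypothetical stub: fails fake_failures times, then succeeds. */
static int fake_failures;
static int fake_calls;

static int
fake_send(const char *msg, size_t len)
{
	(void)msg;
	fake_calls++;
	if (fake_failures > 0) {
		fake_failures--;
		return (-1);
	}
	return ((int)len);
}

/*
 * Retry any kind of failure, sleeping 1ms between tries, and give up
 * after MAX_RETRIES retries.  Returns 0 if the message was delivered
 * and -1 if it was dropped.
 */
static int
send_bounded(send_fn try_send, const char *msg, size_t len)
{
	const struct timespec one_ms = { .tv_sec = 0, .tv_nsec = 1000000L };

	for (int attempt = 0; attempt <= MAX_RETRIES; attempt++) {
		if (attempt > 0)
			nanosleep(&one_ms, NULL);
		if (try_send(msg, len) >= 0)
			return (0);
	}
	return (-1);
}
```

The worst case here is about 3ms of added delay per message, after
which the message is dropped and the caller carries on.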

-- Ian

> On Sun, Apr 16, 2017 at 5:50 PM, Ben Woods <woodsb02 at gmail.com>
> wrote:
> 
> > 
> > On 16 April 2017 at 03:24, Larry Rosenman <ler at lerctr.org> wrote:
> > 
> > > 
> > > Current SVN seems to have fixed it (via sobomax@ syslogd commit).
> > > 
> > 
> > I experienced this issue too, and can confirm that it exists on
> > r316952, but is resolved on r317033.
> > 
> > It was extremely strange. The symptoms I was experiencing were:
> > - lightdm display manager would fail to start
> > - slim display manager would start, but then fail to login to xfce
> > - "service hald restart" and "service dbus restart" would fail
> > - "pkg upgrade hal" would fail
> > 
> > Regards,
> > Ben
> > 
> > --
> > From: Benjamin Woods
> > woodsb02 at gmail.com
> > _______________________________________________
> > freebsd-current at freebsd.org mailing list
> > https://lists.freebsd.org/mailman/listinfo/freebsd-current
> > To unsubscribe, send any mail to "freebsd-current-unsubscribe at freebsd.org"
> > 
> > 

