stopping amd causes a freeze

Sun Jul 28 09:00:42 UTC 2013

> On 28/07/2013 08:24, Konstantin Belousov wrote:
> > On Sat, Jul 27, 2013 at 10:33:18AM +0200, Dominic Fandrey wrote:
> >> On 26/07/2013 19:10, Dominic Fandrey wrote:
> >>> On 25/07/2013 12:00, Konstantin Belousov wrote:
> >>>> On Thu, Jul 25, 2013 at 09:56:59AM +0200, Dominic Fandrey wrote:
> >>>>> On 22/07/2013 12:07, Konstantin Belousov wrote:
> >>>>>> On Mon, Jul 22, 2013 at 11:50:24AM +0200, Dominic Fandrey wrote:
> >>>>>>> ...
> >>>>>>>
> >>>>>>> I run amd through sysutils/automounter, which is a scripting solution
> >>>>>>> that generates an amd.map file based on encountered devices and devd
> >>>>>>> events. The SIGHUP it sends to amd to tell it the map file was updated
> >>>>>>> does not cause problems, only a -SIGKILL- SIGTERM may cause the freeze.
> >>>>>>>
> >>>>>>> Nothing was mounted (by amd) during the last freeze.
> >>>>>>>
> >>>>>>> ...
> >>>>>>
> >>>>>> Are you sure that the machine did not panic? Do you have a serial console?
> >>>>>>
> >>>>>> The amd(8) locks itself into memory, most likely due to the fear of
> >>>>>> deadlock. There are some known issues with user wirings in stable/9.
> >>>>>> If the problem you see is indeed due to wiring, you might try to apply
> >>>>>> r253187-r253191.
> >>>>>
> >>>>> I tried that. Applying the diff was straightforward enough. But the
> >>>>> resulting kernel panicked as soon as it tried to mount the root fs.
> >>>> You did not provide any useful info to diagnose the issue.
> >>>>
> >>>> The patch should keep the KBI compatible, but, just in case, if you have
> >>>> any third-party modules, rebuild them.
> >>>>
> >>>>>
> >>>>> So I'll wait for the MFC from someone who knows what he/she is doing.
> >>>>
> >>>> The patch below booted for me, and I ran some sanity-check tests for
> >>>> mlockall(2), which also did not result in misbehaviour.
> >>>>
> >>>
> >>> Your patch applied cleanly and the system booted with the resulting
> >>> kernel.
> >>>
> >>> Amd exhibits several very strange behaviours. ...
> >>
> >> I can verify the whole thing with a clean world and kernel.
> >>
> >> This time I'll concentrate on the first instance of amd:
> >>
> >> # tail -n3 /var/log/messages
> >> Jul 27 10:08:56 mobileKamikaze kernel: newnfs server pid5868 at mobileKamikaze:/var/run/automounter.amd.mnt: not responding
> >> Jul 27 10:09:41 mobileKamikaze kernel: newnfs server pid5868 at mobileKamikaze:/var/run/automounter.amd.mnt: not responding
> >> Jul 27 10:11:41 mobileKamikaze last message repeated 3 times
> >>
> >> The process, it turns out, simply doesn't exist. There is another
> >> process, though:
> >> # ps auxww | grep -F sbin/amd
> >> root       5869   0.0  0.1  12036   8020 ??  S    10:08am   0:00.01 /usr/sbin/amd -r -p -a /var/run/automounter.amd -c 4 -w 2 /var/run/automounter.amd.mnt /var/run/automounter.amd.map
> >>
> >> # cat /var/run/automounter.amd.pid
> >> 5868
> >>
> >> Here is what I think happens: amd forks a subprocess and the main
> >> process silently dies after it has written its pidfile.
> > Nothing dies silently.  Either the process was killed by a signal, or it
> > exited with an explicit call to exit(2).  In the first case, the default
> > kernel setting of kern.logsigexit should make a record in the syslog.
> > The machdep.uprintf_signal sysctl might also be useful, but not for daemons.
> 
> Well, after I reverted your patch I got some things in the syslog.
> Sometimes amd works as expected, sometimes it dies right after starting:
> Jul 28 10:19:42 mobileKamikaze kernel: pid 24217 (amd), uid 0: exited on signal 11 (core dumped)
> 
> This is all just confusing.

Just to confuse you a bit more :-)
I gave up on mlockall(2), so I compiled amd statically linked.

my 5 cents.

danny
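
The pidfile mismatch quoted above (the pidfile names 5868, ps shows only
5869) can be checked mechanically. A minimal sketch, assuming the pidfile
holds a single decimal PID; the function names pid_is_running and
pidfile_is_stale are made up for illustration and are not part of amd or
sysutils/automounter:

```python
import os
import sys


def pid_is_running(pid: int) -> bool:
    """Return True if a process with this PID currently exists."""
    try:
        # Signal 0 delivers nothing; the kernel only checks that the
        # target exists and that we would be allowed to signal it.
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but belongs to another user.
        return True
    return True


def pidfile_is_stale(path: str) -> bool:
    """Return True if the PID recorded in `path` names no running process."""
    with open(path) as f:
        pid = int(f.read().strip())
    return not pid_is_running(pid)


if __name__ == "__main__":
    # Default path taken from the thread; pass another path as argv[1].
    path = sys.argv[1] if len(sys.argv) > 1 else "/var/run/automounter.amd.pid"
    print("stale" if pidfile_is_stale(path) else "alive")
```

This only says whether some process has that PID right now; it cannot tell
whether that process is amd, and it says nothing about why the original
process went away.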