pkg with an ssh repo crashes CURRENT
Konstantin Belousov
kostikbel at gmail.com
Thu Aug 20 22:18:59 UTC 2015
On Thu, Aug 20, 2015 at 03:26:10PM -0500, Mark Felder wrote:
>
>
> On Thu, Aug 20, 2015, at 06:50, Konstantin Belousov wrote:
> > On Wed, Aug 19, 2015 at 04:52:56PM -0500, Mark Felder wrote:
> > > panic: children list
> > > cpuid = 0
> > > KDB: stack backtrace:
> > > db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe01228ea840
> > > vpanic() at vpanic+0x189/frame 0xfffffe01228ea8c0
> > > kassert_panic() at kassert_panic+0x132/frame 0xfffffe01228ea930
> > > kern_procctl_single() at kern_procctl_single+0x81c/frame 0xfffffe01228eaa00
> > > kern_procctl() at kern_procctl+0x223/frame 0xfffffe01228eaa50
> > > sys_procctl() at sys_procctl+0xa5/frame 0xfffffe01228eaae0
> > > amd64_syscall() at amd64_syscall+0x282/frame 0xfffffe01228eabf0
> > > Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe01228eabf0
> >
> > The fired assert means that there was a reaper process with some children
> > but without descendants to be reaped. Hm, I can imagine this situation
> > happening if, e.g., some non-reaper forks and then acquires reaper status.
> > The patch below removes the overly aggressive asserts.
> >
> > Still, it would be interesting to look into the process table. Please
> > repeat the procedure to panic, then in ddb do 'ps'. After that do
> > 'dump' and please keep kernel.debug and vmcore around. First I want to
> > look at the ps output.
>
> I've recreated this in a bhyve VM with the latest CURRENT snapshot,
> r286893. You can grab the whole /var/crash dump at
> https://feld.me/freebsd/crash.tar.gz
The vmcore is useless without the matching kernel.debug.
>
> I've pasted the ps output below, but it's also included in the info.0
> file.
And this is not very useful without the preceding panic message and other
bits from the panic handler.
I guess process 666 was current when the panic occurred?
Basically, what I want is to see the p_reaper value for the process
with pid 667. Even just p_reaper->p_pid is enough.
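
To make the guess from my earlier message concrete, here is a small userland
model of the "fork first, acquire reaper status later" sequence. It is only a
sketch under stated assumptions: struct model_proc and the model_* functions
are hypothetical names, plain counters stand in for the kernel's p_children
and p_reaplist lists, and none of this is the actual kern_procctl.c code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct proc. */
struct model_proc {
	int pid;
	bool is_reaper;
	struct model_proc *reaper;	/* models p_reaper */
	int n_children;			/* models the length of p_children */
	int n_reaplist;			/* models the length of p_reaplist */
};

/* A child is reaped by its parent if the parent is a reaper,
 * otherwise by the reaper the parent itself is assigned to. */
static void
model_fork(struct model_proc *parent, struct model_proc *child, int pid)
{
	assert(parent != NULL && child != NULL);
	child->pid = pid;
	child->is_reaper = false;
	child->n_children = 0;
	child->n_reaplist = 0;
	child->reaper = parent->is_reaper ? parent : parent->reaper;
	parent->n_children++;
	child->reaper->n_reaplist++;
}

/* Assumption of this model: acquiring reaper status only sets the
 * flag and does not move already existing children to the new reaper. */
static void
model_reap_acquire(struct model_proc *p)
{
	p->is_reaper = true;
}

/* The state the removed asserts rejected: a reaper with an empty
 * reap list but a non-empty children list. */
static bool
model_assert_would_fire(const struct model_proc *p)
{
	return (p->is_reaper && p->n_reaplist == 0 && p->n_children != 0);
}

/* Runs the scenario; stores the pid of the child's reaper in *rpid. */
bool
model_fork_then_acquire(int *rpid)
{
	struct model_proc init = { .pid = 1, .is_reaper = true };
	struct model_proc parent, child;

	init.reaper = &init;
	model_fork(&init, &parent, 100);	/* not a reaper yet */
	model_fork(&parent, &child, 101);	/* child goes to init's reap list */
	model_reap_acquire(&parent);		/* reaper with an empty reap list */
	*rpid = child.reaper->pid;
	return (model_assert_would_fire(&parent));
}
```

In this model the child stays assigned to the old reaper, which is why the
p_reaper->p_pid of the forked process is the value worth checking.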
>
> Stopped at kdb_enter+0x3e: movq $0,kdb_why
> db> ps
> pid ppid pgrp uid state wmesg wchan cmd
> 667 666 665 0 S+ select 0xfffff80003c53840 ssh
> 666 665 665 0 R+ CPU 0 pkg
> 665 629 665 0 S+ wait 0xfffff800039e0548 pkg
> 629 628 629 0 S+ pause 0xfffff8001947eb38 csh
> 628 1 628 0 Ss+ wait 0xfffff80003db8a90 login
> 627 1 627 0 Ss+ ttyin 0xfffff80003c0f0a8 getty
> 626 1 626 0 Ss+ ttyin 0xfffff80003c0f4a8 getty
> 625 1 625 0 Ss+ ttyin 0xfffff8000387a0a8 getty
> 624 1 624 0 Ss+ ttyin 0xfffff8000387a4a8 getty
> 623 1 623 0 Ss+ ttyin 0xfffff8000387a8a8 getty
> 622 1 622 0 Ss+ ttyin 0xfffff8000387aca8 getty
> 621 1 621 0 Ss+ ttyin 0xfffff8000387b0a8 getty
> 620 1 620 0 Ss+ ttyin 0xfffff8000387b4a8 getty
> 577 1 577 0 Ss nanslp 0xffffffff81ab2561 cron
> 573 1 573 25 Ss pause 0xfffff80003d040a8 sendmail
> 570 1 570 0 Ss select 0xfffff80003849c40 sendmail
> 542 1 542 0 Ss select 0xfffff80003c53ec0 sshd
> 443 1 443 0 Ss select 0xfffff80003849d40 casperd
> 442 1 442 0 Ss select 0xfffff80003c540c0 casperd
> 342 1 342 0 Ss select 0xfffff80003849dc0 syslogd
> 271 1 271 0 Ss select 0xfffff80003849ec0 devd
> 16 0 0 0 DL vlruwt 0xfffff800039e0a90 [vnlru]
> 15 0 0 0 DL syncer 0xffffffff81c41cf8 [syncer]
> 14 0 0 0 DL (threaded) [bufdaemon]
> 100042 D psleep 0xffffffff81c40f04 [bufdaemon]
> 100057 D sdflush 0xfffff80003d870e8 [/ worker]
> 9 0 0 0 DL pgzero 0xffffffff81c4aee4 [pagezero]
> 8 0 0 0 DL psleep 0xffffffff81c4a6b8 [vmdaemon]
> 7 0 0 0 DL (threaded) [pagedaemon]
> 100039 D psleep 0xffffffff81cf6684 [pagedaemon]
> 100045 D umarcl 0xffffffff81c4a040 [uma]
> 6 0 0 0 DL waiting_ 0xffffffff81ce8640 [sctp_iterator]
> 5 0 0 0 DL (threaded) [cam]
> 100017 D - 0xffffffff818d6e00 [doneq0]
> 100038 D - 0xffffffff818d6c48 [scanner]
> 4 0 0 0 DL crypto_r 0xffffffff81c48b88 [crypto returns]
> 3 0 0 0 DL crypto_w 0xffffffff81c48a30 [crypto]
> 13 0 0 0 DL (threaded) [geom]
> 100010 D - 0xffffffff81cc0aa0 [g_event]
> 100011 D - 0xffffffff81cc0aa8 [g_up]
> 100012 D - 0xffffffff81cc0ab0 [g_down]
> 12 0 0 0 WL (threaded) [intr]
> 100006 I [swi4: clock (0)]
> 100007 I [swi4: clock (1)]
> 100008 I [swi3: vm]
> 100009 I [swi1: netisr 0]
> 100018 I [swi6: task queue]
> 100019 I [swi6: Giant taskq]
> 100021 I [swi5: fast taskq]
> 100026 I [irq264: virtio_pci0]
> 100027 I [irq265: virtio_pci0]
> 100028 I [irq266: virtio_pci0]
> 100031 I [irq267: virtio_pci1]
> 100032 I [irq268: virtio_pci1]
> 100033 I [swi0: uart uart]
> 100034 I [irq1: atkbd0]
> 11 0 0 0 RL (threaded) [idle]
> 100004 CanRun [idle: cpu0]
> 100005 Run CPU 1 [idle: cpu1]
> 2 0 0 0 DL - 0xffffffff81a03ca0 [rand_harvestq]
> 1 0 1 0 SLs wait 0xfffff8000362f548 [init]
> 10 0 0 0 DL audit_wo 0xffffffff81cedc10 [audit]
> 0 0 0 0 DLs (threaded) [kernel]
> 100000 D swapin 0xffffffff81cc0ad8 [swapper]
> 100013 D - 0xfffff80003611300 [firmware taskq]
> 100016 D - 0xfffff80003610e00 [ffs_trim taskq]
> 100020 D - 0xfffff80003610400 [thread taskq]
> 100022 D - 0xfffff80003820100 [acpi_task_0]
> 100023 D - 0xfffff80003820100 [acpi_task_1]
> 100024 D - 0xfffff80003820100 [acpi_task_2]
> 100025 D - 0xfffff8000381fc00 [kqueue taskq]
> 100029 D - 0xfffff8000381f200 [vtnet0 rxq 0]
> 100030 D - 0xfffff8000381f100 [vtnet0 txq 0]
> 100035 D - 0xffffffff81ab1330 [deadlkres]
> 100037 D - 0xfffff80003610c00 [CAM taskq]
>
>
>
>
> >
> > diff --git a/sys/kern/kern_procctl.c b/sys/kern/kern_procctl.c
> > index d65ba5a..8ef72901 100644
> > --- a/sys/kern/kern_procctl.c
> > +++ b/sys/kern/kern_procctl.c
> > @@ -187,8 +187,6 @@ reap_status(struct thread *td, struct proc *p,
> > }
> > } else {
> > rs->rs_pid = -1;
> > - KASSERT(LIST_EMPTY(&reap->p_reaplist), ("reap children list"));
> > - KASSERT(LIST_EMPTY(&reap->p_children), ("children list"));
> > }
> > return (0);
> > }
>
> I'll try compiling a kernel with your patch and see what happens.