Boot freeze 11.0p3 during network initialization
J.R. Oldroyd
fbsd at opal.com
Thu Feb 2 16:28:08 UTC 2017
I have filed a PR with this patch so that it doesn't get overlooked.
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=216731
-jr
On Thu, 26 Jan 2017 10:20:17 -0500 "J.R. Oldroyd" <fbsd at opal.com> wrote:
>
> Sorry for the time gap, I had to deal with family matters.
>
> OK, I patched if_lagg.c to drop and re-acquire the lock around
> the call into the underlying driver's ioctl handler. I've been running
> this for some weeks now and haven't seen the boot hang since. Hopefully
> I have tested long enough.
>
> Someone more familiar with this driver and use of this lock there
> should review this patch and comment.
>
> -jr
>
>
> Index: sys/net/if_lagg.c
> ===================================================================
> --- sys/net/if_lagg.c (revision 307319)
> +++ sys/net/if_lagg.c (working copy)
> @@ -995,6 +995,21 @@
> LAGG_RUNLOCK(sc, &tracker);
> break;
>
> + case SIOCADDMULTI:
> + case SIOCDELMULTI:
> + /*
> + * Drivers like if_re.c cause a LOR on WLOCK, so we must
> + * drop and re-acquire the lock around the call.
> + */
> + if (lp->lp_ioctl == NULL) {
> + error = EINVAL;
> + break;
> + }
> + LAGG_WUNLOCK(sc);
> + error = (*lp->lp_ioctl)(ifp, cmd, data);
> + LAGG_WLOCK(sc);
> + break;
> +
> case SIOCSIFCAP:
> if (lp->lp_ioctl == NULL) {
> error = EINVAL;
>
>
> On Wed, 28 Dec 2016 00:24:09 -0800 Adrian Chadd <adrian.chadd at gmail.com> wrote:
> >
> > hi,
> >
> > yes, the LOR is why the boot hang occurs :(
> >
> >
> >
> > -a
> >
> >
> > On 27 December 2016 at 14:30, J.R. Oldroyd <fbsd at opal.com> wrote:
> > > Sorry, Adrian, I'm missing the back-story here and I'm not that
> > > familiar with the lagg code.
> > >
> > > Are you saying that this LOR is likely relevant to this boot hang,
> > > or are you saying that this is a known problem that's not relevant?
> > >
> > > Jan Kokemüller posted some lagg patches. I don't know if they are
> > > likely applicable to this problem, but I could try those.
> > >
> > > https://reviews.freebsd.org/D6845
> > > https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=211689#c4
> > >
> > > The first removes an RLOCK, but not the one referenced in the LOR
> > > report. The second is a patch for the ath/iwm panic. If you're
> > > unfamiliar with them, I will study this code and those patches
> > > to get up to speed.
> > >
> > > -jr
> > >
> > >
> > > On Fri, 23 Dec 2016 11:41:33 -0800 Adrian Chadd <adrian.chadd at gmail.com> wrote:
> > >>
> > >> Right, that's the known lock order issue with lagg. :(
> > >>
> > >>
> > >> -adrian
> > >>
> > >>
> > >> On 23 December 2016 at 11:37, J.R. Oldroyd <fbsd at opal.com> wrote:
> > >> > On Fri, 23 Dec 2016 10:17:34 -0800 Adrian Chadd <adrian.chadd at gmail.com> wrote:
> > >> >>
> > >> >> On 20 December 2016 at 08:18, J.R. Oldroyd <fbsd at opal.com> wrote:
> > >> >> > On Thu, 8 Dec 2016 17:19:26 -0500 "J.R. Oldroyd" <fbsd at opal.com> wrote:
> > >> >> >>
> > >> >> >> On Thu, 08 Dec 2016 21:29:32 +0200 "Andriy Voskoboinyk" <s3erios at gmail.com> wrote:
> > >> >> >> >
> > >> >> >> > On Thu, 08 Dec 2016 16:57:19 +0200, J.R. Oldroyd <fbsd at opal.com> wrote:
> > >> >> >> >
> > >> >> >> > Is there any additional output with
> > >> >> >> > wlandebug_wlan0="scan+state+auth+assoc"
> > >> >> >> > in /etc/rc.conf?
> > >> >> >> >
> > >> >> >>
> > >> >> >> I have put that in and rebooted several times, all times OK.
> > >> >> >> I will report back again in due course when it next hangs.
> > >> >> >>
> > >> >> >> -jr
> > >> >> >>
> > >> >> >
> > >> >> > The boot hang occurred again today. I noted the point of the hang and
> > >> >> > rebooted; the log from the good boot with annotation of the previous hang
> > >> >> > point is here [1].
> > >> >> >
> > >> >> > -jr
> > >> >> >
> > >> >> > [1] http://opal.com/jr/freebsd/20161220-fbsd11.3-boot_hang_wlan_debug.txt
> > >> >> > _______________________________________________
> > >> >> > freebsd-wireless at freebsd.org mailing list
> > >> >> > https://lists.freebsd.org/mailman/listinfo/freebsd-wireless
> > >> >> > To unsubscribe, send any mail to "freebsd-wireless-unsubscribe at freebsd.org"
> > >> >>
> > >> >>
> > >> >> Can you compile with WITNESS and INVARIANTS? I'd like to see if it's
> > >> >> locking-related.
> > >> >>
> > >> >> thanks
> > >> >>
> > >> >>
> > >> >> -adrian
> > >> >>
> > >> >>
> > >> >
> > >> > Hmm, maybe:
> > >> >
> > >> > Dec 23 14:30:34 shibato kernel: wlan0: ieee80211_swscan_add_scan: chan 11g min dwell met (2146895553 > 2146895553)
> > >> > Dec 23 14:30:34 shibato kernel: wlan0: scan_mindwell: called
> > >> > Dec 23 14:30:34 shibato kernel: wlan0: scan_curchan_task: loop start; scandone=0
> > >> > Dec 23 14:30:34 shibato kernel: wlan0: scan_curchan_task: chan 11g -> 7g [active, dwell min 20ms max 200ms]
> > >> > Dec 23 14:30:34 shibato kernel: wlan0: scan_curchan: calling; maxdwell=200
> > >> > Dec 23 14:30:34 shibato kernel: wlan0: scan_curchan_task: waiting
> > >> > Dec 23 14:30:34 shibato kernel: re0: link state changed to UP
> > >> > Dec 23 14:30:34 shibato kernel: lagg0: link state changed to UP
> > >> > Dec 23 14:30:34 shibato kernel: lock order reversal:
> > >> > Dec 23 14:30:34 shibato kernel: 1st 0xfffff800095d2208 if_lagg rmlock (if_lagg rmlock) @ /usr/src/sys/modules/if_lagg/../../net/if_lagg.c:1530
> > >> > Dec 23 14:30:34 shibato kernel: 2nd 0xfffffe0000e10218 re0 (network driver) @ dev/re/if_re.c:3433
> > >> > Dec 23 14:30:34 shibato kernel: stack backtrace:
> > >> > Dec 23 14:30:34 shibato kernel: #0 0xffffffff80a98b60 at witness_debugger+0x70
> > >> > Dec 23 14:30:34 shibato kernel: #1 0xffffffff80a98a54 at witness_checkorder+0xe54
> > >> > Dec 23 14:30:34 shibato kernel: #2 0xffffffff80a1c794 at __mtx_lock_flags+0xa4
> > >> > Dec 23 14:30:34 shibato kernel: #3 0xffffffff8078c279 at re_ioctl+0x3a9
> > >> > Dec 23 14:30:34 shibato kernel: #4 0xffffffff8222428e at lagg_port_ioctl+0xde
> > >> > Dec 23 14:30:34 shibato kernel: #5 0xffffffff80b20bbf at if_addmulti+0x39f
> > >> > Dec 23 14:30:34 shibato kernel: #6 0xffffffff82224708 at lagg_ether_cmdmulti+0x158
> > >> > Dec 23 14:30:34 shibato kernel: #7 0xffffffff822219dd at lagg_ioctl+0xdd
> > >> > Dec 23 14:30:34 shibato kernel: #8 0xffffffff80b20bbf at if_addmulti+0x39f
> > >> > Dec 23 14:30:34 shibato kernel: #9 0xffffffff80c35a97 at in6_mc_join_locked+0x1d7
> > >> > Dec 23 14:30:34 shibato kernel: #10 0xffffffff80c35715 at in6_joingroup+0x75
> > >> > Dec 23 14:30:34 shibato kernel: #11 0xffffffff80c2f9e9 at in6_update_ifa+0x1339
> > >> > Dec 23 14:30:34 shibato kernel: #12 0xffffffff80c33eb3 at in6_ifattach+0x413
> > >> > Dec 23 14:30:34 shibato kernel: #13 0xffffffff80b1fd84 at ifioctl+0xfe4
> > >> > Dec 23 14:30:34 shibato kernel: #14 0xffffffff80a9d946 at kern_ioctl+0x246
> > >> > Dec 23 14:30:34 shibato kernel: #15 0xffffffff80a9d691 at sys_ioctl+0x171
> > >> > Dec 23 14:30:34 shibato kernel: #16 0xffffffff80e9d40b at amd64_syscall+0x2db
> > >> > Dec 23 14:30:34 shibato kernel: #17 0xffffffff80e7d8ab at Xfast_syscall+0xfb
> > >> > Dec 23 14:30:34 shibato kernel: wlan0: scan_curchan_task: loop start; scandone=0
> > >> > Dec 23 14:30:34 shibato kernel: wlan0: scan_curchan_task: chan 7g -> 36a [active, dwell min 20ms max 200ms]
> > >> > Dec 23 14:30:34 shibato kernel: wlan0: scan_curchan: calling; maxdwell=200
> > >> > Dec 23 14:30:34 shibato kernel: wlan0: scan_curchan_task: waiting
> > >> >
> > >> > This boot then continued normally, with no hang.
> > >> >
> > >> > -jr
> > >
>