svn commit: r302350 - in head: share/man/man9 sys/kern sys/sys

Wed Jul 13 09:52:11 UTC 2016

On 07/13/16 10:49, Roger Pau Monné wrote:
> On Tue, Jul 05, 2016 at 06:47:18PM +0000, Gleb Smirnoff wrote:
>> Author: glebius
>> Date: Tue Jul  5 18:47:17 2016
>> New Revision: 302350
>> URL: https://svnweb.freebsd.org/changeset/base/302350
>>
>> Log:
>>   The paradigm of a callout is that it has three consequent states:
>>   not scheduled -> scheduled -> running -> not scheduled. The API and the
>>   manual page assume that, some comments in the code assume that, and looks
>>   like some contributors to the code also did. The problem is that this
>>   paradigm isn't true. A callout can be scheduled and running at the same
>>   time, which makes API description ambigouous. In such case callout_stop()
>>   family of functions/macros should return 1 and 0 at the same time, since it
>>   successfully unscheduled future callout but the current one is running.
>>   Before this change we returned 1 in such a case, with an exception that
>>   if running callout was migrating we returned 0, unless CS_MIGRBLOCK was
>>   specified.
>>
>>   With this change, we now return 0 in case if future callout was unscheduled,
>>   but another one is still in action, indicating to API users that resources
>>   are not yet safe to be freed.
>>
>>   However, the sleepqueue code relies on getting 1 return code in that case,
>>   and there already was CS_MIGRBLOCK flag, that covered one of the edge cases.
>>   In the new return path we will also use this flag, to keep sleepqueue safe.
>>
>>   Since the flag CS_MIGRBLOCK doesn't block migration and now isn't limited to
>>   migration edge case, rename it to CS_EXECUTING.
>>
>>   This change fixes panics on a high loaded TCP server.
>>
>>   Reviewed by:	jch, hselasky, rrs, kib
>>   Approved by:	re (gjb)
>>   Differential Revision:	https://reviews.freebsd.org/D7042
>
> This change triggers a KASSERT when resuming a Xen VM from suspension:
>
> panic: bogus refcnt 0 on lle 0xfffff800035b9c00
> cpuid = 0
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe001d2fa800
> vpanic() at vpanic+0x182/frame 0xfffffe001d2fa880
> kassert_panic() at kassert_panic+0x126/frame 0xfffffe001d2fa8f0
> llentry_free() at llentry_free+0x136/frame 0xfffffe001d2fa920
> in_lltable_free_entry() at in_lltable_free_entry+0xb0/frame 0xfffffe001d2fa950
> arp_ifinit() at arp_ifinit+0x54/frame 0xfffffe001d2fa980
> netfront_backend_changed() at netfront_backend_changed+0x154/frame 0xfffffe001d2faa50
> xenwatch_thread() at xenwatch_thread+0x1a2/frame 0xfffffe001d2faa70
> fork_exit() at fork_exit+0x84/frame 0xfffffe001d2faab0
> fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe001d2faab0
> --- trap 0, rip = 0, rsp = 0, rbp = 0 ---
>
> This seems to be caused by the following snipped in xen-netfront, which is
> used by netfront in order to send a gratuitous ARP after recovering from
> migration, in order for the bridge in the host to cache the MAC of the
> domain.
>
> TAILQ_FOREACH(ifa, &ifp->if_addrhead, ifa_link) {
> 	if (ifa->ifa_addr->sa_family == AF_INET) {
> 		arp_ifinit(ifp, ifa);
> 	}
> }
>
> FWIW, this was working fine before this change, so I'm afraid this change
> triggered some kind of bug in the lle entries.
>
> Roger.
>

Hi,

> static void
> nd6_llinfo_settimer_locked(struct llentry *ln, long tick)
> {
>         int canceled;
>
>         LLE_WLOCK_ASSERT(ln);
>
>         if (tick < 0) {
>                 ln->la_expire = 0;
>                 ln->ln_ntick = 0;
>                 canceled = callout_stop(&ln->lle_timer);
>         } else {
>                 ln->la_expire = time_uptime + tick / hz;
>                 LLE_ADDREF(ln);
>                 if (tick > INT_MAX) {
>                         ln->ln_ntick = tick - INT_MAX;
>                         canceled = callout_reset(&ln->lle_timer, INT_MAX,
>                             nd6_llinfo_timer, ln);
>                 } else {
>                         ln->ln_ntick = 0;
>                         canceled = callout_reset(&ln->lle_timer, tick,
>                             nd6_llinfo_timer, ln);
>                 }
>         }
>         if (canceled > 0)
>                 LLE_REMREF(ln);
> }

Like stated in D7042, there are dependencies in the code that expects 
callout_reset() to work like this:

int
callout_reset()
{
	atomic_lock();
	retval = callout_stop();
	restart callout();
	atomic_unlock();

	return (retval);
}

As you can see in the piece of code above. r302350 fundamentally changes 
that.

D7042 has many open questions that are not answered and I think it 
should be reverted from head and 11-stable until the discussions are 
complete.

I believe the problems found should be fixed in the TCP stack, using 
locked callouts instead of unlocked.

--HPS