Software interrupt preemption problems
Date: Tue, 28 Sep 2021 13:17:04 UTC
Hi,
We have encountered a problem with the netisr system.
Context:
- The scheduler in use is ULE.
- We use the netisr to poll packets on the network cards: at each swi tick,
  we poll a fixed number of packets from the card (16 for the moment) and
  process them.
- We run network traffic of 400k packets per second through the machine, and
  the network card can buffer 4096 packets (around 10 ms of traffic).
- The sysctl kern.hz is set to 8000.
- The sysctl kern.sched.preemption is set to 1.
- The PREEMPTION option is enabled.
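As a rough check of the figures above, the sketch below computes how long a 4096-packet buffer lasts at 400k packets per second and how many kern.hz ticks fit in that window. The function names are made up for illustration; only the numbers come from the setup described here.

```c
#include <stdio.h>

/* Microseconds before a buffer of `slots` packets fills at `pps` packets/s. */
double
ring_fill_time_us(unsigned slots, unsigned pps)
{
	return ((double)slots * 1e6 / (double)pps);
}

/* Number of scheduler ticks (at `hz` ticks/s) that fit in `us` microseconds. */
double
ticks_in_us(double us, unsigned hz)
{
	return (us * (double)hz / 1e6);
}

/* Print the budget for the configuration described in this mail. */
void
print_budget(void)
{
	double us = ring_fill_time_us(4096, 400000);

	printf("buffer lasts %.0f us, i.e. %.2f ticks at hz=8000\n",
	    us, ticks_in_us(us, 8000));
}
```

With these inputs the buffer lasts 10240 us, which matches the "around 10 ms" estimate, and that window covers about 82 ticks at kern.hz=8000.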
Problem:
When the netisr has no work to do, the idle thread wakes up and takes over the
CPU, allowing any userland process to run.
A userland process then gets the CPU, but because of the priority lending
mechanism, that process may have inherited the netisr priority while it was
running on another CPU at the same time as the netisr (for example, the
userland process sent some packets, causing contention with the netisr on a
lock).
The problem appears on the next hz tick: the netisr asks to be scheduled, but
because the current process "stole" its priority, the netisr cannot preempt
it. Since preemption is not reevaluated until the userland process does a
sysctl or its scheduler time slice ends, the userland process may keep the
CPU for a long time (a few tens of thousands of milliseconds, until the time
slice ends), causing buffer exhaustion on the network card.
We added some KTR debugging and saw that the userland process loses its
priority escalation after a while (a few hz ticks), which makes the netisr
thread the highest-priority entry in the run queue. However, because the
scheduler does not reevaluate preemption, the netisr still does not preempt
the userland process.
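To put the buffer exhaustion in numbers, here is a small sketch that estimates how many packets are lost if nothing drains the card for a given stall. The 30 ms stall used in the note below is a hypothetical value chosen for illustration, not a measurement, and the function name is made up.

```c
#include <stdint.h>

/*
 * Packets lost when the card receives `pps` packets/s for `stall_us`
 * microseconds with nobody draining a buffer of `slots` packets.
 * Assumes the buffer is empty when the stall begins.
 */
uint64_t
packets_dropped(uint64_t pps, uint64_t stall_us, uint64_t slots)
{
	uint64_t arrivals = pps * stall_us / 1000000;

	return (arrivals > slots ? arrivals - slots : 0);
}
```

At 400k packets per second, a hypothetical 30 ms stall delivers 12000 packets to a 4096-packet buffer, so 7904 are dropped; a 10 ms stall delivers 4000 and still fits.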
Fix proposal:
We patched kern_intr to force the scheduler to reevaluate preemption when the
swi asks to be scheduled but is already in the run queue:
static int
intr_event_schedule_thread(struct intr_event *ie, struct trapframe *frame)
{
	struct intr_entropy entropy;
	struct intr_thread *it;
	struct thread *td;
	struct thread *ctd;

	/*
	 * If no ithread or no handlers, then we have a stray interrupt.
	 */
	if (ie == NULL || CK_SLIST_EMPTY(&ie->ie_handlers) ||
	    ie->ie_thread == NULL)
		return (EINVAL);

	ctd = curthread;
	it = ie->ie_thread;
	td = it->it_thread;

	/*
	 * If any of the handlers for this ithread claim to be good
	 * sources of entropy, then gather some.
	 */
	if (ie->ie_hflags & IH_ENTROPY) {
		entropy.event = (uintptr_t)ie;
		entropy.td = ctd;
		random_harvest_queue(&entropy, sizeof(entropy), RANDOM_INTERRUPT);
	}

	KASSERT(td->td_proc != NULL, ("ithread %s has no process", ie->ie_name));

	/*
	 * Set it_need to tell the thread to keep running if it is already
	 * running. Then, lock the thread and see if we actually need to
	 * put it on the runqueue.
	 *
	 * Use store_rel to arrange that the store to ih_need in
	 * swi_sched() is before the store to it_need and prepare for
	 * transfer of this order to loads in the ithread.
	 */
	atomic_store_rel_int(&it->it_need, 1);
	thread_lock(td);
	if (TD_AWAITING_INTR(td)) {
#ifdef HWPMC_HOOKS
		it->it_waiting = 0;
		if (PMC_HOOK_INSTALLED_ANY())
			PMC_SOFT_CALL_INTR_HLPR(schedule, frame);
#endif
		CTR3(KTR_INTR, "%s: schedule pid %d (%s)", __func__, td->td_proc->p_pid,
		    td->td_name);
		TD_CLR_IWAIT(td);
		sched_add(td, SRQ_INTR);
	} else {
#ifdef HWPMC_HOOKS
		it->it_waiting++;
		if (PMC_HOOK_INSTALLED_ANY() &&
		    (it->it_waiting >= intr_hwpmc_waiting_report_threshold))
			PMC_SOFT_CALL_INTR_HLPR(waiting, frame);
#endif
		CTR5(KTR_INTR, "%s: pid %d (%s): it_need %d, state %d",
		    __func__, td->td_proc->p_pid, td->td_name, it->it_need, TD_GET_STATE(td));
+		thread_lock(ctd);
+		sched_setpreempt(td);
+		thread_unlock(ctd);
		thread_unlock(td);
	}

	return (0);
}
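
To make the intent of the change easier to discuss, here is a tiny userland model of the behaviour, not kernel code. It assumes that the preemption test normally runs only when the swi thread is first put on the run queue, and it contrasts that with repeating the test when the thread is found already queued. All names and priority values are invented for the illustration; lower values mean higher priority, as in the kernel.

```c
#include <stdbool.h>

/* Toy model of one CPU; none of these fields exist in FreeBSD. */
struct model_cpu {
	int	curr_prio;		/* priority of the running thread */
	int	swi_prio;		/* priority of the swi thread */
	bool	swi_queued;		/* swi thread already on the run queue */
	bool	preempt_pending;	/* a preemption has been requested */
};

/* Request preemption if the swi thread outranks the running thread. */
static void
model_check_preempt(struct model_cpu *cpu)
{
	if (cpu->swi_prio < cpu->curr_prio)
		cpu->preempt_pending = true;
}

/* One schedule request; `reevaluate` models the proposed change. */
void
model_swi_schedule(struct model_cpu *cpu, bool reevaluate)
{
	if (!cpu->swi_queued) {
		cpu->swi_queued = true;
		model_check_preempt(cpu);
	} else if (reevaluate) {
		model_check_preempt(cpu);
	}
}

/* Replay the scenario from this mail; returns whether preemption fires. */
bool
model_run(bool reevaluate)
{
	struct model_cpu cpu = {
		.curr_prio = 8,		/* userland thread with lent priority */
		.swi_prio = 8,
		.swi_queued = false,
		.preempt_pending = false,
	};

	model_swi_schedule(&cpu, reevaluate);	/* equal priority: no preemption */
	cpu.curr_prio = 120;			/* lent priority is dropped */
	model_swi_schedule(&cpu, reevaluate);	/* next tick, already queued */
	return (cpu.preempt_pending);
}
```

In this model, model_run(false) never requests a preemption even though the swi thread outranks the running thread after the priority drop, while model_run(true) requests one on the second schedule call.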
We would like to know whether this patch looks correct to you and, if so,
whether we should open a PR.
Thanks