Software interrupt preemption problems

From: Aurélien CAZUC <aurelien.cazuc_at_stormshield.eu>
Date: Tue, 28 Sep 2021 13:17:04 UTC
Hi, 

We have encountered a problem with the netisr system 

Context: 

The scheduler in use is ULE 

We are using the netisr to poll packets on network cards: at each swi tick, 
we poll a given number of packets from the card (16 for the moment) and 
process them 

We run network traffic of 400k packets per second through the machine, and 
the network card's buffer holds 4096 packets (around 10 ms of traffic) 

the sysctl kern.hz is set to 8000 
the sysctl kern.sched.preemption is set to 1 
the PREEMPTION option is enabled 


Problem: 

When the netisr doesn't have any work to do, the idle thread wakes up and takes 
over the CPU, allowing any userland process to execute 

Then, a userland process gets the CPU, but because of the priority lending 
mechanism, that process may have borrowed the netisr's priority earlier, while 
running on another CPU concurrently with the netisr (e.g. the userland process 
sends some packets and contends on a lock also taken by the netisr) 

The problem appears on the next hz tick: the netisr asks to be scheduled, but 
because the current process "stole" its priority, the netisr cannot preempt 
it; and because preemption is not reevaluated until the userland process makes 
a syscall or its scheduler time slice ends, the userland process may keep the 
CPU for a long time (a few tens of thousands of milliseconds, until the time 
slice ends), causing buffer exhaustion on the network card 
We added some debugging with KTR and saw that the userland process loses its 
priority boost after a while (a few hz ticks), making the netisr the highest 
priority thread on the run queue; but because the scheduler doesn't reevaluate 
the preemption decision, the netisr still doesn't preempt the userland process 


Fix proposal: 

We patched kern_intr to force the scheduler to reevaluate preemption when the 
swi would have asked for a schedule but its thread is already on the run queue 

static int 
intr_event_schedule_thread(struct intr_event *ie, struct trapframe *frame) 
{ 
	struct intr_entropy entropy; 
	struct intr_thread *it; 
	struct thread *td; 
	struct thread *ctd; 

	/* 
	 * If no ithread or no handlers, then we have a stray interrupt. 
	 */ 
	if (ie == NULL || CK_SLIST_EMPTY(&ie->ie_handlers) || 
	    ie->ie_thread == NULL) 
		return (EINVAL); 

	ctd = curthread; 
	it = ie->ie_thread; 
	td = it->it_thread; 

	/* 
	 * If any of the handlers for this ithread claim to be good 
	 * sources of entropy, then gather some. 
	 */ 
	if (ie->ie_hflags & IH_ENTROPY) { 
		entropy.event = (uintptr_t)ie; 
		entropy.td = ctd; 
		random_harvest_queue(&entropy, sizeof(entropy), 
		    RANDOM_INTERRUPT); 
	} 

	KASSERT(td->td_proc != NULL, ("ithread %s has no process", 
	    ie->ie_name)); 

	/* 
	 * Set it_need to tell the thread to keep running if it is already 
	 * running.  Then, lock the thread and see if we actually need to 
	 * put it on the runqueue. 
	 * 
	 * Use store_rel to arrange that the store to ih_need in 
	 * swi_sched() is before the store to it_need and prepare for 
	 * transfer of this order to loads in the ithread. 
	 */ 
	atomic_store_rel_int(&it->it_need, 1); 
	thread_lock(td); 
	if (TD_AWAITING_INTR(td)) { 
#ifdef HWPMC_HOOKS 
		it->it_waiting = 0; 
		if (PMC_HOOK_INSTALLED_ANY()) 
			PMC_SOFT_CALL_INTR_HLPR(schedule, frame); 
#endif 
		CTR3(KTR_INTR, "%s: schedule pid %d (%s)", __func__, 
		    td->td_proc->p_pid, td->td_name); 
		TD_CLR_IWAIT(td); 
		sched_add(td, SRQ_INTR); 
	} else { 
#ifdef HWPMC_HOOKS 
		it->it_waiting++; 
		if (PMC_HOOK_INSTALLED_ANY() && 
		    (it->it_waiting >= intr_hwpmc_waiting_report_threshold)) 
			PMC_SOFT_CALL_INTR_HLPR(waiting, frame); 
#endif 
		CTR5(KTR_INTR, "%s: pid %d (%s): it_need %d, state %d", 
		    __func__, td->td_proc->p_pid, td->td_name, it->it_need, 
		    TD_GET_STATE(td)); 
+		thread_lock(ctd); 
+		sched_setpreempt(td); 
+		thread_unlock(ctd); 
		thread_unlock(td); 
	} 

	return (0); 
} 

We would like to know whether this patch looks correct to you and, if so, 
whether we should open a PR 

Thanks