svn commit: r315136 - head/sys/netpfil/pf

Wed Mar 15 01:26:49 UTC 2017

On 15 Mar 2017, at 6:57, Gleb Smirnoff wrote:
> On Sun, Mar 12, 2017 at 05:42:57AM +0000, Kristof Provost wrote:
> K> Log:
> K>   pf: Fix incorrect rw_sleep() in pf_unload()
> K>
> K>   When we unload we don't hold the pf_rules_lock, so we cannot call 
> rw_sleep()
> K>   with it, because it would release a lock we do not hold. There's 
> no need for the
> K>   lock either, so we can just tsleep().
> K>
> K>   While here also make the same change in pf_purge_thread(), 
> because it explicitly
> K>   takes the lock before rw_sleep() and then immediately releases it 
> afterwards.
>
> The correct change would to be grab lock in pf_unload(), exactly as 
> pf_purge_thread()
> does. With your change you introduces a possible infinite sleep due to 
> race, since
> there is no timeout and no lock.
>
I must be missing something, because I don’t see the race, and don’t 
see how we
could end up with an infinite sleep.

Even if pf_purge_thread() somehow misses that pf_end_threads is non-zero 
(say
it was not sleeping but executing the last vnet cleanup after the
pf_end_threads check) while pf_unload() calls the 
wakeup_one(pf_purge_thread)
it would still terminate the next time the tsleep() timed out.

The only way around that would be to hold the PF_RULES_LOCK() during all 
of
pf_purge_thread() that’s not actually sleeping. That’s non-trivial 
because
pf_purge_unlinked_rules() also takes the PF_RULES_LOCK and there are 
lock
ordering constraints with some of the other locks taken in the other 
cleanup
code.

Given that unloading pf is a non-supported development use case (see
MOD_QUIESCE in pf_modevent()), I think I’d rather accept the 
occasional 10
second delay in unloading.

There is another issue with unloading (vnet_pf_uninit() gets called 
after
pf_unload(), which means we try to take a destroyed lock), for which 
I’ll have
a patch soon.

Regards,
Kristof