vlan trunk structure race condition between vlan unconfig and vlan_input

Bharavi Oak bharavi_oak at yahoo.com
Thu Feb 14 21:55:53 UTC 2013


Hi,

In vlan_unconfig_locked, the vlan trunk is destroyed by trunk_destroy, which frees the hash, destroys the trunk lock, and then frees the trunk. Before trunk_destroy is called, the trunk lock is released so that a thread in vlan_input, if any, can finish its work (as per the comment in the code). However, we have hit a case where releasing the lock was not enough for the other thread to grab it and finish its work in time. Instead, the other thread (in vlan_input) acquired the lock only after this thread (destroying the vlan) had already freed the hash inside trunk_destroy. As a result, the vlan_input thread got a NULL hash in vlan_gethash, which caused a panic.
In other words, a thread in vlan_input can reach TRUNK_RLOCK while another thread doing vlan unconfig is at any point in its execution: before trunk_destroy, or anywhere inside it. In our case, trunk->hash was NULL while trunk itself was still non-NULL. But the trunk itself may just as well have been freed by the time the vlan_input thread reaches TRUNK_RLOCK. Either way, the result is a panic.
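
To make the ordering concrete, here is a small single-threaded userland model of the two interleavings described above. It is only a sketch, not the if_vlan.c code: the model_* names, the integer lock flag and the static table are all hypothetical. Only the order of the steps (unlock, free hash, take the read lock, look at the hash) follows the description.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct ifvlantrunk. */
struct model_trunk {
	int	 locked;	/* stand-in for the trunk lock state */
	int	*hash;		/* stand-in for the vlan hash table */
};

enum input_result { INPUT_OK, INPUT_NULL_HASH };

/* Backing storage for the model hash; "freeing" it just clears the pointer. */
static int model_table[16];

/* Unconfig side, step 1: drop the trunk lock before trunk_destroy. */
static void
model_unconfig_unlock(struct model_trunk *t)
{
	t->locked = 0;
}

/* Unconfig side, step 2: the part of trunk_destroy that frees the hash. */
static void
model_destroy_free_hash(struct model_trunk *t)
{
	t->hash = NULL;
}

/* Input side: take the lock, then inspect the hash as vlan_gethash would. */
static enum input_result
model_input(struct model_trunk *t)
{
	enum input_result r;

	assert(!t->locked);	/* the lock must have been released */
	t->locked = 1;		/* models TRUNK_RLOCK */
	r = (t->hash == NULL) ? INPUT_NULL_HASH : INPUT_OK;
	t->locked = 0;		/* models the matching unlock */
	return (r);
}

/*
 * Replay one interleaving. With input_first != 0 the input thread wins the
 * lock right after the unlock (the case the code comment hopes for);
 * otherwise the unconfig thread frees the hash first (the case we hit).
 */
static enum input_result
model_replay(int input_first)
{
	struct model_trunk t = { 1, model_table };
	enum input_result r;

	model_unconfig_unlock(&t);
	if (input_first) {
		r = model_input(&t);
		model_destroy_free_hash(&t);
	} else {
		model_destroy_free_hash(&t);
		r = model_input(&t);
	}
	return (r);
}
```

In this model, model_replay(1) yields INPUT_OK and model_replay(0) yields INPUT_NULL_HASH. In the kernel the second case is a NULL dereference rather than a return code.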

While trying to solve this, we have started working along the following lines:
1. Move the trunk lock out of the ifvlantrunk structure, keeping the pointers to that structure in the ifvlan and ifnet structures intact, so that taking the trunk lock no longer depends on the trunk being valid.
2. In trunk_destroy, delay freeing the hash as long as possible, perhaps by checking whether any read waiters are present. However, this still requires any thread in vlan_input to have reached TRUNK_RLOCK by that point at the latest.
3. Check the validity of the hash inside vlan_gethash. This would add some extra instructions that are otherwise not required.

So, could a FreeBSD developer please comment on this problem and on the possible ways to solve it listed above?

Thanks,
Bharavi

Note that although we encountered this in FreeBSD 7.x, this part of the code is the same in FreeBSD 9.x.


