panic in rt_check_fib()
Giorgos Keramidas
keramida at freebsd.org
Sun Sep 14 12:56:53 UTC 2008
On Sat, 13 Sep 2008 23:28:51 -0700, Julian Elischer <julian at elischer.org> wrote:
> To recap on this, I rewrote this function a couple of weeks ago because I
> couldn't keep track of what was going on, and I thought it might
> have some bad edge cases. A couple of days later Giorgos contacted me
> saying that he had a fairly reproducible situation
> where this was triggered and it appeared to be an edge case in
> this function that allowed it to try to lock the same lock twice.
>
> I immediately thought "ah-hah!" I may have a solution to this,
> and gave him a copy of my new function and indeed it DOES fix that
> panic. However, after deleting and recreating interfaces a few hundred
> times without crashing in rt_check_fib() it then fails somewhere else
> (actually it leaks some resources and eventually networking stops).
>
> I'm not convinced that is a problem with the new or old rt_check() but
> it did stop me from just committing the new code.
>
> In rereading the way the function (did and still does) work it
> occurred to me that there was a large flaw in the way it worked.
>
> It dropped the lock on one route while it went off and did something
> else that might block. On returning it blindly re-grabbed that lock,
> completely ignoring the fact that the route might not even be valid any
> more (or any of several other things that may have changed while
> it was away, maybe sleeping).
>
> The code Giorgos is referring to is a patch I suggested to him to
> fix this oversight and not the one that I originally tested and
> had suggested to fix the edge case.
>
> I do however ask that some other people look at this patch!
Exactly. Thanks for summarizing this so well :)
I have started a kernel with your latest patch (from the quoted message
above), and I can't panic my kernel with the script that did it in a
semi-reliable manner before:
% root@kobe:/root# while true ; do \
% sh home.sh > /dev/null 2>&1 ; \
% vmstat -z | sed -n -e 1p -e /rt/p ; \
% sleep 1 ; \
% done
% ITEM SIZE LIMIT USED FREE REQUESTS FAILURES
% rtentry: 120, 0, 19, 77, 43, 0
% ITEM SIZE LIMIT USED FREE REQUESTS FAILURES
% rtentry: 120, 0, 20, 76, 47, 0
% ITEM SIZE LIMIT USED FREE REQUESTS FAILURES
% rtentry: 120, 0, 21, 75, 51, 0
% ITEM SIZE LIMIT USED FREE REQUESTS FAILURES
% rtentry: 120, 0, 23, 73, 55, 0
% ITEM SIZE LIMIT USED FREE REQUESTS FAILURES
% rtentry: 120, 0, 24, 72, 59, 0
% ITEM SIZE LIMIT USED FREE REQUESTS FAILURES
% rtentry: 120, 0, 25, 71, 62, 0
% ITEM SIZE LIMIT USED FREE REQUESTS FAILURES
% rtentry: 120, 0, 26, 70, 65, 0
% ITEM SIZE LIMIT USED FREE REQUESTS FAILURES
% rtentry: 120, 0, 27, 69, 69, 0
% ITEM SIZE LIMIT USED FREE REQUESTS FAILURES
% rtentry: 120, 0, 29, 67, 73, 0
% ITEM SIZE LIMIT USED FREE REQUESTS FAILURES
% rtentry: 120, 0, 30, 66, 76, 0
% ^C
% root@kobe:/root# sh home.sh
rtentries seem to be going up every time I cycle through the script,
which essentially brings down both wireless and wired interfaces and
then brings up the wired interface of my laptop. The core of the script
is currently:
# network interface options
export ifconfig_re0="inet 192.168.1.10/24"
export defaultrouter='192.168.1.1'
echo '## Stopping network interfaces.'
/etc/rc.d/netif stop re0 && ifconfig re0 delete
/etc/rc.d/netif stop iwn0 && ifconfig iwn0 delete
echo '## Bringing up network interface.'
/etc/rc.d/netif start re0
echo "## Reloading firewall rules."
/etc/rc.d/pf reload
# The default route may be pointing to another interface. Find out
# the IP address of the default gateway, delete it and point to the
# default gateway configured as ${defaultrouter}.
if [ -n "${defaultrouter}" ]; then
    echo '## Setting default router.'
    _oldrouter=`netstat -rn | grep default | awk '{print $2}'`
    if [ -n "${_oldrouter}" ]; then
        route delete default "${_oldrouter}"
        unset _oldrouter
    fi
    route add default "$defaultrouter"
fi
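A side note on the gateway lookup in the script: `grep default` matches any line containing the word, so it could return more than one gateway if the routing table has several such entries (an IPv6 default route, for example). A minimal sketch of a stricter lookup follows. The function name default_gateway is hypothetical and not part of home.sh; it only assumes that the destination is the first column and the gateway the second, as in the `netstat -rn` usage above.

```shell
# Hypothetical helper (not part of the original home.sh).
# Reads "netstat -rn" style output on stdin and prints the gateway of the
# first route whose destination field is exactly "default".
default_gateway() {
    awk '$1 == "default" { print $2; exit }'
}
```

It could be used as `_oldrouter=$(netstat -rn | default_gateway)`; taking only the first match is a design choice of this sketch, not something the original script does.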
With your version of rt_check_fib() I have no panics so far. This
doesn't mean we don't have a bug elsewhere, or that it will not panic
tomorrow, but it's nice that things seem a bit more stable now. The old
version of rt_check_fib() used to panic about one third of the time I
ran my 'home.sh' script...
Now an interesting question is: Is it `normal' that the USED rtentry
objects keep going up at every interface restart and are (at least at
first glance) not reclaimed as fast as they are acquired?
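
To put numbers on that question, the USED column can be tracked from one sample to the next. The following is a minimal sketch, assuming the `vmstat -z` line layout shown in the transcript above ("rtentry: SIZE, LIMIT, USED, ..."); the function name rtentry_used_delta is hypothetical.

```shell
# Hypothetical helper: reads "vmstat -z" style lines on stdin and, for each
# rtentry sample, prints the USED count followed by the change relative to
# the previous sample (the first sample has no delta).
rtentry_used_delta() {
    awk -F'[:,] *' '
        $1 == "rtentry" {
            used = $4 + 0                  # fields: name, SIZE, LIMIT, USED, ...
            if (seen) printf "%d %+d\n", used, used - prev
            else      printf "%d\n", used
            prev = used
            seen = 1
        }'
}
```

Piping the output of the whole test loop through it, e.g. `while true ; do sh home.sh > /dev/null 2>&1 ; vmstat -z ; sleep 1 ; done | rtentry_used_delta`, would show one line per cycle. A delta that stays positive would be consistent with entries not being reclaimed, but on its own it does not show where they are held.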