Stack panic with em driver unload

Mon Apr 9 19:52:01 UTC 2007

On 4/6/07, Tai-hwa Liang <avatar at mmlab.cse.yzu.edu.tw> wrote:
> On Thu, 5 Apr 2007, Jack Vogel wrote:
> > Our test group uses a script that does 100 iterations of
> > a module load, then bring up all interfaces, and then
> > unload driver.
> >
> > Depending on the system, in anything from just a few
> > iterations to 20 or more, the system will panic.
>
>    Just a "me too" here. :p
>
> > It's doing an em_detach() which calls ether_ifdetach()
> > which goes to if_detach, in_delmulti_ifp, in_delmulti_locked,
> > and finally if_delmulti().
> >
> > The panic is always happening on a cmpxchgq instruction
> > so I assume it's the LOCK macro. What's odd is that it's
> > not always the same reason: sometimes one register is
> > 0 so it's a page fault trap, but on other iterations it's a
> > general protection fault because the register is some
> > big invalid number :)
>
>    I run into this panic regularly.  Apparently the result and condition
> to trigger the panic are the same as yours: running "while true; do
> ifconfig xxx up; kldunload if_xxx; done" and ending up with panicking
> at the cmpxchgq instruction.
>
> > I am hard-pressed to see this as a driver problem, but
> > I'm willing to be proven wrong. Does someone who
> > knows the stack code better than me have any insights
> > or ideas?
> >
> > It also appears system dependent; I have a couple of
> > machines I've tried to reproduce it on and have been
> > unable. I am also told it happens on both amd64 and
> > i386, but it seems easier to reproduce on the former.
>
>    Dunno about amd64 since I only have i386 around; however, I'm sure
> the panic I observed is reproducible on my -CURRENT driver development box.
>
> > Lastly, from evidence so far I think this doesn't happen
> > on CURRENT, but the test group hasn't checked that;
> > only I have, and I don't have as much hardware :)
>
>    FWIW, I usually run into this panic after upgrading to a newer HEAD.
> Sometimes I can make the aforementioned ifconfig/kldunload script
> survive longer by doing a clean rebuild of my driver.
>

I have learned what causes it, at least in our test group's setup...

They have an entry in /etc/rc.conf for the device like:
ifconfig_emX="addr netmask"

The script they run then assigns emX a DIFFERENT
address; that's why you get into the multicast code and
then hit the panic.

I would still like to see the panic not happen, but to avoid
it, just don't go assigning different addresses :)
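
For anyone who wants to try it, here is a rough sketch of that kind of
loop. It is not the test group's actual script: the interface name, the
address and netmask, and the DRY_RUN switch are all placeholders I made
up for illustration. It assumes rc.conf already has an ifconfig_emX line
with some other address. By default it only prints the commands; set
DRY_RUN=0 (as root, on a box you can afford to panic) to really run them.

```shell
#!/bin/sh
# Sketch only: every value below is an assumed placeholder.
IFACE="${IFACE:-em0}"            # hypothetical interface name
ADDR="${ADDR:-192.0.2.10}"       # must DIFFER from the rc.conf address
MASK="${MASK:-255.255.255.0}"
ITERATIONS="${ITERATIONS:-100}"  # the test group used 100 iterations
DRY_RUN="${DRY_RUN:-1}"          # 1 = print commands, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Load the driver, assign the differing address, unload the driver.
repro() {
    i=1
    while [ "$i" -le "$ITERATIONS" ]; do
        run kldload if_em
        run ifconfig "$IFACE" inet "$ADDR" netmask "$MASK" up
        run kldunload if_em
        i=$((i + 1))
    done
}

repro
```

Going by the above, taking the address out of the ifconfig line (or
making it match rc.conf) should keep you out of the multicast path.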

Cheers,

Jack