loop inside uma_zfree critical section

Monthadar Al Jaberi monthadar at gmail.com
Fri Dec 16 09:45:59 UTC 2011


On Wed, Dec 14, 2011 at 9:10 PM, Arnaud Lacombe <lacombar at gmail.com> wrote:
> Hi,
>
> On Wed, Dec 14, 2011 at 2:47 PM, Monthadar Al Jaberi
> <monthadar at gmail.com> wrote:
>> On Tue, Dec 13, 2011 at 4:50 PM, Monthadar Al Jaberi
>> <monthadar at gmail.com> wrote:
>>> On Tue, Dec 13, 2011 at 3:35 PM, John Baldwin <jhb at freebsd.org> wrote:
>>>> On Tuesday, December 13, 2011 7:46:34 am Monthadar Al Jaberi wrote:
>>>>> Hi,
>>>>>
>>>>> I am not sure why I am having this problem, and I do not
>>>>> understand the code in uma_core.c very well.
>>>>> So I hope someone can shed some light on this:
>>>>>
>>>>> I am running on an ARM board with a kernel module
>>>>> that behaves like a wlan interface, so I TX and RX packets.
>>>>>
>>>>> For now TX is only sending beacon-like frames.
>>>>> This is done through using ieee80211_beacon_alloc().
>>>>>
>>>>> Then in a callout task to generate periodic beacons:
>>>>>
>>>>>     m_dup(avp->beacon, M_DONTWAIT);
>>>>>     mtx_lock(...);
>>>>>     STAILQ_INSERT_TAIL(...);
>>>>>     taskqueue_enqueue(...);
>>>>>     mtx_unlock(...);
>>>>>     callout_schedule(...);
>>>>>
>>>>> On the RX side, the interrupt handler reads out the buffer
>>>>> and places it on a queue to be handled by the wlan-glue code.
>>>>> For now the wlan-glue code just frees the mbuf instead of
>>>>> calling the net80211 ieee80211_input() functions:
>>>>>
>>>>>     m_copyback(...);
>>>>>     /* Allocate new mbuf for next RX. */
>>>>>     MGETHDR(..., M_DONTWAIT, MT_DATA);
>>>>>     bzero((mtod(sc->Rx_m, void *)), MHLEN);
>>>>>     sc->Rx_m->m_len = 0; /* NB: m_gethdr does not set */
>>>>>     sc->Rx_m->m_data += 20; /* make headroom */
>>>>>
>>>>> Then I use a lockmgr lock inside my kernel module that should
>>>>> make sure that we are on either the TX or the RX path.
>>>>
>>>> Uh, you can't use a lockmgr lock in interrupt handlers or in
>>>> if_transmit/if_start routines.  You should most likely just be using a plain
>>>> mutex instead.  Also, new code shouldn't use lockmgr in general.  If you
>>>> need a sleepable lock, use sx instead.  It has a more straightforward API.
>>>
>>> Ok, I will change the interrupt handler to do something like this:
>>>
>>>    disable_interrupt();
>>>    taskqueue_enqueue(...); /* on new rx task queue */
>>>
>>> Then on the new rx proc:
>>>
>>>    sx_slock(...);
>>>    m_copyback(...);
>>>    enable_interrupt();
>>>    /* Allocate new mbuf for next RX. */
>>>    MGETHDR(..., M_DONTWAIT, MT_DATA);
>>>    bzero((mtod(sc->Rx_m, void *)), MHLEN);
>>>    sc->Rx_m->m_len = 0; /* NB: m_gethdr does not set */
>>>    sc->Rx_m->m_data += 20; /* make headroom */
>>>    sx_sunlock(...);
>>>
>>> I lock the TX/RX paths to make sure my code is thread safe.
>>> Also, while programming my device (SPI communication)
>>> there will be a tsleep() waiting for the DMA interrupt, and
>>> thus we could be preempted by e.g. a beacon callout etc...
>>>
>>
>> I implemented your suggestions, using sx and the modified interrupt
>> handler as described above, but I still see the same problem as before.
>>
>>>>
>>>>> The problem seems to be at [2]: somehow, after swapping
>>>>> buckets in uma_zfree, m_dup returns a pointer to
>>>>> an mbuf that is still being used by us; [1] and [3]
>>>>> have the same address.
>>>>> Then we call m_freem twice on the same mbuf, [4] and [5],
>>>>> and a loop occurs inside uma_zfree.
>>>>> Am I using mbufs the wrong way? Shouldn't mbufs be thread safe?
>>>>> The problem seems to occur while swapping buckets.
>>>>
>>>> Hmm, the uma uses its own locking, so it should be safe, yes.  However, you
>>>> are correct about [1] and [3].  The thing is, after [1] the mbuf shouldn't
>>>> be in any buckets at all (it only gets put back into the bucket during a
>>>> free).  Are you sure the mbuf wasn't double free'd previously?
>>
>> I rechecked and I am almost certain that I don't double free the mbuf
>> before [1].
>> And it does not crash right at the beginning; it runs for a while
>> and then it crashes. So our code works for about a hundred beacons
>> sent/received between two ARM boards. It feels like something is
>> preempted, which would explain why the mbuf is still in the bucket (wrongly)?
>>
> are you running an INVARIANTS/DIAGNOSTICS/WITNESS/LOCK_DEBUG/...
> enabled kernel ?
>
> are you running on an SMP platform where there might be cache-coherency issue ?

Sorry for the late answer. I added DIAGNOSTIC and WITNESS and
didn't see anything strange except for a couple of LORs...

If I add INVARIANTS I cannot log in at all: it gets to the login
prompt but I can't type anything...

I am running on a single ARM CPU, so it's not an SMP platform.
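
To make the symptom concrete, here is a minimal userland sketch of what I
think a double free would look like from the allocator's side. It is NOT the
real uma_core.c code: struct bucket, bucket_free(), bucket_alloc() and the
checked variant are hypothetical names, and the bucket is modelled as a plain
LIFO array of item pointers. It only shows that if the same item is pushed
twice, two later allocations return the same address (like [1] and [3]), and
that a duplicate check on free catches it at the point of the second free.

```c
#include <stddef.h>

#define BUCKET_SIZE 8

/* Hypothetical stand-in for a per-CPU cache bucket: a LIFO stack of items. */
struct bucket {
	void	*items[BUCKET_SIZE];
	int	 cnt;
};

/* Push an item onto the bucket; no duplicate check. Returns -1 if full. */
static int
bucket_free(struct bucket *b, void *item)
{
	if (b->cnt == BUCKET_SIZE)
		return (-1);
	b->items[b->cnt++] = item;
	return (0);
}

/* Pop the most recently freed item, or NULL if the bucket is empty. */
static void *
bucket_alloc(struct bucket *b)
{
	if (b->cnt == 0)
		return (NULL);
	return (b->items[--b->cnt]);
}

/* Debug-style free: refuse (return -2) if the item is already cached. */
static int
bucket_free_checked(struct bucket *b, void *item)
{
	for (int i = 0; i < b->cnt; i++)
		if (b->items[i] == item)
			return (-2);
	return (bucket_free(b, item));
}

/* Returns 1 if a double free makes two allocations return the same item. */
int
double_free_demo(void)
{
	struct bucket b = { .cnt = 0 };
	static char fake_mbuf[64];

	bucket_free(&b, fake_mbuf);
	bucket_free(&b, fake_mbuf);	/* the double free */

	void *first = bucket_alloc(&b);
	void *second = bucket_alloc(&b);
	return (first != NULL && first == second);
}

/* Returns 1 if the checked variant rejects the second free of an item. */
int
checked_free_demo(void)
{
	struct bucket b = { .cnt = 0 };
	static char fake_mbuf[64];

	int rc1 = bucket_free_checked(&b, fake_mbuf);
	int rc2 = bucket_free_checked(&b, fake_mbuf);
	return (rc1 == 0 && rc2 == -2 && b.cnt == 1);
}
```

In this model the damage only becomes visible at the second allocation, long
after the bad free, which would match a crash that shows up after a hundred or
so beacons. This is only a sketch of the failure mode, not a claim about where
the bug in my module is.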

This is a snippet of my kernel config:

makeoptions	DEBUG=-g		#Build kernel with gdb(1) debug symbols
options 	DDB
options 	KDB

options 	SCHED_4BSD		#4BSD scheduler
options 	INET			#InterNETworking
options 	INET6			#IPv6 communications protocols
options 	FFS			#Berkeley Fast Filesystem
options	DIAGNOSTIC
options 	UFS_ACL			#Support for access control lists
options 	UFS_DIRHASH		#Improve performance on big directories
options 	KTRACE			#ktrace(1) support
options 	SYSVSHM			#SYSV-style shared memory
options 	SYSVMSG			#SYSV-style message queues
options 	SYSVSEM			#SYSV-style semaphores
options 	_KPOSIX_PRIORITY_SCHEDULING #Posix P1003_1B real-time extensions
options 	MUTEX_NOINLINE
options 	RWLOCK_NOINLINE
options 	SX_NOINLINE
options 	NO_FFS_SNAPSHOT
options 	NO_SWAPPING
options	DEADLKRES
options	INVARIANTS
options	INVARIANT_SUPPORT
options	WITNESS

>
> Thanks,
>  - Arnaud

Thanks,



-- 
Monthadar Al Jaberi

