[Bug 283903] rtw88: possible skb leak
- In reply to: bugzilla-noreply_a_freebsd.org: "[Bug 283903] rtw88: possible skb leak"
Date: Sun, 02 Feb 2025 02:13:49 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=283903
--- Comment #25 from Guillaume Outters <guillaume-freebsd@outters.eu> ---
(In reply to Bjoern A. Zeeb from comment #16)
With your DTrace script I got a flagrant difference between before and after the
tipping point.
--------------------------------
-- The paths to allocation and freeing
There are 2 paths to linuxkpi_alloc_skb; let's call them A for Alloc:
- A1 lkpi_80211_txq_task (explicitly Tx)
- A2 lkpi_napi_task > rtw_pci_napi_poll (which would be the RX path?)
And 3 paths to linuxkpi_kfree_skb (D for Dealloc):
- D1 linux_work_fn > rtw_c2h_work > rtw_fw_c2h_cmd_handle >
rtw_tx_report_handle > linuxkpi_ieee80211_tx_status
Note that there is a "branch" at rtw_c2h_work, with:
D1.1 rtw_c2h_work+0x62 leading to the full path above, at the tail of
which linuxkpi_kfree_skb is called (by linuxkpi_ieee80211_tx_status)
D1.2 rtw_c2h_work+0x6a does a direct call to linuxkpi_kfree_skb
- D2 lkpi_napi_task > rtw_pci_napi_poll > linuxkpi_ieee80211_rx (explicitly
Rx)
- D3 [softclock_thread > softclock_call_cc] > rtw_tx_report_purge_timer
  contrary to all of the above, which start with [taskqueue_thread_loop >
taskqueue_run_locked], this one's name suggests it is called periodically
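For reference, the stack counting itself boils down to a tiny D script. This is
only a sketch of the approach (the actual rtw88-skb.d is not attached to this
report and may differ): aggregate the kernel stack at each entry into the two
LinuxKPI functions.

```d
/* Sketch: count each distinct call stack reaching the LinuxKPI
 * skb allocation and free entry points (fbt provider). */
fbt::linuxkpi_alloc_skb:entry,
fbt::linuxkpi_kfree_skb:entry
{
	@[probefunc, stack()] = count();
}
```

On exit, dtrace prints one stack per alloc/free path with its count, which is
exactly the shape of the output in (2) below.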
--------------------------------
-- My tests
I did a first test just after the reboot (I don't remember whether it was mostly
Rx or Tx), with interesting results (see (2) below for the full output):
- 283 allocs through A1 got freed by D1.1
- 1144 allocs through A2, of which:
- 861 freed by D2
- 283 freed by D1.2
Note how the 283 matches: there is an interesting mix of allocations handled by
the "other side"'s dealloc path (and because of this mix I couldn't say for sure
that 1 is Tx and 2 is Rx).
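Those before-run counts can be cross-checked with a bit of shell arithmetic (the
numbers are copied from the DTrace output in (2); the variable names are mine):

```shell
# Counts from the "before" run (see (2) below)
A1=283     # allocs via lkpi_80211_txq_task
A2=1144    # allocs via rtw_pci_napi_poll
D1_1=283   # frees via linuxkpi_ieee80211_tx_status (rtw_c2h_work+0x62)
D1_2=283   # frees via the direct call (rtw_c2h_work+0x6a)
D2=861     # frees via linuxkpi_ieee80211_rx
# Every allocation is matched by a free: the balance is 0.
echo $(( (A1 + A2) - (D1_1 + D1_2 + D2) ))
```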
Now, after some time running, once vmstat -m started to show an increase in skb
memory consumption, I got totally different paths.
Whether in Tx or Rx (see the details in ):
- during the transfer I got 17870 A2 allocations, some of which got freed by D2.
But I had some A1 too, with:
  A2 = A1 + D2
This is **really surprising**, as I expected a balanced A1 + A2 = D2 (the sum
of allocations = the sum of deallocations)
The increase in vmstat was exactly 2 * A1 * 4 KB.
This fits perfectly: to achieve balance we should have freed as many buffers as
we allocated, (A1 + A2) - D2 = 0, i.e. an expected D2 = A1 + A2;
however here D2 = A2 - A1, so the imbalance is A1 + A2 - D2 = A1 + A2 -
(A2 - A1) = 2 * A1
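The imbalance can be sketched the same way; A1 below is a made-up count just to
illustrate the relation (only A2 = 17870 comes from the actual run):

```shell
A1=500             # hypothetical Tx-path alloc count, for illustration
A2=17870           # Rx-path alloc count observed during the transfer
D2=$((A2 - A1))    # what was observed: A2 = A1 + D2
leak=$((A1 + A2 - D2))
# leak = 2 * A1, matching the vmstat -m growth of 2 * A1 * 4 KB
echo $leak
```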
- when waiting after the transfer, apart from some negligible new allocations in
the same mode (A2 = A1 + D2),
we see a **new path** (D3, with rtw_tx_report_purge_timer on
softclock_thread) kick in to free a bit of memory;
although it seems to be dedicated to garbage collection, **it doesn't keep up
with the pace**
--------------------------------
-- My god!
It looks like SOME SKBUFFERS GET **ALLOCATED** INSTEAD OF **FREED**.
--------------------------------
Notes
--- (1) Test procedure
vms() { vmstat -m | egrep -e '(Use|lkpi|mbuf)' ; }
dt() { sudo dtrace -s rtw88-skb.d | egrep -v 'kernel.0xffff|fork_exit|taskqueue_thread_loop|softclock_thread' ; }
# /tmp/1 contains the first 10000000 bytes of a gzip file.
for p in "rx b:/tmp/1 /tmp/" "tx /tmp/1 b:/tmp/"
do
  set -- $p
  {
    dt $1.during &
    vms
    echo "# scp ($1) of a 10 MiB file"
    time scp $2 $3 2>&1
    sudo killall dtrace
    vms
    dt $1.after &
    echo "# Sleep 20"
    sleep 20
    sudo killall dtrace
    vms
  } | tee rtw88-skb.results.pourri.$1
done
--- (2) Before
linuxkpi_alloc_skb
kernel`linuxkpi_dev_alloc_skb+0xd
kernel`lkpi_80211_txq_task+0x1ec
kernel`taskqueue_run_locked+0x182
kernel`taskqueue_thread_loop+0xc2
283
linuxkpi_kfree_skb
kernel`linuxkpi_ieee80211_tx_status_ext+0x163
kernel`linuxkpi_ieee80211_tx_status+0x45
if_rtw88.ko`rtw_tx_report_handle+0x136
if_rtw88.ko`rtw_fw_c2h_cmd_handle+0x15a
if_rtw88.ko`rtw_c2h_work+0x62
kernel`linux_work_fn+0xe4
kernel`taskqueue_run_locked+0x182
kernel`taskqueue_thread_loop+0xc2
283
linuxkpi_alloc_skb
kernel`linuxkpi_dev_alloc_skb+0xd
if_rtw88.ko`rtw_pci_napi_poll+0x254
kernel`lkpi_napi_task+0xf
kernel`taskqueue_run_locked+0x182
kernel`taskqueue_thread_loop+0xc2
1144
linuxkpi_kfree_skb
if_rtw88.ko`rtw_c2h_work+0x6a
kernel`linux_work_fn+0xe4
kernel`taskqueue_run_locked+0x182
kernel`taskqueue_thread_loop+0xc2
283
linuxkpi_kfree_skb
kernel`linuxkpi_ieee80211_rx+0x5a3
if_rtw88.ko`rtw_pci_napi_poll+0x31b
kernel`lkpi_napi_task+0xf
kernel`taskqueue_run_locked+0x182
kernel`taskqueue_thread_loop+0xc2
861