svn commit: r272211 - head/sys/net

Alan Somers asomers at freebsd.org
Wed Oct 19 23:27:58 UTC 2016


On Sat, Sep 27, 2014 at 7:57 AM, Alexander V. Chernikov
<melifaro at freebsd.org> wrote:
> Author: melifaro
> Date: Sat Sep 27 13:57:48 2014
> New Revision: 272211
> URL: http://svnweb.freebsd.org/changeset/base/272211
>
> Log:
>   Use underlying ports counters to get lagg statistics instead of
>   per-packet accounting.
>   This introduce user-visible changes like aggregating error counters.
>
>   Reviewed by:  asomers (prev.version), glebius
>   CR:           D781
>   MFC after:    2 weeks
>   Sponsored by: Yandex LLC
>
> Modified:
>   head/sys/net/if_lagg.c
>   head/sys/net/if_lagg.h
>   head/sys/net/if_var.h
>

I think this change is causing a LOR and deadlock.  It happens if I
create a lagg and then quickly destroy it.  The deadlocked threads
have these stack traces:


Tracing command ifconfig pid 7334 tid 100823 td 0xfffff8014ff34000
sched_switch() at sched_switch+0x48a/frame 0xfffffe20b3771470
mi_switch() at mi_switch+0x167/frame 0xfffffe20b37714a0
turnstile_wait() at turnstile_wait+0x3be/frame 0xfffffe20b37714f0
__mtx_lock_sleep() at __mtx_lock_sleep+0x196/frame 0xfffffe20b3771570
__mtx_lock_flags() at __mtx_lock_flags+0x10d/frame 0xfffffe20b37715c0
_rm_rlock() at _rm_rlock+0x28b/frame 0xfffffe20b3771600
_rm_rlock_debug() at _rm_rlock_debug+0x11f/frame 0xfffffe20b3771640
lagg_get_counter() at lagg_get_counter+0x4c/frame 0xfffffe20b37716c0
if_data_copy() at if_data_copy+0xa1/frame 0xfffffe20b37716e0
sysctl_rtsock() at sysctl_rtsock+0x56c/frame 0xfffffe20b3771860
sysctl_root_handler_locked() at sysctl_root_handler_locked+0x8a/frame 0xfffffe20b37718a0
sysctl_root() at sysctl_root+0x188/frame 0xfffffe20b3771920
userland_sysctl() at userland_sysctl+0x16e/frame 0xfffffe20b37719c0
sys___sysctl() at sys___sysctl+0x74/frame 0xfffffe20b3771a70
amd64_syscall() at amd64_syscall+0x314/frame 0xfffffe20b3771bf0
Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe20b3771bf0
--- syscall (202, FreeBSD ELF64, sys___sysctl), rip = 0x800fceeea, rsp = 0x7fffffffe408, rbp = 0x7fffffffe440 ---

Tracing command ifconfig pid 7331 tid 100796 td 0xfffff80066df5a00
sched_switch() at sched_switch+0x48a/frame 0xfffffe20b36ea630
mi_switch() at mi_switch+0x167/frame 0xfffffe20b36ea660
turnstile_wait() at turnstile_wait+0x3be/frame 0xfffffe20b36ea6b0
__rw_wlock_hard() at __rw_wlock_hard+0xb5/frame 0xfffffe20b36ea740
_rw_wlock_cookie() at _rw_wlock_cookie+0xbc/frame 0xfffffe20b36ea780
lagg_ether_cmdmulti() at lagg_ether_cmdmulti+0x5c/frame 0xfffffe20b36ea7c0
lagg_ioctl() at lagg_ioctl+0x115a/frame 0xfffffe20b36ea8a0
ifioctl() at ifioctl+0xdc1/frame 0xfffffe20b36ea930
kern_ioctl() at kern_ioctl+0x246/frame 0xfffffe20b36ea990
sys_ioctl() at sys_ioctl+0x171/frame 0xfffffe20b36eaa70
amd64_syscall() at amd64_syscall+0x314/frame 0xfffffe20b36eabf0
Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe20b36eabf0
--- syscall (54, FreeBSD ELF64, sys_ioctl), rip = 0x800fd417a, rsp = 0x7fffffffe228, rbp = 0x7fffffffe2a0 ---

The problem is that lagg_get_counter calls LAGG_RLOCK while already
holding the IF_ADDR_RLOCK taken at rtsock.c:1717.  Meanwhile, another
thread calls IF_ADDR_WLOCK at if_lagg.c:1581 while already holding the
LAGG_WLOCK taken at if_lagg.c:1530, so the two paths take the same
pair of locks in opposite orders.  I think this revision introduced
the problem because reading the lagg's counters did not previously
require the LAGG_RLOCK.  Do you have any ideas on how to fix it?
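
To make the reversal concrete, here is a minimal userland sketch that
replays the two acquisition orders described above and flags the
conflict.  It is not kernel code and not how WITNESS is implemented;
the names lock_id, held_set, order_acquire and replay_lagg_paths are
hypothetical, and the two enum values merely stand in for the ifnet
address lock and the lagg softc lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical identifiers standing in for the two kernel locks. */
enum lock_id { LK_IF_ADDR, LK_LAGG, LK_COUNT };

/* seen_before[a][b] is true once some path took lock b while holding a. */
static bool seen_before[LK_COUNT][LK_COUNT];

/* Per-thread set of currently held locks. */
struct held_set {
	bool held[LK_COUNT];
};

/*
 * Record that a thread acquires `id`.  Returns true if this acquisition
 * reverses an order recorded earlier, i.e. a potential deadlock.
 */
static bool
order_acquire(struct held_set *hs, enum lock_id id)
{
	bool reversal = false;

	assert((int)id >= 0 && (int)id < LK_COUNT);
	for (int h = 0; h < LK_COUNT; h++) {
		if (!hs->held[h] || h == (int)id)
			continue;
		if (seen_before[id][h])
			reversal = true;	/* opposite order seen before */
		seen_before[h][id] = true;
	}
	hs->held[id] = true;
	return (reversal);
}

/* Replay both paths from the traces; true means a reversal was found. */
static bool
replay_lagg_paths(void)
{
	struct held_set sysctl_thread, ioctl_thread;
	bool lor = false;

	memset(seen_before, 0, sizeof(seen_before));
	memset(&sysctl_thread, 0, sizeof(sysctl_thread));
	memset(&ioctl_thread, 0, sizeof(ioctl_thread));

	/* sysctl path: IF_ADDR_RLOCK, then LAGG_RLOCK in lagg_get_counter. */
	lor |= order_acquire(&sysctl_thread, LK_IF_ADDR);
	lor |= order_acquire(&sysctl_thread, LK_LAGG);

	/* ioctl path: LAGG_WLOCK, then IF_ADDR_WLOCK in lagg_ether_cmdmulti. */
	lor |= order_acquire(&ioctl_thread, LK_LAGG);
	lor |= order_acquire(&ioctl_thread, LK_IF_ADDR);

	return (lor);
}
```

Calling replay_lagg_paths() from a small main() reports the reversal on
the final acquisition.  The sketch only tracks ordering; it does not
model read versus write modes or actual blocking.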

-Alan

