svn commit: r333703 - head/sys/vm

Andrew Gallatin gallatin at cs.duke.edu
Thu May 17 18:16:02 UTC 2018


On 05/17/18 14:07, Mark Johnston wrote:
> On Thu, May 17, 2018 at 10:07:34AM -0700, Conrad Meyer wrote:
>> On Wed, May 16, 2018 at 9:27 PM, Mark Johnston <markj at freebsd.org> wrote:
>>> Author: markj
>>> Date: Thu May 17 04:27:08 2018
>>> New Revision: 333703
>>> URL: https://urldefense.proofpoint.com/v2/url?u=https-3A__svnweb.freebsd.org_changeset_base_333703&d=DwIBAg&c=imBPVzF25OnBgGmVOlcsiEgHoG1i6YHLR0Sj_gZ4adc&r=Ed-falealxPeqc22ehgAUCLh8zlZbibZLSMWJeZro4A&m=6lhtci2MYxtyrK5Ub70QC0DcEiQ77Ry2LTAb6cDtW5A&s=z0SOGvNGORjI-SySfy-aovuyFzy_K5CtCfbNeWbRGLA&e=
>>>
>>> Log:
>>>    Fix a race in vm_page_pagequeue_lockptr().
>>>
>>>    The value of m->queue must be cached after comparing it with PQ_NONE,
>>>    since it may be concurrently changing.
>>>
>>>    Reported by:  glebius
>>
>> What were the symptoms of this issue?  The test plan in the linked
>> phabricator revision says:
>>
>> "Gleb reported seeing panics as a result of the use of a bogus index
>> into the pagequeue array, and also reported that this patch fixed the
>> panics."
>>
>> So an attempt to lock pagequeues[PQ_NONE=255].pq_mutex, which is
>> either something later in the vm_domain object, or bogus memory?  One
>> of the mtx asserts trips?
> 
> I think it was "mtx_lock() of spin mutex"; I didn't get a lot of
> details.
> 
> I failed to note in the commit message that this race was introduced in
> r332974.
> 

The most common stack was:


panic: mtx_lock() of spin mutex (null) @ 
/data/ocafirmware.alt/FreeBSD/sys/vm/vm_page.c:3344
cpuid = 4
time = 1526415167
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 
0xfffffe158af62380
vpanic() at vpanic+0x1a3/frame 0xfffffe158af623e0
doadump() at doadump/frame 0xfffffe158af62460
__mtx_lock_flags() at __mtx_lock_flags+0x11a/frame 0xfffffe158af624a0
vm_page_dequeue() at vm_page_dequeue+0x8a/frame 0xfffffe158af624e0
vm_page_alloc_domain_after() at vm_page_alloc_domain_after+0x2cb/frame 
0xfffffe158af62560
vm_page_grab_pages() at vm_page_grab_pages+0x274/frame 0xfffffe158af62610
vn_sendfile() at vn_sendfile+0x83a/frame 0xfffffe158af628e0
[Tue May 15 20:12:48 2018]sys_sendfile() at sys_sendfile+0x119/frame 
0xfffffe158af62980
amd64_syscall() at amd64_syscall+0x298/frame 0xfffffe158af62ab0
fast_syscall_common() at fast_syscall_common+0x101/frame 0xfffffe158af62ab0



I once saw one like this:


Fatal trap 9: general protection fault while in kernel mode
cpuid = 0; apic id = 00
instruction pointer     = 0x20:0xffffffff8088bf74
stack pointer           = 0x28:0xfffffe55af7712e0
frame pointer           = 0x28:0xfffffe55af771330
code segment            = base 0x0, limit 0xfffff, type 0x1b
                         = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 12 (irq446: mlx5_core0)
[Mon May 14 04:45:10 2018]trap number           = 9
panic: general protection fault
cpuid = 0
time = 1526273109
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 
0xfffffe55af770ff0
vpanic() at vpanic+0x1a3/frame 0xfffffe55af771050
panic() at panic+0x43/frame 0xfffffe55af7710b0
trap_fatal() at trap_fatal+0x35f/frame 0xfffffe55af771100
trap() at trap+0x6d/frame 0xfffffe55af771210
[Mon May 14 04:45:10 2018]calltrap() at calltrap+0x8/frame 
0xfffffe55af771210
--- trap 0x9, rip = 0xffffffff8088bf74, rsp = 0xfffffe55af7712e0, rbp = 
0xfffffe55af771330 ---
vm_pqbatch_submit_page() at vm_pqbatch_submit_page+0x144/frame 
0xfffffe55af771330
sendfile_free_page() at sendfile_free_page+0x10e/frame 0xfffffe55af771360
sendfile_free_mext_pg() at sendfile_free_mext_pg+0xb7/frame 
0xfffffe55af7713b0
mb_free_ext() at mb_free_ext+0x103/frame 0xfffffe55af7713e0
m_freem() at m_freem+0x48/frame 0xfffffe55af771400
tcp_do_segment() at tcp_do_segment+0x1647/frame 0xfffffe55af771500
tcp_input_with_port() at tcp_input_with_port+0xfcc/frame 0xfffffe55af771650
tcp_input() at tcp_input+0xb/frame 0xfffffe55af771660
[Mon May 14 04:45:10 2018]ip_input() at ip_input+0xe9/frame 
0xfffffe55af7716c0
netisr_dispatch_src() at netisr_dispatch_src+0xa8/frame 0xfffffe55af771710
ether_demux() at ether_demux+0x140/frame 0xfffffe55af771740
ether_nh_input() at ether_nh_input+0x32c/frame 0xfffffe55af7717a0
netisr_dispatch_src() at netisr_dispatch_src+0xa8/frame 0xfffffe55af7717f0
ether_input() at ether_input+0x26/frame 0xfffffe55af771810
tcp_lro_flush_all() at tcp_lro_flush_all+0xf2/frame 0xfffffe55af771850
mlx5e_rx_cq_comp() at mlx5e_rx_cq_comp+0x5e5/frame 0xfffffe55af771950
mlx5_cq_completion() at mlx5_cq_completion+0x73/frame 0xfffffe55af771990
<...>

Thanks again for fixing it so quickly!

Drew


More information about the svn-src-all mailing list