panic in nfs on arm

Sun Oct 26 12:12:09 UTC 2014

On Sun, 26 Oct 2014 13:00:29 +0100, Rick Macklem <rmacklem at uoguelph.ca> wrote:

> Kostik wrote:
>> On Sat, Oct 25, 2014 at 07:21:13PM -0400, Rick Macklem wrote:
>> > Ronald Klop wrote:
>> > > Hi,
>> > >
>> > > I got a panic on my ARM computer while building a port with
>> > > /usr/ports mounted from my FreeBSD-10-STABLE/amd64 machine.
>> > >
>> > > This is the machine that panicked:
>> > > FreeBSD 11.0-CURRENT #1 r272028M: Tue Sep 23 17:11:45 CEST 2014
>> > > root at sjakie.klop.ws:/usr/obj-arm/arm.arm/usr/src-arm/sys/SHEEVAPLUG arm
>> > >
>> > >
>> > > Tracing pid 90295 tid 100119 td 0xc5f8c960
>> > > db_trace_self() at db_trace_self
>> > >           pc = 0xc0bb12c8  lr = 0xc0bb1354 (db_trace_thread+0x50)
>> > >           sp = 0xdf29e5d0  fp = 0xc3e07120
>> > > db_trace_thread() at db_trace_thread+0x50
>> > >           pc = 0xc0bb1354  lr = 0xc0936314 (db_command_init+0x5a4)
>> > >           sp = 0xdf29e630  fp = 0xc3e07120
>> > > db_command_init() at db_command_init+0x5a4
>> > >           pc = 0xc0936314  lr = 0xc0935ad0 (db_skip_to_eol+0x484)
>> > >           sp = 0xdf29e648  fp = 0xc3e07120
>> > >           r4 = 0xc0c8d350  r5 = 0x00000000
>> > > db_skip_to_eol() at db_skip_to_eol+0x484
>> > >           pc = 0xc0935ad0  lr = 0xc0935c38 (db_command_loop+0x5c)
>> > >           sp = 0xdf29e6e8  fp = 0xc3e07120
>> > >           r4 = 0xdf29e6fc  r5 = 0xc0c8d64c
>> > >           r6 = 0x3cd90e75  r7 = 0x00000000
>> > >           r8 = 0x00000001 r10 = 0x600000d3
>> > > db_command_loop() at db_command_loop+0x5c
>> > >           pc = 0xc0935c38  lr = 0xc0937f80 (X_db_sym_numargs+0xec)
>> > >           sp = 0xdf29e6f0  fp = 0xc3e07120
>> > > X_db_sym_numargs() at X_db_sym_numargs+0xec
>> > >           pc = 0xc0937f80  lr = 0xc0a6f0c0 (kdb_trap+0x94)
>> > >           sp = 0xdf29e808  fp = 0xc3e07120
>> > >           r4 = 0xdf29e8f8
>> > > kdb_trap() at kdb_trap+0x94
>> > >           pc = 0xc0a6f0c0  lr = 0xc0bc1d60 (badaddr_read+0x274)
>> > >           sp = 0xdf29e828  fp = 0xc3e07120
>> > >           r4 = 0xdf29e8f8  r5 = 0x00000001
>> > >           r6 = 0x3cd90e75  r7 = 0xc5f8c960
>> > >           r8 = 0xdf29e8f8 r10 = 0xdf2a1eb0
>> > > badaddr_read() at badaddr_read+0x274
>> > >           pc = 0xc0bc1d60  lr = 0xc0bc1e98 (badaddr_read+0x3ac)
>> > >           sp = 0xdf29e840  fp = 0xc3e07120
>> > >           r4 = 0xc5f8c960  r5 = 0xdf29e8f8
>> > >           r6 = 0x3cd90e05
>> > > badaddr_read() at badaddr_read+0x3ac
>> > >           pc = 0xc0bc1e98  lr = 0xc0bc2278 (data_abort_handler+0x10c)
>> > >           sp = 0xdf29e858  fp = 0xc3e07120
>> > >           r4 = 0xc0cd8af8  r5 = 0xffff1004
>> > > data_abort_handler() at data_abort_handler+0x10c
>> > >           pc = 0xc0bc2278  lr = 0xc0bb2f40 (exception_exit)
>> > >           sp = 0xdf29e8f8  fp = 0xc3e07120
>> > >           r4 = 0xffffffff  r5 = 0xffff1004
>> > >           r6 = 0x3cd90e05  r7 = 0xc0e0ea48
>> > >           r8 = 0x0000000f  r9 = 0x00000101
>> > >          r10 = 0x0000001d
>> > > exception_exit() at exception_exit
>> > >           pc = 0xc0bb2f40  lr = 0xc0b8daf8 (uma_reclaim+0x1f8)
>> > >           sp = 0xdf29e948  fp = 0xc3e07120
>> > >           r0 = 0xba9b9127  r1 = 0x8b3de5fb
>> > >           r2 = 0xc61c1fc8  r3 = 0xba9b9126
>> > >           r4 = 0x00000000  r5 = 0xc61c1fc8
>> > >           r6 = 0x3cd90e05  r7 = 0xc0e0ea48
>> > >           r8 = 0x0000000f  r9 = 0x00000101
>> > >          r10 = 0x0000001d r12 = 0x00000000
>> > > uma_reclaim() at uma_reclaim+0x24c
>> > This looks to me like a crash in uma_reclaim() and I find UMA
>> > way too obscure to understand.
>> >
>> > I have no idea if it might be related, but alc@ put a fix for low
>> > memory situations in r272071 (or maybe it's r272221?).
>> >
>> > Might be worth trying a slightly newer kernel to see if the
>> > problem still occurs.
>> >
>> > And hopefully someone more conversant with UMA (or this stack
>> > trace) can help more.
>> >
>> > rick
>> >
>> > >           pc = 0xc0b8db4c  lr = 0xc0b8c800 (uma_zalloc_arg+0x2f0)
>> > >           sp = 0xdf29e978  fp = 0xdf29ec10
>> > >           r4 = 0xc3e071d8  r5 = 0xc0e0ea00
>> > >           r6 = 0xc3e07120  r7 = 0x00000000
>> > >           r8 = 0x00000102  r9 = 0xdf29ecf8
>> > >          r10 = 0xc61c0760
>> > > uma_zalloc_arg() at uma_zalloc_arg+0x2f0
>> uma_reclaim() is not called from uma_zalloc().
>> I think there is some issue with ddb on arm, which means that
>> the backtrace is not useful.  See below for one more.
>>
> Yeah, I noticed that and the one below (i.e. I knew the stack dump
> wasn't correct). I had hoped it was right about the crash
> happening in uma_reclaim() (which only seems to be called from
> the pageout daemon?), but that doesn't match up with this thread.
>
> Also, I couldn't see what the panic message actually was. Is it
> this one at the bottom:
> Sleeping thread (tid 100119, pid 90295) owns a non-sleepable lock
> or is that what happened when you tried to take a crash dump?
>
> Btw, nfscl_nget() does call uma_zalloc(M_WAITOK), but it doesn't hold a
> mutex when it does this.
>
> rick

Hi,

The non-sleepable lock message is not the original panic. It appeared
when I dumped the memory to dumpdev from the debugger. I don't have the
original panic message; it was no longer in the serial output. Is it
possible to get the debugger to print it again?

I rebooted the machine already. Let's see if it happens again someday.

Ronald.

>> > >           pc = 0xc0b8c800  lr = 0xc09e1df0 (nfscl_nget+0x308)
>> > >           sp = 0xdf29e990  fp = 0xdf29ec10
>> > >           r4 = 0x9bb9fa43  r5 = 0x00000000
>> > >           r6 = 0xc550dce8  r7 = 0xc3edaa00
>> > >           r8 = 0xc3ebbac0
>> > > nfscl_nget() at nfscl_nget+0x308
>> > >           pc = 0xc09e1df0  lr = 0xc09da69c (ncl_readlinkrpc+0xf60)
>> > >           sp = 0xdf29e9d8  fp = 0xdf29ea10
>> > >           r4 = 0xc550dce8  r5 = 0x00000000
>> > >           r6 = 0xc550dcf8  r7 = 0xdf29ecf8
>> > >           r8 = 0xdf29ec6c  r9 = 0x00000000
>> > >          r10 = 0xdf29ed28
>> > > ncl_readlinkrpc() at ncl_readlinkrpc+0xf60
>> > >           pc = 0xc09da69c  lr = 0xc0bdae44 (VOP_MKDIR_APV+0x94)
>> > >           sp = 0xdf29ec40  fp = 0xbffff620
>> > >           r4 = 0xc0c95c68  r5 = 0xdf29ec6c
>> > >           r6 = 0x00000001  r7 = 0x00020284
>> > >           r8 = 0xffffff9c  r9 = 0x00200800
>> > >          r10 = 0xc5f8c960
>> > > VOP_MKDIR_APV() at VOP_MKDIR_APV+0x94
>> I do not see how VOP_MKDIR() could end up calling ncl_readlinkrpc(),
>> especially without an intervening frame.
>>
>> > >           pc = 0xc0bdae44  lr = 0xc0aca614 (kern_mkdirat+0x18c)
>> > >           sp = 0xdf29ec50  fp = 0xbffff620
>> > >           r4 = 0xdf29ed28  r5 = 0xdf29ec90
>> > >           r6 = 0x00000000
>> > > kern_mkdirat() at kern_mkdirat+0x18c
>> > >           pc = 0xc0aca614  lr = 0xc0aca684 (kern_mkdir+0x24)
>> > >           sp = 0xdf29ede0  fp = 0xbffff620
>> > >           r4 = 0x00020290  r5 = 0xc5f8c960
>> > >           r6 = 0x00000000  r7 = 0xc5f7f000
>> > >           r8 = 0x00000000 r10 = 0x00013640
>> > > kern_mkdir() at kern_mkdir+0x24
>> > >           pc = 0xc0aca684  lr = 0xc0aca6a8 (sys_mkdir+0x1c)
>> > >           sp = 0xdf29edf0  fp = 0xbffff620
>> > > sys_mkdir() at sys_mkdir+0x1c
>> > >           pc = 0xc0aca6a8  lr = 0xc0bc2884 (swi_handler+0x254)
>> > >           sp = 0xdf29edf8  fp = 0xbffff620
>> > > swi_handler() at swi_handler+0x254
>> > >           pc = 0xc0bc2884  lr = 0xc0bb2ed0 (swi_exit)
>> > >           sp = 0xdf29ee60  fp = 0xbffff620
>> > >           r4 = 0x00020290  r5 = 0x2085e8e0
>> > >           r6 = 0x00020284  r7 = 0x00000088
>> > >           r8 = 0x00000001
>> > > swi_exit() at swi_exit
>> > >           pc = 0xc0bb2ed0  lr = 0xc0bb2ed0 (swi_exit)
>> > >           sp = 0xdf29ee60  fp = 0xbffff620
>> > > Unable to unwind further
>> > >
>> > >
>> > > Unfortunately, dumping the kernel core also panicked.
>> > > db> dump
>> > > Physical memory: 507 MB
>> > > Dumping 74 MB: 71 67 63
>> > > vm_fault(0xc4147000, 0, 1, 0) -> 0
>> > > Fatal kernel mode data abort: 'Translation Fault (P)'
>> > > trapframe: 0xdf29e0b8
>> > > FSR=00000017, FAR=00000014, spsr=a00000d3
>> > > r0 =c0cd0f40, r1 =00000000, r2 =c5f8c960, r3 =00000004
>> > > r4 =00000000, r5 =00000000, r6 =00000000, r7 =c3ead01c
>> > > r8 =c3ead000, r9 =c3e9e88c, r10=00000000, r11=0000000a
>> > > r12=600000d3, ssp=df29e108, slr=c0bb4e24, pc =c0a7d060
>> > >
>> > > panic: Fatal abort
>> > > Uptime: 3d18h30m32s
>> > > Sleeping thread (tid 100119, pid 90295) owns a non-sleepable lock
>> > > _______________________________________________
>> > > freebsd-fs at freebsd.org mailing list
>> > > http://lists.freebsd.org/mailman/listinfo/freebsd-fs
>> > > To unsubscribe, send any mail to
>> > > "freebsd-fs-unsubscribe at freebsd.org"
>> > >