AVILA getting close!
Andrew Turner
andrew at fubar.geek.nz
Sat Jun 21 16:01:38 UTC 2014
On Sat, 21 Jun 2014 00:08:28 -0700
John-Mark Gurney <jmg at funkthat.com> wrote:
> John-Mark Gurney wrote this message on Fri, Jun 20, 2014 at 18:08
> -0700:
> > Andrew Turner wrote this message on Fri, Jun 20, 2014 at 20:08
> > +0100:
> > > On Fri, 20 Jun 2014 08:10:24 -0700
> > > John-Mark Gurney <jmg at funkthat.com> wrote:
> > >
> > > > John-Mark Gurney wrote this message on Wed, Jun 18, 2014 at
> > > > 15:58 -0700:
> > > > > So, w/ the recent couple of patches that alc has provided, I
> > > > > no longer receive kernel panics on my AVILA board!
> > > > >
> > > > > $ uname -a
> > > > > FreeBSD avila.funkthat.com 11.0-CURRENT FreeBSD 11.0-CURRENT
> > > > > #27 r267333:267349M: Wed Jun 11 09:57:58 PDT 2014
> > > > > jmg at carbon.funkthat.com:/usr/obj/arm.armeb/usr/src.avila/sys/AVILA
> > > > > arm $ uptime 12:15AM up 1 day, 15 mins, 2 users, load
> > > > > averages: 0.13, 0.11, 0.08
> > > > >
> > > > > This survived a portsnap extract... This is all over NFS...
> > > > >
> > > > > Though the issue that I'm now having is that some binaries
> > > > > (newsyslog) and sometimes other binaries (awk, grep) core
> > > > > dump...
> > > > >
> > > > > I believe this is an issue w/ rtld, or related... If I
> > > > > compile newsyslog -static, it works fine... Otherwise I get
> > > > > a SIGILL, and that is because it jumps off into the weeds..
> > > > > Though gdb on arm isn't very useful..
> > > >
> > > > ok, so the SIGILL only occures under gdb, and this is because
> > > > single stepping into a RAS sequence doesn't work very well...
> > > > If you set a break point on the return (after the RAS
> > > > sequence), you can get past this...
> > > >
> > > > I got to the point in rtld.c code:
> > > > if (obj->pltrel)
> > > > rel = (const Elf_Rel *) ((caddr_t) obj->pltrel +
> > > > reloff); else
> > > > rel = (const Elf_Rel *) ((caddr_t) obj->pltrela +
> > > > reloffand was seeing gdb try to execute the pltrela line, but:
> > > > i;
> > > >
> > > > and was seeing gdb try to execute the pltrela line, but:
> > > > (gdb) print * (const Elf_Rel *) ((caddr_t) obj->pltrela +
> > > > reloff) Error accessing memory address 0x118: Bad address.
> > > > (gdb) print/x obj->pltrela
> > > > $4 = 0x0
> > > > (gdb) print /x reloff
> > > > $5 = 0x118
> > > > (gdb) print obj->pltrel
> > > > $6 = (const Elf_Rel *) 0x94e8
> > > Based on my copy of newsyslog I built for armeb this looks
> > > correct. To verify it could you dump the .dynamic section from
> > > the binary? Something like 'objdump -s newsyslog' will get it.
> >
> > ok, available at:
> > https://www.funkthat.com/~jmg/20140619/objdump.newsyslog
> >
> > > > Hun? obj->pltrel is non-zero, so it should have executed the
> > > > other line...
> > > >
> > > > I recompiled rtld w/ -O0, and sure enough, newsyslog runs
> > > > fine... If I compile w/o -O, or w/ -O1, it fails...
> > > >
> > > > Comments or suggestions?
> > >
> > > What is the value of rel after the if statement? In the -O/-O1
> > > case the asm looks like:
> > >
> > > ldr r2, [sp, #20] ; Load obj to r2
> > > ldr r3, [r2, #124] ; Load obj->pltrel to r3
> > > cmp r3, #0 ; 0x0 ; if obj->pltrel:
> > > ldrne r2, [sp, #16] ; != NULL: Load reloff to r2
> > > addne r4, r3, r2 ; != NULL: Add obj->pltrel + reloff to r4
> > > ldreq r2, [sp, #20] ; == NULL: Load obj to r2
> > > ldreq r3, [r2, #132] ; == NULL: Load obj->pltrela to r3
> > > ldreq r2, [sp, #16] ; == NULL: Load reloff to r2
> > > addeq r4, r2, r3 ; == NULL: Add obj->pltrela + reloff to r4
> > >
> > > Given this I could see how gdb gets confused.
> > >
> > > It may also pay to get the registers from gdb at this point.
> >
> > Arg! This is frustrating, I'm getting such different behaviors from
> > time to time.. now it isn't having that fault.. but it's getting
> > farther, but... I this is because our in tree gdb is messed up..
> >
> > But, I am getting farther... now the last break at rtld.c:3651
> > looks like it's returning a bogus pointer:
> > (gdb) print *req
> > $12 = {name = 0x9190 "__aeabi_read_tp", hash = 0xf008a80,
> > hash_gnu = 0x52432dd3, ventry = 0x2003b1d0, flags = 0x1,
> > defobj_out = 0x2003c400, sym_out = 0x20062454, lockstate =
> > 0xbfffeda0}
> >
> > defobj_out looks bogus to me... We don't have any object mapped
> > there:
> > (gdb) info shared
> > >From To Syms Read Shared Object Library
> > 0x200427c8 0x20048814 Yes /lib/libgcc_s.so.1
> > 0x2007a4e8 0x2017f320 Yes /lib/libc.so.7
> > 0x20018f14 0x2002c99c Yes /libexec/ld-elf.so.1
> >
> > the data at 0x2003c400 doesn't look like code:
> > (gdb) x/32x 0x2003c400
> > 0x2003c400: 0xd550b87a 0x00000001 0x00000000
> > 0x2003a080 0x2003c410: 0x00000000 0x00000001
> > 0x00000000 0x20051000 0x2003c420: 0x0016a000
> > 0x00143000 0x00000000 0x20051000 0x2003c430:
> > 0x2019dfd0 0x2007a4e8 0x20051034 0x000000a0
> > 0x2003c440: 0x00000000 0x00000007 0x00000002
> > 0x2019bcf0 0x2003c450: 0x00000004 0x00000058
> > 0x00000000 0x00000008 0x2003c460: 0x20051000
> > 0x00000000 0x2019e0b8 0x200713e0 0x2003c470:
> > 0x000040c0 0x00000000 0x00000000 0x200754a0
> >
> > and then as I stepi out of symlook_global:
> > (gdb) x/6i $pc
> > 0x2001f0b4 <symlook_global+348>: cmp r0, #0 ; 0x0
> > 0x2001f0b8 <symlook_global+352>: moveq r0, #3 ; 0x3
> > 0x2001f0bc <symlook_global+356>: movne r0, #0 ; 0x0
> > 0x2001f0c0 <symlook_global+360>: add sp, sp, #36 ;
> > 0x24 0x2001f0c4 <symlook_global+364>: pop {r4, r5, r6,
> > r7, lr} 0x2001f0c8 <symlook_global+368>: bx lr
> > (gdb) info registers
> > r0 0x20062454 0x20062454
> > r1 0x933b 0x933b
> > r2 0x0 0x0
> > r3 0xa4 0xa4
> > r4 0x0 0x0
> > r5 0xbfffed3c 0xbfffed3c
> > r6 0xbfffed08 0xbfffed08
> > r7 0x20037af4 0x20037af4
> > r8 0x0 0x0
> > r9 0x1 0x1
> > r10 0x8a2c 0x8a2c
> > r11 0xbfffed30 0xbfffed30
> > r12 0x23de 0x23de
> > sp 0xbfffec94 0xbfffec94
The stack is bogus here, the stack should be 8-byte aligned in non-leaf
functions.
> > lr 0x2001efb0 0x2001efb0
> > pc 0x2001f0b4 0x2001f0b4
> > fps 0x0 0x0
> > cpsr 0x60000010 0x60000010
> >
> > Then stepi till 0x2001f0c8:
> > (gdb) info registers
> > r0 0x0 0x0
> > r1 0x933b 0x933b
> > r2 0x0 0x0
> > r3 0xa4 0xa4
> > r4 0x2003c000 0x2003c000
> > r5 0xbfffed3c 0xbfffed3c
> > r6 0x20037af4 0x20037af4
> > r7 0xbfffece8 0xbfffece8
> > r8 0x0 0x0
> > r9 0x1 0x1
> > r10 0x8a2c 0x8a2c
> > r11 0xbfffed30 0xbfffed30
> > r12 0x23de 0x23de
> > sp 0xbfffeccc 0xbfffeccc
It's still bogus suggesting someone in the call chain failed at
aligning it correctly.
> > lr 0x2003c000 0x2003c000
> > pc 0x2001f0c8 0x2001f0c8
> > fps 0x0 0x0
> > cpsr 0x20000010 0x20000010
> >
> > and now the lr is bogus... it transfers control to 0x2003c000 which
> > is before the fault at 0x2003c0f4... And again, this looks like
> > data, not code:
When the stack is incorrectly aligned all bets are off, I've seen
strange errors when it's not 8-byte aligned. The compiler relies on this
assumption to load/store stack based data.
> > (gdb) x/64x 0x2003c000
> > 0x2003c000: 0xd550b87a 0x00000001 0x2003c200
> > 0xbfffffb8 0x2003c010: 0x00000000 0x00000001
> > 0x00000000 0x00008000 0x2003c020: 0x00012000
> > 0x00009000 0x00008000 0x00000000 0x2003c030:
> > 0x00018724 0x00009ca8 0x00008034 0x000000e0
> > 0x2003c040: 0x00008114 0x00000007 0x00000000
> > 0x00000000 0x2003c050: 0x00000000 0x00000000
> > 0x00000000 0x00000000 0x2003c060: 0x00000000
> > 0x00000000 0x0001881c 0x000094a0 0x2003c070:
> > 0x00000048 0x00000000 0x00000000 0x000094e8
> > 0x2003c080: 0x00000308 0x00000000 0x00000000
> > 0x0000888c 0x2003c090: 0x00008f9c 0x000003b1
> > 0x00009430 0x00000002 0x2003c0a0: 0x00000000
> > 0x00000000 0x0000934e 0x00008180 0x2003c0b0:
> > 0x00000061 0x00008304 0x00000071 0x00000061
> > 0x2003c0c0: 0x00000005 0x0000001f 0x0000000a
> > 0x00000071 0x2003c0d0: 0x000084d8 0x00008558
> > 0x000086c8 0x00000000 0x2003c0e0: 0x00000000
> > 0x2003d000 0x00000000 0x00000000 0x2003c0f0:
> > 0x00000000 0x2003c0f0 0x2003b180 0x00000007
> >
> > If I continue to stepi from here, it will fault at f4...
> >
> > This looks like a stack smash issue as the lr we pop off the stack
> > is incorrect..
>
> So, forgot I could set a watchpoint on the stack address, so I did:
> (gdb) c
> Continuing.
> Watchpoint 9: *0xbfffecc8
>
> Old value = 0x2001f584
> New value = 0x2003c000
> 0x20019960 in donelist_check (dlp=0xbfffed08, obj=0x2003c000)
> at /usr/src.avila/libexec/rtld-elf/rtld.c:1380
> 1380 dlp->objs[dlp->num_used++] = obj;
>
> I tracked dlp, and it's the donelist pointer, and it is allocated
> from symlook_default via donelist_init at the top of the function:
> #define donelist_init(dlp) \
> ((dlp)->objs = alloca(obj_count * sizeof (dlp)->objs[0]), \
> assert((dlp)->objs != NULL), \
> (dlp)->num_alloc = obj_count, \
> (dlp)->num_used = 0)
>
> Hmm... since obj_count is a global, and we are in a possibly
> threaded environment, we should probably move assigning num_alloc
> before the alloca, and then use num_alloc instead of obj_count just
> to be safe... but taht won't be the issue here, as num_used is 0
> in my case...
>
> I've looked at the assembly for donelist_init for both the working
> and non-working case, and besides an extra store in the good version,
> which I think is just because of the unoptimized code, things look
> the same to me...
I suspect it's more likely the working version is not making
assumptions on the stack alignment, where the broken one is.
Andrew
More information about the freebsd-arm
mailing list