Random Kernel Panic on Dreamplug (FS related)
Ronald Klop
ronald-lists at klop.ws
Sun May 17 19:42:16 UTC 2015
On Sun, 17 May 2015 00:16:23 +0200, Ian Lepore <ian at freebsd.org> wrote:
> On Tue, 2014-09-30 at 16:29 +0200, Mattia Rossi wrote:
>> On 30.09.2014 16:19, Ian Lepore wrote:
>> > On Tue, 2014-09-30 at 16:05 +0200, Mattia Rossi wrote:
>> >> On 30.09.2014 14:30, John-Mark Gurney wrote:
>> >>> Mattia Rossi wrote this message on Tue, Sep 30, 2014 at 14:14 +0200:
>> >>>> On 30.09.2014 13:29, John-Mark Gurney wrote:
>> >>>>> Mattia Rossi wrote this message on Mon, Sep 29, 2014 at 10:42 +0200:
>> >>>>>> On 29.09.2014 06:01, John-Mark Gurney wrote:
>> >>>>>>> Mattia Rossi wrote this message on Fri, Sep 26, 2014 at 14:19 +0200:
>> >>>>>>>> This might be part of the weird FFS issues the Dreamplug has and
>> >>>>>>>> no-one knows why they're happening.
>> >>>>>>> Are you running w/ FFS journaling? If so, try turning it off, but
>> >>>>>>> keeping softupdates on..
>> >>>>>> No journaling, no softupdates. I'll try enabling softupdates next time.
>> >>>>>> don't know if it will panic though
>> >>>>>>>> data_abort_handler() at data_abort_handler+0x5c0
>> >>>>>>>> pc = 0xc0de7a28 lr = 0xc0dd711c (exception_exit)
>> >>>>>>>> sp = 0xde019898 fp = 0xde019a20
>> >>>>>>>> r4 = 0xffffffff r5 = 0xffff1004
>> >>>>>>>> r6 = 0xc3f3f6c0 r7 = 0x00001000
>> >>>>>>>> r8 = 0xc443e880 r9 = 0x00000000
>> >>>>>>>> r10 = 0xc3d69000
>> >>>>>>>> exception_exit() at exception_exit
>> >>>>>>>> pc = 0xc0dd711c lr = 0xc0d53828 (ffs_truncate+0xaa8)
>> >>>>>>>> sp = 0xde0198e8 fp = 0xde019a20
>> >>>>>>>> r0 = 0xd0238120 r1 = 0x00000e60
>> >>>>>>>> r2 = 0x00000000 r3 = 0x00000000
>> >>>>>>>> r4 = 0x00000120 r5 = 0x00000000
>> >>>>>>>> r6 = 0xc3f3f6c0 r7 = 0x00001000
>> >>>>>>>> r8 = 0xc443e880 r9 = 0x00000000
>> >>>>>>>> r10 = 0xc3d69000 r12 = 0xd0238120
>> >>>>>>>> memset() at memset+0x48
>> >>>>>>>> pc = 0xc0de521c lr = 0xc0d53828 (ffs_truncate+0xaa8)
>> >>>>>>>> sp = 0xde0198e8 fp = 0xde019a20
>> >>>>>>>> Unwind failure (no registers changed)
>> >>>>>>> No more beyond this? If you could run addr2line on 0xc0d53828 so
>> >>>>>>> that we know where in ffs_truncate it's failing, that'd be very
>> >>>>>>> nice...
>> >>>>>> So I was trying to save the coredump in order to reboot and run
>> >>>>>> addr2line, but that failed:
>> >>>>>>
>> >>>>>> Physical memory: 504 MB
>> >>>>>> Dumping 67 MB:(da0:umass-sim0:0:0:0): WRITE(10). CDB: 2a 00 01 d5 1f 20
>> >>>>>> 00 00 01 00
>> >>>>>> (da0:umass-sim0:0:0:0): CAM status: Resource Unavailable
>> >>>>>> (da0:umass-sim0:0:0:0): Error 5, Retries exhausted
>> >>>>>> Aborting dump due to I/O error.
>> >>>>>>
>> >>>>>> ** DUMP FAILED (ERROR 5) **
>> >>>>>>
>> >>>>>> So I guess this error is related to the CAM errors I'm getting from
>> >>>>>> time to time. I was hoping that those errors were related to the
>> >>>>>> INVARIANTS option that slowed down the system and thus might have
>> >>>>>> triggered CAM errors, but obviously the SD Card seems to be the real
>> >>>>>> issue here.
>> >>>>>> So no crashdump for further analysis.
>> >>>>> That's fine.. w/ the addr2line we have some lines to explore...
>> >>>>>
>> >>>>>> Interestingly the CAM errors didn't show up on the terminal as other
>> >>>>>> times, the kernel just panicked straight away.
>> >>>>> Hmm.. that is odd.. someone who knows the SD card layer should look
>> >>>>> at this part... It could be that the SD card driver doesn't handle
>> >>>>> dumping (there is this global flag that gets set) properly and the
>> >>>>> driver needs to behave differently when it's set...
>> >>>> I also need to grab a new SD card, just to make sure it's really not
>> >>>> the card.
>> >>>>
>> >>>>>> But I've got the addr2line output, even though I'm not sure it makes
>> >>>>>> any difference:
>> >>>>>>
>> >>>>>> addr2line -f -e /mnt/kernel.debug 0xc0d53828
>> >>>>>>
>> >>>>>> ffs_truncate
>> >>>>>> /usr/devel/dreamplug/sys/ufs/ffs/ffs_inode.c:321
>> >>>>> can you give me the contents of the line? and a few lines of context
>> >>>>> around it? In HEAD's source, this is DOINGASYNC, and there is no call
>> >>>>> to memset, nor a variable assignment that would result in memset being
>> >>>>> called...
>> >>>> Same here.. The file hasn't been changed in a while (Fri, 31 May 2013):
>> >>>>
>> >>>>         ip->i_size = length;
>> >>>>         DIP_SET(ip, i_size, length);
>> >>>>         if (bp->b_bufsize == fs->fs_bsize)
>> >>>>                 bp->b_flags |= B_CLUSTEROK;
>> >>>>         if (flags & IO_SYNC)
>> >>>>                 bwrite(bp);
>> >>>> 321:    else if (DOINGASYNC(vp))
>> >>>>                 bdwrite(bp);
>> >>>>         else
>> >>>>                 bawrite(bp);
>> >>>>         ip->i_flag |= IN_CHANGE | IN_UPDATE;
>> >>>>         return (ffs_update(vp, !DOINGASYNC(vp)));
>> >>>>
>> >>>> No idea what's going on.
>> >>> ok, could you send me the output of objdump -dSl, but you only need
>> >>> to include the part from XXXXX <ffs_truncate>: to the next XXX<func>:
>> >>> line... probably off list as it'll be quite long...
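
One way to cut out just the range requested above is a small awk filter over the objdump output. This is only a sketch: `extract_func` is a hypothetical helper name (not part of binutils), and it assumes function headers have the usual `ADDRESS <name>:` form on a line of their own.

```shell
#!/bin/sh
# extract_func NAME: read "objdump -d" style output on stdin and print the
# block that starts at the "<NAME>:" header, stopping before the next
# function header. (Hypothetical helper, assumed header format.)
extract_func() {
    awk -v name="$1" '
        # A function header looks like: "c0d52d80 <ffs_truncate>:"
        /^[0-9a-fA-F]+ <[^>]+>:$/ {
            printing = ($2 == ("<" name ">:"))
        }
        printing { print }
    '
}
```

With the kernel path from the addr2line command in this thread, usage might look like `objdump -dSl /mnt/kernel.debug | extract_func ffs_truncate > ffs_truncate.txt`.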
>> >> I'm sorry, but given that I just broke all my working worlds using fsck,
>> >> I'm not going to be able to do that until I'm back from holidays....
>> >> currently working on the stuff remotely and after today's work day, I'm
>> >> not going to be able to get my hands on the dreamplug.
>> >>
>> >>
>> > BTW, for anyone playing with this problem, step one is to edit
>> > your /etc/fstab and set the fsck pass number to 0 for all filesystems.
>> > There's a risk of filesystem corruption after a crash, but it's smaller
>> > than the 100% corruption rate of letting fsck run. :)
>> >
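
The fstab change described above could be sketched roughly as follows. `zero_fsck_pass` is a hypothetical helper name; it assumes the standard fstab(5) layout, in which the sixth field is the fsck pass number, and it writes to stdout rather than editing the file in place.

```shell
#!/bin/sh
# zero_fsck_pass: read fstab(5) text on stdin and print it with the sixth
# field (the fsck pass number) set to 0 on every filesystem line.
# Comment lines and lines with fewer than six fields pass through
# unchanged (an omitted pass field already defaults to 0).
# Rewritten lines come out tab-separated.
zero_fsck_pass() {
    awk '
        BEGIN { OFS = "\t" }
        /^[ \t]*#/ || NF < 6 { print; next }
        { $6 = 0; print }
    '
}
```

For example, `zero_fsck_pass < /etc/fstab > /tmp/fstab.new` leaves the original untouched, so the result can be reviewed before it is copied into place.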
>> Of course! Great idea :-) Sometimes I just can't think of the right
>> tweak to save a lot of pain...
>>
>> Anyhow, I just found out that I was rebooting the dreamplug from the sd
>> card instead of the usb stick the whole time, and the usb stick hasn't
>> been damaged enough by fsck, so it actually booted :-) I'll send the
>> objdump soon.
>
> A (very) late update on this.... It looks like we may have tracked the
> change that started all this down to the introduction of unmapped IO,
> almost 2 years ago now. I still can't find the root cause, but I think
> disabling unmapped IO on armv4/5 is a viable workaround, which Warner
> committed this morning as r283014.
>
> --Ian
This sounds promising for the use of my Sheevaplugs.
I will try this soon. Thanks.
Ronald.