Kernel crashes after sleep: how to debug?
John Baldwin
jhb at freebsd.org
Fri Jul 19 21:04:55 UTC 2013
On Friday, July 19, 2013 3:32:43 pm Yuri wrote:
> On 07/19/2013 08:00, John Baldwin wrote:
> > Well, you can probably find the value of 'm' in a register if you look at the
> > disassembly around the fault. You can then cast that pointer to the right
> > type and print its contents.
>
> Here is the value of *m in frame 8:
> (kgdb) p *(struct vm_page*)0xfffffe00b460abf8
> $3 = {pageq = {tqe_next = 0xfe26, tqe_prev = 0xfffffe00b5a124d8}, listq
> = {tqe_next = 0xfffffe0081ad8f70, tqe_prev = 0xfffffe0081ad8f78},
> left = 0x6, right = 0xd00000201, object = 0x100000000, pindex =
> 4294901765, phys_addr = 18446741877712530608, md = {pv_list = {
> tqh_first = 0xfffffe00b460abc0, tqh_last = 0xfffffe00b5579020}, pat_mode
> = -1268733096}, queue = 72 'H', segind = -85 '\253',
> hold_count = -19360, order = 0 '\0', pool = 254 '\376', cow = 65535,
> wire_count = 0, aflags = 0 '\0', flags = 0 '\0', oflags = 0,
> act_count = 0 '\0', busy = 176 '\260', valid = 208 '\320', dirty = 126 '~'}
Hmm, that definitely looks like garbage. How are you with gdb scripting?
You could write a script that walks the PQ_ACTIVE queue and checks whether this
pointer ends up in there. It would then be interesting to see whether the
previous page's next pointer is corrupted. If instead this page's
pageq.tqe_prev references the previous page, then it could be that this
vm_page structure has been stomped on.
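As a rough illustration of the check such a script would perform, here is a
user-space sketch in C using the TAILQ macros from <sys/queue.h>. It is not
kernel code: "struct page", "struct pglist" as declared here, and
check_queue() are hypothetical stand-ins, and only the pageq linkage of
struct vm_page is modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical stand-in for struct vm_page: only the queue linkage. */
struct page {
	TAILQ_ENTRY(page) pageq;
	int id;
};
TAILQ_HEAD(pglist, page);

/*
 * Walk the queue from the head looking for 'target', checking on the way
 * that each page's tqe_prev points at its predecessor's next pointer.
 * Returns:
 *   1  target was reached and every back link up to it is intact
 *   0  the end of the queue was reached without seeing target
 *  -1  a page with an inconsistent back link was reached first;
 *      that page is reported through *bad
 */
static int
check_queue(struct pglist *q, struct page *target, struct page **bad)
{
	struct page **expect;	/* value tqe_prev should have for 'm' */
	struct page *m;

	assert(q != NULL && bad != NULL);
	expect = &q->tqh_first;
	for (m = q->tqh_first; m != NULL; m = m->pageq.tqe_next) {
		if (m->pageq.tqe_prev != expect) {
			*bad = m;
			return (-1);
		}
		if (m == target)
			return (1);
		expect = &m->pageq.tqe_next;
	}
	return (0);
}
```

A result of 1 for the suspect pointer would suggest the page really is on the
queue and its contents were overwritten later. A result of -1 with *bad equal
to the suspect pointer would point at the predecessor's next pointer instead.
A version run against a crash dump would also want to cap the number of
iterations, since a corrupted next pointer can lead anywhere.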
Ultimately I think you will need to look at any malloc/VM/page operations
done in the suspend and resume paths to see where this happens. It might
be slightly easier if the same page gets trashed every time as you could
print out the relevant field periodically during suspend and resume to
narrow down where the breakage occurs.
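The "print the field periodically" idea could be sketched roughly as below.
This is a plain C model rather than kernel code: "struct watch", watch_init()
and watch_check() are hypothetical names, and in the kernel the calls would
have to be placed by hand at chosen points in the suspend and resume paths.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: remembers the last seen contents of a memory range. */
struct watch {
	const void *addr;		/* start of the watched field */
	size_t len;			/* number of bytes watched */
	unsigned char saved[64];	/* snapshot from the previous check */
};

/* Start watching 'len' bytes at 'addr' and take the first snapshot. */
static void
watch_init(struct watch *w, const void *addr, size_t len)
{
	assert(len <= sizeof(w->saved));
	w->addr = addr;
	w->len = len;
	memcpy(w->saved, addr, len);
}

/*
 * Compare the watched bytes against the previous snapshot. If they differ,
 * report the checkpoint name, refresh the snapshot and return 1; otherwise
 * return 0.
 */
static int
watch_check(struct watch *w, const char *where)
{
	if (memcmp(w->saved, w->addr, w->len) == 0)
		return (0);
	printf("watched field changed before checkpoint: %s\n", where);
	memcpy(w->saved, w->addr, w->len);
	return (1);
}
```

The first checkpoint that reports a change brackets the breakage between
itself and the previous checkpoint, and more checkpoints can then be added
inside that window.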
--
John Baldwin