FYI: various 11.0-CURRENT -r293227 (and older) hangs on arm (rpi2): a description of sorts
Mark Millard
markmi at dsl-only.net
Thu Jan 7 22:16:19 UTC 2016
I'm top-posting this update on the hang status as seen via gstat:
After a long time, gstat -cod is now showing a non-zero value in one place:
L(q) for md0 is showing 4 now.
(I have no idea when it changed, but I do not think I missed the 4 earlier.)
md0 backs the file-system-based page file. That file is on the SSD, not the SD card.
===
Mark Millard
markmi at dsl-only.net
On 2016-Jan-7, at 2:04 PM, Mark Millard <markmi at dsl-only.net> wrote:
>
> On 2016-Jan-7, at 1:31 PM, Hans Petter Selasky <hps at selasky.org> wrote:
>>
>> On 01/07/16 22:26, Hans Petter Selasky wrote:
>>> On 01/07/16 21:20, Mark Millard wrote:
>>>>
>>>> On 2016-Jan-7, at 12:04 PM, Hans Petter Selasky <hps at selasky.org>
>>>> wrote:
>>>>>
>>>>> On 01/07/16 20:48, Ian Lepore wrote:
>>>>>> If the filesystems and swap space are on a usb drive, then maybe it's
>>>>>> the usb subsystem that's hanging. The wait states you showed for those
>>>>>> processes are consistent with what I've seen when all buffers get
>>>>>> backed up in a queue on one non-responsive or slow device. It may be
>>>>>> that there's a way to get the system deadlocked when it's low on
>>>>>> buffers and there is memory pressure causing the swap to be used (I
>>>>>> generally run arm systems without any swap configured).
>>>>>>
>>>>>> Running gstat in another window while this is going on may give you
>>>>>> some insight into the situation. Beyond that I don't know what to look
>>>>>> at, especially since you generally can't launch any new tools once the
>>>>>> system gets into this kind of state.
>>>>>>
>>>>>> -- Ian
>>>>>
>>>>> Hi,
>>>>>
>>>>> All USB transfers towards disk devices have timeouts, so if something
>>>>> is hanging at the USB level, you'll get a printout eventually.
>>>>
>>>> Once the deadlock/live-lock has apparently started, how long does one
>>>> have to wait before concluding that any timeouts would already have
>>>> fired, and so do not account for the deadlock/live-lock?
>>>>
>>>>> The USB kernel processes needed for doing I/O transfers are not
>>>>> pinned to RAM. If a USB process is swapped out to disk, can it happen
>>>>> that the system cannot wake up that swapped-out process to get more swap?
>>>>>
>>>>> --HPS
>>>>
>>>
>>> Hi,
>>>
>>>> Wow. Could I use ddb to somehow check on the "USB kernel processes"
>>>> swap status when the overall context is deadlocked/live-locked?
>>>
>>> Are you able to run something like:
>>>
>>> ps auxwwH | grep usb
>>>
>>>> If yes, how? Otherwise, something in top or some such display that I'd
>>>> left running over the serial console would have to present useful
>>>> information on the subject. Is there anything that would?
>>>
>>
>> Are you able to SSH into the box or ping it?
>>
>> --HPS
>
> Once the live-lock condition is reached, no new processes can be created as far as I can tell: any process that attempts the creation hangs.
>
> I'd need "ps auxwwH" to repeat internally to get even that much: I would have to start it before the live-lock happened, and it would have to still be running when the hang occurs, with no ongoing process creation involved.
>
> I'm not sure that two communicating processes (ps and grep over a pipe) would work, but so far I cannot get even one new process.
>
> ssh sessions also hang: input and output generally stop for them. (Sometimes ^t still works but shows no progress in what it reports.) No new ssh connections are possible; they fail with "Operation timed out".
>
> ping does respond normally: overall it is more of a live-lock than a true deadlock.
>
> The serial console still shows output from whatever was already running on it, as long as that process does nothing that locks up. Changing what it is doing generally locks it up too.
>
> Unplugging or plugging in a USB keyboard or mouse does show the expected messages on the console: again, more of a live-lock than a true deadlock.
>
> I can get to ddb after the hang. But I do not know what I'd do with it to find any useful information.
>
>
> As noted in another message: I used gstat instead of top on the serial console:
>
>> gstat shows everything as zero during a hang, even the L(q) column. (Length of queue?)
>>
>> I used:
>>
>> gstat -cod
>>
>> and had it running over the serial console port during the attempted portmaster activity.
>
>
More information about the freebsd-arm mailing list