Complete hang on 9.0-RELEASE

Attilio Rao attilio at freebsd.org
Wed Feb 29 18:44:46 UTC 2012


2012/2/29, Arnaud Lacombe <lacombar at gmail.com>:
> Hi,
>
> On Wed, Feb 29, 2012 at 12:59 PM, Arnaud Lacombe <lacombar at gmail.com> wrote:
>> Hi,
>>
>> On Mon, Feb 27, 2012 at 12:48 PM, Arnaud Lacombe <lacombar at gmail.com>
>> wrote:
>>> Hi,
>>>
>>> On Mon, Feb 27, 2012 at 10:36 AM, Attilio Rao <attilio at freebsd.org>
>>> wrote:
>>>> 2012/2/27, Arnaud Lacombe <lacombar at gmail.com>:
>>>>> Hi,
>>>>>
>>>>> On Tue, Feb 14, 2012 at 11:41 AM, Arnaud Lacombe <lacombar at gmail.com>
>>>>> wrote:
>>>>>> Hi folks,
>>>>>>
>>>>>> For the record, I was running some tests yesterday on top of a
>>>>>> 9.0-RELEASE, amd64, kernel when the box hung. At the time of the
>>>>>> hang, the box was running a process with about 2800 threads with heavy
>>>>>> IPC between 1400 writers and 1400 readers. The box was in single user
>>>>>> mode (/bin/sh coming from FreeBSD 7.4-STABLE). Here is the beginning
>>>>>> of the dmesg:
>>>>>>
>>>>> This happened a second time, now with FreeBSD 8.2-RELEASE. Complete
>>>>> machine hang. The machine was running about 4000 threads in a single
>>>>> process, all the other conditions are the same.
>>>>
>>>> Arnaud,
>>>> can you please break into your kernel via KDB and collect the following
>>>> information from the DDB prompt:
>>>> - ps
>>>> - alltrace
>>>> - show allpcpu
>>>> - possibly get a coredump with 'call doadump'
>>>>
>>> Will do, but I'll need to rebuild a kernel to include DDB.
>>>
>>>> and in the end provide all those along with kernel binary and possibly
>>>> sources somewhere?
>>>>
>>> I'll be testing a bare `release/8.2.0' with the following patch:
>>>
>>> diff --git a/sys/amd64/conf/GENERIC b/sys/amd64/conf/GENERIC
>>> index c3e0095..7bd997f 100644
>>> --- a/sys/amd64/conf/GENERIC
>>> +++ b/sys/amd64/conf/GENERIC
>>> @@ -79,6 +79,10 @@ options      INCLUDE_CONFIG_FILE     # Include this file in kernel
>>>
>>>  options        KDB           # Kernel debugger related code
>>>  options        KDB_TRACE     # Print a stack trace for a panic
>>> +options        DDB
>>> +options        BREAK_TO_DEBUGGER
>>> +options        ALT_BREAK_TO_DEBUGGER
>>>
>>>  # Make an SMP-capable kernel by default
>>>  options        SMP           # Symmetric MultiProcessor Kernel
>>>
>> OK, it happened again after 2 days; the process was running about 3200
>> threads. I'm trying to break into DDB and will let you know. I'm not
>> that successful for now...
>>
> No luck. Neither BREAK nor ALT_BREAK is responding. I will not touch
> the system in the next few hours if you want me to test something on
> it. If 8.2-RELEASE or 9.0-RELEASE is not meant to work reliably on
> top of a 7.4-RELEASE userland, I will set the test up again on a
> clean 9.0-RELEASE system and retry.

We allow the KBI to break between releases, so this may cause breakage
for you, even though a deadlock is really not something you want to see.

Can you try enabling SW_WATCHDOG and DEADLKRES, and possibly arming your
ichwd? Even if the breakage involves clocks or interrupt sources, there
is still a chance that they will be able to catch it.
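
As a rough sketch of what I mean, assuming a custom kernel config
derived from GENERIC (the comments are only my short descriptions):

    options        SW_WATCHDOG   # software watchdog
    options        DEADLKRES     # deadlock resolver thread

The ichwd driver can be loaded as a module, and watchdogd(8) then arms
whatever watchdog is available. The 30-second timeout here is just an
example value, not a recommendation:

    kldload ichwd
    watchdogd -t 30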

However, it seems you are not set up with a proper serial console?
If that is the case, you need to go with a textdump in order to
collect the DDB output.
If you do have one, you might try sending a serial break, and the
kernel should break into DDB.
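
A minimal sketch of the textdump route, assuming the kernel has DDB
compiled in and a dump device has already been configured with
dumpon(8). From the DDB prompt:

    textdump set
    capture on
    ps
    alltrace
    show allpcpu
    call doadump

With that many threads the capture buffer may be too small for the
alltrace output, so it is probably worth raising the
debug.ddb.capture.bufsize sysctl before reproducing the hang. After
the reboot, savecore(8) should extract the textdump into the crash
directory; see textdump(4) for the details.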

Thanks,
Attilio


-- 
Peace can only be achieved by understanding - A. Einstein

