Complete hang on 9.0-RELEASE

Attilio Rao attilio at freebsd.org
Mon Mar 5 18:50:29 UTC 2012


2012/3/5, Arnaud Lacombe <lacombar at gmail.com>:
> Hi,
>
> On Wed, Feb 29, 2012 at 2:31 PM, Arnaud Lacombe <lacombar at gmail.com> wrote:
>> Hi,
>>
>> On Wed, Feb 29, 2012 at 2:22 PM, Attilio Rao <attilio at freebsd.org> wrote:
>>> 2012/2/29, Arnaud Lacombe <lacombar at gmail.com>:
>>>> Hi,
>>>>
>>>> On Wed, Feb 29, 2012 at 1:44 PM, Attilio Rao <attilio at freebsd.org>
>>>> wrote:
>>>>> 2012/2/29, Arnaud Lacombe <lacombar at gmail.com>:
>>>>>> Hi,
>>>>>>
>>>>>> On Wed, Feb 29, 2012 at 12:59 PM, Arnaud Lacombe <lacombar at gmail.com>
>>>>>> wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> On Mon, Feb 27, 2012 at 12:48 PM, Arnaud Lacombe <lacombar at gmail.com>
>>>>>>> wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> On Mon, Feb 27, 2012 at 10:36 AM, Attilio Rao <attilio at freebsd.org>
>>>>>>>> wrote:
>>>>>>>>> 2012/2/27, Arnaud Lacombe <lacombar at gmail.com>:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> On Tue, Feb 14, 2012 at 11:41 AM, Arnaud Lacombe
>>>>>>>>>> <lacombar at gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>> Hi folks,
>>>>>>>>>>>
>>>>>>>>>>> For the record, I was running some tests yesterday on top of a
>>>>>>>>>>> 9.0-RELEASE, amd64, kernel when the box hung. At the time of the
>>>>>>>>>>> hang, the box was running a process with about 2800 threads with
>>>>>>>>>>> heavy IPC between 1400 writers and 1400 readers. The box was in
>>>>>>>>>>> single user mode (/bin/sh coming from FreeBSD 7.4-STABLE). Here is
>>>>>>>>>>> the beginning of the dmesg:
>>>>>>>>>>>
>>>>>>>>>> This happened a second time, now with FreeBSD 8.2-RELEASE. Complete
>>>>>>>>>> machine hang. The machine was running about 4000 threads in a single
>>>>>>>>>> process; all the other conditions are the same.
>>>>>>>>>
>>>>>>>>> Arnaud,
>>>>>>>>> can you please break into your kernel via KDB and collect the
>>>>>>>>> following information from the DDB prompt:
>>>>>>>>> - ps
>>>>>>>>> - alltrace
>>>>>>>>> - show allpcpu
>>>>>>>>> - possibly get a coredump with 'call doadump'
>>>>>>>>>
>>>>>>>> Will do, but I'll need to rebuild a kernel to include DDB.
>>>>>>>>
>>>>>>>>> and in the end provide all those along with the kernel binary and
>>>>>>>>> possibly sources somewhere?
>>>>>>>>>
>>>>>>>> I'll be testing a bare `release/8.2.0' with the following patch:
>>>>>>>>
>>>>>>>> diff --git a/sys/amd64/conf/GENERIC b/sys/amd64/conf/GENERIC
>>>>>>>> index c3e0095..7bd997f 100644
>>>>>>>> --- a/sys/amd64/conf/GENERIC
>>>>>>>> +++ b/sys/amd64/conf/GENERIC
>>>>>>>> @@ -79,6 +79,10 @@ options      INCLUDE_CONFIG_FILE     # Include this file in kernel
>>>>>>>>
>>>>>>>>  options        KDB           # Kernel debugger related code
>>>>>>>>  options        KDB_TRACE     # Print a stack trace for a panic
>>>>>>>> +options        DDB
>>>>>>>> +options        BREAK_TO_DEBUGGER
>>>>>>>> +options        ALT_BREAK_TO_DEBUGGER
>>>>>>>>
>>>>>>>>  # Make an SMP-capable kernel by default
>>>>>>>>  options        SMP           # Symmetric MultiProcessor Kernel
>>>>>>>>
>>>>>>> ok, it happened again after 2 days, the process was running about 3200
>>>>>>> threads. I'm trying to break into DDB and will let you know; I'm not
>>>>>>> that successful for now...
>>>>>>>
>>>>>> No luck. Neither BREAK nor ALT_BREAK is responding. I will not touch
>>>>>> the system in the next few hours if you want me to test something on
>>>>>> it. In the event that 8.2-RELEASE or 9.0-RELEASE is not meant to work
>>>>>> reliably on top of a 7.4-RELEASE userland, I will set the test up again
>>>>>> to run on a clean 9.0-RELEASE system and retry.
>>>>>
>>>>> We allow the KBI to break when new releases happen, thus this may cause
>>>>> breakage for you, even if a deadlock is really not something you want.
>>>>>
>>>>> Can you try enabling SW_WATCHDOG, DEADLKRES and possibly arm your ichwd?
>>>>> If the breakage involves clocks or interrupt sources there are still
>>>>> chances they will be able to catch it though.
>>>>>
>>>>> However, it doesn't seem you are set up with a proper serial console?
>>>> The serial console is definitely working fine. I can break into DDB
>>>> at will when the test is running. I did not test with ALT_BREAK
>>>> per se, but BREAK does work.
>>>
>>> So if you try to break into DDB via serial break it doesn't work?
>>> That is definitely very bad...
>>>
>> just to be sure, I rebooted the system and I could break into DDB at
>> the first attempt with ALT_BREAK, BREAK was a bit more reluctant but
>> worked too. So yes, this does not taste good :/
>>
>>> Can you try with the options I mentioned earlier and see if something
>>> changes?
>>>
>> will do, but I will first attempt to reproduce this on 9.0-RELEASE.
>>
> 9.0-RELEASE (kernel + userland) hung today while running 2000
> threads. Next step is to reproduce it with a watchdog+textdump enabled
> kernel.

And you were still unable to break into DDB, right?

Attilio


-- 
Peace can only be achieved by understanding - A. Einstein


More information about the freebsd-stable mailing list