How to report bugs (Re: 6.2-STABLE deadlock?)

Fri Apr 27 19:03:27 UTC 2007

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

- --On Tuesday, April 24, 2007 23:53:16 -0400 Kris Kennaway
<kris at obsecurity.org> wrote:

> On Wed, Apr 25, 2007 at 10:53:08AM +0800, LI Xin wrote:
>> Hi, Oleg,
>>
>> Oleg Derevenetz wrote:
>> > Quoting LI Xin <delphij at delphij.net>:
>> [...]
>> >> I'm not sure this is specific to one disk controller.  Actually
>> >> I have had occasional reports of similar hangs on amd64 6.2-RELEASE
>> >> (a slightly patched version) where most processes were stuck in the
>> >> 'ufs' state under very light load; the box was equipped with amr(4) RAID.
>> >>
>> >> I was not able to reproduce the problem in my lab, though, and it is
>> >> still unknown how to trigger the livelock :-(  It still needs some
>> >> investigation on their production system.
>> >
>> > I reported a similar issue for FreeBSD 6.2 in the audit trail for kern/104406:
>> >
>> > http://www.freebsd.org/cgi/query-pr.cgi?pr=104406&cat=
>> >
>> > and there should be a thread related to this. Briefly, I suspect that
>> > this is related to nullfs filesystems on my server: when I cvsuped to
>> > FreeBSD 6.2-STABLE with Daichi's unionfs-related patches and replaced
>> > the nullfs-mounted fs with a unionfs-mounted one (done on 10.03.07), the
>> > problem went away (or seems to have, at least).
>>
>> Hmm...  These seem to be different issues.  The problem reported to me
>> was on a pgsql server (no nullfs/unionfs involved), and the hang always
>> happens when it is not heavily loaded (usually in the morning, for
>> instance, and there is no special configuration, like scheduled tasks
>> which can generate disk load, etc., only the entropy harvesting), so
>> this is quite confusing.
>
> Yes, a large part of the confusion is the unfortunate tendency of
> people to do the following:
>
> <user1> my system hangs/panics/etc
> <user2> my system hangs/panics/etc too; it must be the same problem!
>
> What we really need is for every FreeBSD user who encounters a
> hang/panic/etc to avoid jumping to conclusions -- no matter how many
> superficial similarities there may seem to you to be -- and instead go
> through the relevant steps described here:
>
>
> http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug.html
>
> Until you (or a developer) have analyzed the resulting information,
> you cannot definitively determine whether or not your problem is the
> same as a given random other problem, and you may just confuse the
> issue by making claims of similarity when you are really reporting a
> completely separate problem.

What about those who don't have the benefit of being able to access the
console? :(  I've recently started buying servers that have built-in, full
remote console (i.e. the HP servers), but, for instance, I have one box that I
have to consistently reboot every 3 days due to a 'No Buffer Space Available'
...

A thought: how hard would it be to add some method of forcing a system crash
that would dump core from the command line?  Something that would be disabled
by default but that, for remote debugging purposes, one could enable in the
kernel and then run 'sysctl kernel.force_core_crash=1' to trigger?  I imagine
that having a core to analyze would allow providing more information than
nothing at all, no?
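
To make the idea concrete, here is a rough sketch of what driving such a knob
from a script might look like.  The sysctl name 'kernel.force_core_crash' above
is made up; the sketch instead assumes the kernel exposes 'debug.kdb.panic',
which I believe KDB-enabled FreeBSD kernels provide (check with
'sysctl debug.kdb' before relying on it).  The helper names and the dry-run
default are my own, purely for illustration, and a dump device would still have
to be configured beforehand for any core to be saved.

```python
import subprocess

# Assumption: the running kernel provides this sysctl; verify with
# `sysctl debug.kdb` first. It is not the hypothetical name from the mail.
PANIC_SYSCTL = "debug.kdb.panic"


def build_panic_command(oid=PANIC_SYSCTL):
    """Return the argv that would ask the kernel to panic."""
    return ["sysctl", f"{oid}=1"]


def force_crash(dry_run=True):
    """Print the command by default; only execute it when dry_run is False."""
    cmd = build_panic_command()
    if dry_run:
        print("would run:", " ".join(cmd))
        return cmd
    # This takes the machine down immediately. Without a configured dump
    # device no core will be written, so there would be nothing to analyze.
    subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    # Safe by default: only shows what would be executed.
    force_crash(dry_run=True)
```

Running it as-is only prints the command; passing dry_run=False is what would
actually bring the box down, so that is something to do only on a machine you
can afford to lose for a reboot.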

- ----
Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
Email . scrappy at hub.org                              MSN . scrappy at hub.org
Yahoo . yscrappy               Skype: hub.org        ICQ . 7615664
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (FreeBSD)

iD8DBQFGMkj34QvfyHIvDvMRAnIsAJ42loBGh0TkX4mfWSrZrMq2FheBuQCgiu4l
B0PCLtLhd9ZiJ4oNLWZ6LT0=
=KK9Y
-----END PGP SIGNATURE-----