svn commit: r212964 - head/sys/kern

Wed Sep 22 21:53:48 UTC 2010

On Wednesday, September 22, 2010 5:15:25 pm Ken Smith wrote:
> On 9/22/10 5:02 PM, Andriy Gapon wrote:
> > on 22/09/2010 22:58 John Baldwin said the following:
> >
> >> Agreed.  FWIW, I actually think that this is the only change needed as
> >> crashinfo is enabled by default in 8.x and later.  We already include symbols
> >> in kernels by default now, so just setting dumpdev will give you the same
> >> info you generally can get from a textdump in the form of a simple
> >> /var/crash/core.txt.N file.
> >>
> >> The other benefit of full crashdumps + crashinfo as compared to textdumps is
> >> that a developer can request further information in a PR followup (fire up
> >> kgdb and enter command 'X' and reply with the output).  With a textdump any
> >> info not collected by the textdump is lost once the machine reboots after the
> >> crash.
> >
> > Agree++
> > But what was the reason that dumpdev="AUTO" was reverted?
> > I remember that POLA was quoted at the time.
> > I am not sure what the astonishment actually was - perhaps 'AUTO' was not smart
> > enough and destroyed somebody's data?
> >
> 
> 
> Not everybody would notice /var getting full of crash dumps.
> Picture a server farm where for the most part the machines
> are all just plain on auto-pilot.  If one or several develop
> a problem that causes panics, /var can become full and possibly
> cause the machine to stop doing something important (between
> panics...).  I wasn't around when the initial decision for
> what to have it set to was made but this was the reason for
> me starting to do it again when I realized I forgot to at
> least once, and hence the reference to POLA.
> 
> Crash dumps are good for individual workstations.  Crash
> dumps are good for servers *if* the admin knows they're
> having a problem and is actively working on that server
> to resolve the issue.  But they're not so good and can
> cause nasty side-effects if they're happening on a machine
> not being watched over closely.  That's the reason for
> the change in setting when a -stable branch gets started.

FWIW, the Y! version of crashinfo auto-deletes crash dumps based on the
available disk space for precisely this reason.  With that addition
crashinfo works quite well on a very large server farm.
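
For illustration only, here is a minimal sketch of space-based pruning of
that kind.  It is not the Y! code: the function name, the min_free
threshold, and the policy of deleting the lowest-numbered dump first are
all assumptions.  It only relies on the usual /var/crash naming
(vmcore.N, info.N, core.txt.N) and leaves bookkeeping files such as
bounds and minfree alone.

```python
import os
import re
import shutil

# Files that belong to one crash share a trailing number N:
# vmcore.N, info.N and core.txt.N (the usual /var/crash layout).
DUMP_RE = re.compile(r"^(vmcore|info|core\.txt)\.(\d+)$")


def free_bytes(path):
    """Return the free space, in bytes, on the filesystem holding path."""
    return shutil.disk_usage(path).free


def prune_crash_dumps(crash_dir, min_free, get_free=free_bytes):
    """Hypothetical helper: remove whole crash dumps, lowest number first,
    until at least min_free bytes are free.  Returns the removed numbers.

    Assumption: a lower number means an older dump.
    """
    dumps = {}
    for name in os.listdir(crash_dir):
        match = DUMP_RE.match(name)
        if match:
            number = int(match.group(2))
            dumps.setdefault(number, []).append(os.path.join(crash_dir, name))

    removed = []
    for number in sorted(dumps):
        if get_free(crash_dir) >= min_free:
            break  # enough space again; keep the remaining dumps
        for path in dumps[number]:
            os.remove(path)
        removed.append(number)
    return removed
```

Something like prune_crash_dumps("/var/crash", 2 * 1024**3) run after
each reboot would keep the newest dumps for a developer to inspect with
kgdb while stopping /var from filling up on an unattended machine.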

-- 
John Baldwin