Debugging RLIMIT signals: SIGXFSZ and SIGXCPU

Mon Apr 12 15:58:11 UTC 2010

All:

   I've got a process that is mysteriously receiving a SIGTERM (or some
   other signal).  It's a RADIUS daemon that runs as non-root (not
   privsep, unfortunately).  Identical hardware, identical code, and
   identical config on 6.3-PL are fine.

   On 8, the daemon logs receipt of a non-HUP signal and then
   exits.

   Our best theory at the moment is a change in the default RLIMITs
   between RELENG_6 and RELENG_8.

   For example:
   6.3:
   open files                      (-n) 11095
   8:
   open files                      (-n) 3520

   Either that, or a memory, file handle, or other leak that only
   manifests in RELENG_8.

   Either way, I'd like to debug the kernel handling of RLIMITs.

   The best I can find are references to:

   /usr/src/sys/kern/kern_resource.c::lim_cb() to SIGXCPU for RLIMIT_CPU
   /usr/src/sys/ufs/ffs/ffs_vnops.c::ffs_write() to SIGXFSZ for
     ... RLIMIT_FSIZE

   Not sure about RLIMIT_RSS, RLIMIT_AS, RLIMIT_NOFILE or others.

   Unfortunately, in the two places I found, psignal() is called
   in lieu of killproc() to deliver those RLIMIT-related signals,
   and psignal() doesn't do any logging the way killproc() does.

   It would be really nice if there could be some standardized
   logging for RLIMIT* related resource exhaustion.

   For example:
   /usr/src/sys/vm/vm_pageout.c: killproc(bigproc, "out of swap space");

  So my questions are:

  1) Anyone else interested in having this "feature" (RLIMIT
     debugging, possibly a sysctl(3))?
  2) Does anyone have any idea how other RLIMIT_ exhaustion is
     handled?  A lot of the other checks in kern_resource.c seem
     to just 'return (error);' on resource exhaustion.

Thanks,  ~BAS