Extending MADV_PROTECT

John Baldwin jhb at freebsd.org
Wed May 8 16:26:46 UTC 2013


On Wednesday, May 08, 2013 5:58:27 am Konstantin Belousov wrote:
> On Tue, May 07, 2013 at 02:33:27PM -0400, John Baldwin wrote:
> > One of the issues I have with our current MADV_PROTECT is that it
> > isn't very administrative-friendly. That is, as a sysadmin I can't
> > easily protect arbitrary processes from the OOM killer. Instead, the
> > binary has to be changed to invoke madvise(). Furthermore, once the
> > protection is granted it can't be revoked. Also, any binaries that
> > want this have to be run as root. Instead, I would like to be able
> > to both set and revoke this for existing processes and possibly even
> > allow it to be inherited (so I can tag a top-level daemon that forks
> > and have all its future children be protected for example). To that
> > end I've whipped up a simple patch (against 8, but should port to
> > HEAD easily if folks think it is a good idea) to add a new pprotect()
> > system call and userland program (protect) that can be used similar to
> > ktrace(1) either as a modifier when running a new program or as a tool
> > for setting or clearing protection for existing processes.
> >
> > The inherit feature isn't implemented yet, but it should be simple
> > to do. One would simply need a new flag that PPROT_INHERIT sets that
> > is checked on fork and propagates P_PROTECTED if it is set. Also,
> > one other thought I had is that at some point we might want to make
> > P_PROTECTED more fine-grained, e.g. by allowing for OOM "priorities".
> > To that end, it may make sense to add a new argument to protect,
> > though you could also reserve part of the 'op' parameter to encode a
> > priority.
> 
> Wouldn't the pprot_setchildren() miss a child for which the new pid and
> struct proc are already allocated in the do_fork(), but which is not yet
> linked into the process tree ?  If true, I think this does not
> fulfill the promise of the PPROT_DESCEND.

ktrace has the same issue, and really, this is just a race.  If the user
had run the command a few nanoseconds earlier the proc wouldn't be allocated
at all, and I doubt a user would notice the difference in those two cases.
If you are doing this programmatically then that is a race that the program
can handle.  It isn't any different from having a new process begin its
fork() a few nanoseconds after this returns either.  This is why if you
want that behavior you would use -di (and applies equally to ktrace).

> It seems that the patch posted missed the chunk for sys/proc.h.
> For HEAD, you probably need e.g. p_flag2 and P2_PROTECTED instead.

P_PROTECTED already exists on both HEAD and 8.x (the MADV_PROTECT it is
extending is quite old).  What we would need a p_flag2 for would be to
add a new P_PROTECTED_INHERIT if we decided we wanted to support that.

> Since the syscall is mean to be extended in the future, would it make
> more sense to add a multiplexer, e.g. procctl(2), one operation of which
> would be PROCCTL_PROTECT ?

Do we expect it to do more than adjust protection?  We already have a few
other process-control system calls (e.g. ptrace()).  It's hard to ensure
it is sufficiently generic when only abstracting from one use case.

-- 
John Baldwin


More information about the freebsd-arch mailing list