Someone help me understand this...?

Thu Aug 28 14:04:12 PDT 2003

> On Thu, 28 Aug 2003, Joe Greco wrote:
> > > On Wed, 27 Aug 2003, Joe Greco wrote:
> > > > The specific OS below is 5.1-RELEASE but apparently this happens on 4.8
> > > > as well. 
> > > 
> > > Could you confirm this happens with 4.8?  The access control checks there
> > > are substantially different, and I wouldn't expect the behavior you're
> > > seeing on 4.8...
> > 
> > Rather difficult.  I'll see if the client will let me trash a production
> > system, but usually people don't like $40K servers handing out a few
> > hundred megabits of traffic going out of service.  We were trying to fix
> > it on the scratch box (which happens to have 5.1R on it) and then were
> > going to see how it fared on the production systems. 
> 
> I think it's safe to assume that if you're seeing a similar failure,
> there's a different source given my reading of the code, but I'm willing
> to be proven wrong.  It's probably not worth the investment if you're
> talking about large quantities of money, though.

It's more like "large quantities of annoyance and work".  Can you describe
the case you're envisioning?  If I can easily poke at it, I can at least
get some clues.

> > > Clearly, unbreaking applications like Diablo by default is desirable.  At
> > > least OpenBSD has similar protections to these turned on by default, and
> > > possibly other systems as well.  As 5.x sees more broad use, we may well
> > > bump into other cases where applications have similar behavior: they rely
> > > on no special protections once they've given up privilege.  I wonder if
> > > Diablo can run unmodified on OpenBSD; it could be they don't include
> > > SIGALRM on the list of "protect against" signals, or it could be that they
> > > modify Diablo for their environment to use an alternative signaling
> > > mechanism.  Another alternative to this patch would simply be to add
> > > SIGALRM to the list of acceptable signals to deliver in the
> > > privilege-change case.
> > 
> > I wonder if it would be reasonable to have some sort of interface that
> > allowed a program to tell FreeBSD not to set this flag...  if not, at
> > least if there was a sysctl, code could be added so that the daemon
> > checked the flag when starting and errored out if it wasn't set. 
> 
> We actually have such an interface, but it's only enabled for the purposes
> of regression testing.  If you compile "options REGRESSION" into the
> kernel configuration, a new system call, __setsugid(), is exposed to
> applications.  It's used by src/tools/regression/security/proc_to_proc to
> make it easier to set up process pairs for regression testing of
> inter-process access control.  When I added it, there was some interest in
> just making it setsugid() and exposing it to all processes.  Maybe we
> should just go this route for 5.2-RELEASE.  Invoking it with a (0)
> argument would mean the application writer accepted the inherent risks.
> 
> However, this would open the application to the risks of debugging
> attachment, which are probably greater than the signal risks in most
> cases.  It's not clear what the best way to express "I want to accept
> <these risks> but not <those risks>" would be...  So far, it sounds like
> we have three work-arounds in the pot, perhaps we can think of something
> better:
> 
> (1) Remove SIGALRM from the list of prohibited signals in the P_SUGID
>     case.  Not clear what the risks are here based on common application
>     use, but this is an easy change to make.
> 
> (2) Add setsugid() to allow applications to give up implicit protections
>     associated with credential changes.  This comes with greater risks, I
>     suspect, since it opens up applications to more explicit
>     vulnerabilities:  signal attacks require more sophistication and luck,
>     but debugging attacks are "easy".
> 
> (3) Allow administrators to selectively disable the more restrictive
>     signal checks at a system scope using a sysctl.  This is easy, and
>     comes with no risks as long as the setting is unchanged (the default
>     in the patch I sent out earlier). 
> 
> I'm tempted to commit (1) immediately to allow a workaround if we get
> nothing else figured out, and to think some more about (2) and (3).
> Another possibility would be to encourage application writers to avoid
> overloading signals that already have "meanings", and rely on the USR
> signals.  I assume the reason Diablo uses ALRM is that the USR signals
> already have assigned semantics?

Correct.  The USR signals control debug levels.  If it was a signal that
was only used internally, it could be changed, of course, but changing a
signal used by humans (and one used in the same manner as other programs)
is probably a bad idea.

> > > BTW, it's worth noting that the mechanism Diablo is using to give up
> > > privilege actually does retain some "privileges" -- it doesn't, for
> > > example, synchronize its resource limits with those of the user it is
> > > switching to, so it retains the starting resource limits (likely those of
> > > the root account). 
> > 
> > That's actually preferred in most cases.  News servers almost always eat
> > far more resources than whatever limits you might set by default, which
> > just turns into telling people to remove the limits or use root's
> > limits.  Generally if a news package bumps limits bad things happen. 
> 
> Right now, most applications in the base system make use of the
> setusercontext() call to modify their protections as part of a switch of
> users.  They often pass in the flag LOGIN_SETALL and then remove the bits
> they don't need, such as LOGIN_SETRESOURCES.  This also has the side
> effect of setting up things like the umask based on the user default in
> login.conf, setting the default paths, etc.  This may be overkill for what
> you're looking for, though, and there's a lot of value to "if it ain't
> broke, don't fix it". 

Yeah, if anything, we probably don't want to do that, because the resources
set up as root are usually more attractive.  I don't have a problem with
coding in some FreeBSD-isms, but I don't see it as buying us anything, does
it?

> > > A preferred structuring of privilege separation
> > > attempts to avoid this scenario by containing privilege in a process that
> > > is as independent as possible from the unprivileged processes, and uses
> > > file descriptor passing to get a bound port to the unprivileged processes,
> > > rather than credential manipulation which is fairly failure-prone.  
> > 
> > Yes, and such a thing is actually available, though it introduces some
> > new issues, because the daemons can be configured to allow various bound
> > ports (needing a variable number of fd's, etc) and this also breaks
> > legacy sites where people have custom startup scripts.  Ugh.  We did
> > that originally so people could get core dumps on FreeBSD.
> 
> Yeah.  The point on application behavior is probably to affect future
> application development and changes -- we still need to address current
> configurations.
> 
> > Yeah, yeah, it's Matt Dillon legacy code.  Matt tended to ignore error
> > returns from things where an error was not expected and even if one was
> > reported, nothing (beyond a message) could be done.  It actually took me
> > a while to isolate the kill issue as a result, because...  the rval from
> > kill was being ignored (now the error gets syslog'ed). 
> 
> In most cases, fail-stop is a reasonable behavior for unexpected security
> behavior from the system, but ignore is likely to shoot you later. :-) 

I don't even care about fail-stop.  I'd be happy with "cry-a-lot".  I'm a
big boy and am actually capable of looking in log files, especially when
things aren't working.  Heh.

> I
> tend to wrap even kill() calls as uid 0 in an assertion check, just to be
> on the safe side.  If nothing else, it helps detect the case where the
> other process has died, and you're using a stale pid.  It's particularly
> useful if the other process has died, the pid has been reused, and it's
> now owned by another user, which is a real-world case where kill() as a
> non-0 uid can fail even when you're sure it can't :-). 

Well, okay, I see the paranoia, but for a news server which tends to be a
dedicated machine, I'm willing to bet on the unlikelihood of pid reuse in
the fraction of a second between a wait pid-list-update and a failed kill
attempt.  ;-)  There's nothing you can do other than to log the error and
scratch your head anyways, unless I misunderstood the scenario you're
drawing.
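
For reference, the "log it and scratch your head" approach can be sketched
roughly as follows.  This is a minimal illustration, not the actual Diablo
change; checked_kill() is a hypothetical name, and it assumes the daemon
logs through syslog.

```c
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <syslog.h>

/*
 * Hypothetical helper: send sig to pid and log any failure instead of
 * silently ignoring it.  Returns 0 on success, or the errno value from
 * kill() on failure.
 */
static int
checked_kill(pid_t pid, int sig)
{
    int err;

    if (kill(pid, sig) == 0)
        return 0;

    /* Save errno before calling anything that might overwrite it. */
    err = errno;
    syslog(LOG_ERR, "kill(%ld, %d) failed: %s",
        (long)pid, sig, strerror(err));
    return err;
}
```

A caller that wants fail-stop instead can test the return value and bail
out; an EPERM here is what the P_SUGID signal restriction would look like
from the sender's side.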

... JG
-- 
Joe Greco - sol.net Network Services - Milwaukee, WI - http://www.sol.net
"We call it the 'one bite at the apple' rule. Give me one chance [and] then I
won't contact you again." - Direct Marketing Ass'n position on e-mail spam(CNN)
With 24 million small businesses in the US alone, that's way too many apples.