Bug in #! processing - One More Time

Garance A Drosihn drosih at rpi.edu
Wed Feb 23 22:24:59 PST 2005

Sometimes it's the simplest little changes which can suck the
life out of you...  I am aware that this is a trivial issue,
but now that I've figured out what is really going on, I am
not sure what the "best" fix would be.

To recap some history:

a) In Jan 2000, someone sent in a PR that perl documentation
    (including the famous "Camel" book from O'Reilly) claims
    that users can start a script with the line:

         #!/bin/sh -- # -*- perl -*- -p

    to avoid a variety of issues when writing cross-platform
    scripts.  Ignore the question of "but why?" for the moment,
    it *is* documented by perl (and in books on some other
    scripting languages).  He proposed a fix, and that was
    committed to src/sys/kern/imgact_shell.c as revision 1.21
    back on Feb 15, 2000 (predating 4.0-release).  It was MFC'ed
    into release 3.5 on March 20, 2000.

    The PR is:

       NOTE: People *do* use this "feature".
    Counter: This feature doesn't actually work on recent
             releases of Redhat Linux.  I don't know about
             other linuxes.

b) In 2002, some other user updated that PR saying that the
    new behavior wasn't quite right either.  I assume nothing
    much was done at the time, but he took the time to collect
    a lot of details (which will be given below).

c) In 2004, after 5.3-release, the issue came up again.  I assume
    that is in another PR, but I haven't checked.  In any case,
    kern/imgact_shell.c was changed to remove that special
    processing for '#', after discussion in -current.  The change
    was committed to HEAD (6.x) on October 31st as revision 1.27.
    It was MFC'ed to 5.3-stable on November 8th.

    This broke scripts which depended on the special-handling of
    '#', but the conclusion in -current was that /bin/sh should
    handle such processing (if it wanted to), and not execve().

d) In January I was finally bitten by this running 6.x-current,
    and a friend of mine happened to get hit by it at the same
    time running 5.3-stable.  So I wrote up a quick fix and did
    some minimal testing.  I posted that to -current on Jan 31st,
    but I didn't want to commit it until I did more testing,
    which I wanted to do *after* I brought my systems up-to-date.

e) On January 29th, sobomax committed an "unrelated" fix to
    kern/imgact_shell.c, except that it just happened to bring back
    the special '#' processing which had been removed in October...

f) I updated my systems, did extensive testing of my patch, and
    committed it once I was confident it worked in all situations.
    However, I didn't notice that the shell was no longer even
    *seeing* the parameters after '#' (I had tested that part
    back in #d), so it turns out the key loop that I had added
    was never actually getting triggered.

    I committed it to 6.x-current last week.

g) On Monday I get ready to MFC the change to 5.3 (ahead of the
    rush to beat the code-freeze!).   But... the damn thing does
    NOT work right in some common situations!!  WTF?!?

So, I figure out all the above history, and I locally modify
kern/imgact_shell.c to again remove the special '#'-processing.
I go to fix my patch to /bin/sh, and I realize...

There is no simple, "make everyone happy" fix for it.  Sigh.

The problem is in the way the execve() system call passes all
arguments to the shell.  Given a script named /tmp/list_args.pl,
which starts out as:
     #!/bin/sh -x -- # -*- perl -*- -p

and is executed via:
     /tmp/list_args.pl aaa bbb

What /bin/sh sees for arguments are:
      arg[0] == '-x'
      arg[1] == '--'
      arg[2] == '#'
      arg[3] == '-*-'
      arg[4] == 'perl'
      arg[5] == '-*-'
      arg[6] == '-p'
      arg[7] == '/tmp/list_args.pl'
      arg[8] == 'aaa'
      arg[9] == 'bbb'
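A minimal sketch of how that list gets built, assuming the kernel splits everything after the interpreter path on blanks, gives '#' no special meaning (as with my locally modified kern/imgact_shell.c), and then appends the script path and the user's arguments.  The function name split_all_args() is hypothetical; this is not the actual code in kern/imgact_shell.c.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper (not FreeBSD's kernel code): build the list of
 * arguments an interpreter sees after its own argv[0], assuming the
 * kernel splits the whole shebang line on blanks, gives '#' no special
 * meaning, and then appends the script path and the user's arguments.
 * 'line' is a writable copy of the shebang line without the "#!";
 * the stored pointers point into it.  Returns the count stored in 'out'.
 */
static int
split_all_args(char *line, const char *script, int nuser,
    const char *const *user, const char **out, int max)
{
	int n = 0;
	char *tok;

	assert(max > 0);
	/* The first word is the interpreter path; it becomes argv[0]. */
	tok = strtok(line, " \t\n");
	if (tok == NULL)
		return (0);
	/* Every later word becomes its own argument, '#' included. */
	while ((tok = strtok(NULL, " \t\n")) != NULL && n < max)
		out[n++] = tok;
	/* Then the script path, then whatever the user typed. */
	if (n < max)
		out[n++] = script;
	for (int i = 0; i < nuser && n < max; i++)
		out[n++] = user[i];
	return (n);
}
```

Called on a writable copy of "/bin/sh -x -- # -*- perl -*- -p" with script "/tmp/list_args.pl" and user arguments aaa and bbb, it should yield the same ten entries listed above.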

The problem is that /bin/sh has no way of knowing where the
"shebang-line options" end, and the "command-line options" start.
(Or does it?  I couldn't think of any reliable way, given that
the '#' could be followed by any totally arbitrary strings).
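To make the ambiguity concrete, here is a small sketch with two hypothetical scripts.  One is /tmp/a, with the shebang line "#!/bin/sh -- # note", run as "/tmp/a /tmp/b x"; the other is /tmp/b, with "#!/bin/sh -- # note /tmp/a", run as "/tmp/b x".  The helper names concat_args() and same_args() are made up for illustration, and the model assumes the blank-splitting behavior shown above.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical model of what the shell receives once the kernel has split
 * the shebang options: the options, then the script path, then the user's
 * arguments, concatenated into one flat list.  Nothing records where one
 * group ends and the next begins.
 */
static int
concat_args(int nopt, const char *const *opt, const char *script,
    int nuser, const char *const *user, const char **out, int max)
{
	int n = 0;

	assert(max > 0);
	for (int i = 0; i < nopt && n < max; i++)
		out[n++] = opt[i];
	if (n < max)
		out[n++] = script;
	for (int i = 0; i < nuser && n < max; i++)
		out[n++] = user[i];
	return (n);
}

/* True if two flat argument lists are identical, string by string. */
static bool
same_args(int na, const char *const *a, int nb, const char *const *b)
{
	if (na != nb)
		return (false);
	for (int i = 0; i < na; i++)
		if (strcmp(a[i], b[i]) != 0)
			return (false);
	return (true);
}
```

Both invocations produce the same six strings ("--", "#", "note", "/tmp/a", "/tmp/b", "x"), so even checking which arguments name existing files would not settle it when /tmp/a and /tmp/b both exist.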

Going back to the follow-up to PR 16393, part of the challenge
with fixing this is that many other OS's do *not* break up the
options on the shebang line the way FreeBSD does.
From the PR:

     Given a file called '/tmp/x2' with shebang line:
     #!/tmp/interp -a -b -c #dee eee

     If /tmp/x2 is exec'd, the operating system runs /tmp/interp
     with the following arguments:

     Solaris 8:
          args: "/tmp/interp" "-a" "/tmp/x2"

     Tru64 4.0:
          args: "interp" "-a -b -c #dee eee" "/tmp/x2"

     FreeBSD 2.2.7:
          args: "/tmp/interp" "-a" "-b" "-c" "#dee" "eee" "/tmp/x2"

     FreeBSD 4.0:
          args: "/tmp/interp" "-a" "-b" "-c" "/tmp/x2"

     Linux 2.4.12:
          args: "/tmp/interp" "-a -b -c #dee eee" "/tmp/x2"

     Linux 2.2.19:
          args: "interp" "-a -b -c #dee eee" "/tmp/x2"

     Irix 6.5:
          args: "/tmp/interp" "-a -b -c #dee eee" "/tmp/x2"

     HPUX 11.00:
          args: "/tmp/x2" "-a -b -c #dee eee" "/tmp/x2"

     AIX 4.3:
          args: "interp" "-a -b -c #dee eee" "/tmp/x2"

     Mac OS X:
          args: "interp" "-a -b -c #dee eee" "/tmp/x2"

     The most common behavior is:
          argv[0]: full path of interpreter
          argv[1]: all remaining args, coalesced into one string
          argv[2]: The file exec'd.
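Three of the distinct splitting rules in that table can be modelled with one function.  This is only a sketch: the mode names are hypothetical, argv[0] is always the full interpreter path here (several systems above pass only the basename, and HP-UX passes the script path), user arguments are left out to match the table, and the Solaris 8 behavior (keeping only the first word) is not modelled.  SPLIT_TO_HASH corresponds to the FreeBSD 4.0 row.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical names for three of the behaviors listed in the PR. */
enum shebang_mode {
	SPLIT_ALL,	/* FreeBSD 2.2.7 row: every blank-separated word */
	SPLIT_TO_HASH,	/* FreeBSD 4.0 row: stop at a word starting with '#' */
	COALESCE	/* Linux 2.4.12 / Irix 6.5 rows: one string */
};

/*
 * Build the argv an interpreter would get for a script whose first line
 * is "#!" followed by 'line' ('line' is writable, with no "#!" and no
 * trailing newline).  Stored pointers point into 'line' or at 'script'.
 * Returns the number of entries stored in 'out'.
 */
static int
interp_argv(enum shebang_mode mode, char *line, const char *script,
    const char **out, int max)
{
	int n = 0;
	char *p = line;

	assert(max >= 3);
	/* argv[0]: the interpreter path, NUL-terminated in place. */
	p += strspn(p, " \t");
	out[n++] = p;
	p += strcspn(p, " \t");
	if (*p != '\0')
		*p++ = '\0';
	p += strspn(p, " \t");

	if (*p != '\0' && mode == COALESCE) {
		/* Everything after the interpreter, as a single argument. */
		out[n++] = p;
	} else {
		/* One argument per word, keeping a slot free for the script. */
		while (*p != '\0' && n < max - 1) {
			if (mode == SPLIT_TO_HASH && *p == '#')
				break;
			out[n++] = p;
			p += strcspn(p, " \t");
			if (*p != '\0')
				*p++ = '\0';
			p += strspn(p, " \t");
		}
	}
	out[n++] = script;
	return (n);
}
```

Each call modifies its buffer, so every mode needs a fresh copy of "/tmp/interp -a -b -c #dee eee".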

The change committed back in 2000 made the comment: "This complies
to POSIX 1003.2, in that Posix says the implementation is free to
choose whatever it likes.".  I actually like the idea that FreeBSD
splits up the arguments from the shebang-line, but that leaves us
with the problem of telling shebang-options apart from user-specified
options given on the command-line.

As I see it, we have the following choices to fix this:

1) MFC the January 31st change to kern/imgact_shell.c to 5.3-stable,
    as it is.  This means we haven't fixed the problem that people
    complained about in 2002 and again in 2004.  And I still think
    it is "not appropriate" for the execve() system call to be deciding
    what '#' means on that line.  The biggest advantage is that this
    means 5.4-release will behave exactly the same as 3.5 through
    5.3-release have behaved.

2) Remove '#'-processing from kern/imgact_shell.c, and remove my
    change to bin/sh/options.c (which doesn't work right once we
    do that).  This breaks shell-scripts which use the feature as
    documented by perl (and other scripting languages), and fixes
    the problem people complained about in 2002/2004.

3) Change kern/imgact_shell.c to process shebang options the same
    way other (non-BSD?) operating systems do.  By that I mean:
    send the entire string as arg[1], and let the scripting
    language sort it out.  This is an incompatible change from
    FreeBSD 5.3 to 5.4, but would make us "more consistent"
    with other operating systems.

4) Provide some way for /bin/sh to find out where the shebang
    options end, and the user-specified options begin.  This could
    make everyone happy, but it's more work and right now (this
    close to 5.4-release) that wouldn't make me particularly happy...
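For choice #3, here is a rough sketch of what /bin/sh might then do itself, assuming the kernel passes the whole option string as arg[1]: split it on blanks and stop at the first word that begins with '#'.  Because the shebang options arrive as one argument, the script path immediately follows that argument, so the boundary problem goes away.  The function name is hypothetical, and this is not the code in bin/sh/options.c.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical /bin/sh-side parsing for choice #3: 'opts' is the single
 * argument holding everything after the interpreter path on the shebang
 * line.  Split it on blanks and stop at the first word beginning with
 * '#', treating the rest as a comment.  'opts' must be writable; the
 * stored pointers point into it.  Returns the number of options found.
 */
static int
split_shebang_opts(char *opts, const char **out, int max)
{
	int n = 0;
	char *p = opts;

	assert(max > 0);
	for (;;) {
		p += strspn(p, " \t");
		/* End of string, a word starting with '#', or no room left. */
		if (*p == '\0' || *p == '#' || n == max)
			break;
		out[n++] = p;
		p += strcspn(p, " \t");
		if (*p != '\0')
			*p++ = '\0';
	}
	return (n);
}
```

For the list_args.pl example, the option string "-x -- # -*- perl -*- -p" would leave just -x and --.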

Or we could do #1 for now, and plan to do #4 after 5.4-release.
Or do #1 now in 5.3, and go with some incompatible change (#2
or #3) only in 6.x-current.

What do people think?  I know this is a mind-numbingly trivial
issue to care about, but I figured that if I just went ahead
with any particular solution, someone would be irritated with me
and assume I must not have understood "the issues".  They will
then commit yet *another* change which undoes whatever I did,
while they fix something they feel that I broke.

And if nothing else, this is proof that one can't just blindly
MFC some change, no matter how trivial it seems.

Garance Alistair Drosehn            =   gad at gilead.netel.rpi.edu
Senior Systems Programmer           or  gad at freebsd.org
Rensselaer Polytechnic Institute    or  drosih at rpi.edu

More information about the freebsd-arch mailing list