Miroslav Lachman 000.fbsd at quip.cz
Sat Jan 1 13:56:30 UTC 2011

John Baldwin wrote:
> On Saturday, December 25, 2010 6:43:25 am Miroslav Lachman wrote:
>> John Baldwin wrote:
>>> On Saturday, December 11, 2010 11:51:41 am Miroslav Lachman wrote:
>>>> Miroslav Lachman wrote:
>>>>> Garrett Cooper wrote:
>>>>>> 2010/4/20 Miroslav Lachman <000.fbsd at quip.cz>:
>>>>>>> I have a large storage partition (/vol0) mounted as noexec and nosuid.
>>>>>>> Then one directory from this partition is mounted by nullfs as "exec
>>>>>>> and suid" so anything on it can be executed.
>>>>>>> The directory contains a full installation of a jail. The jail is
>>>>>>> running fine, but some ports (PHP for example) cannot be compiled
>>>>>>> inside the jail; they fail with the message:
>>>>>>> /libexec/ld-elf.so.1: Cannot execute objects on /
>>>>>>> The same applies to executing apxs:
>>>>>>> root at rainnew ~/# /usr/local/sbin/apxs -q MPM_NAME
>>>>>>> /libexec/ld-elf.so.1: Cannot execute objects on /
>>>>>>> apxs:Error: Sorry, no shared object support for Apache.
>>>>>>> apxs:Error: available under your platform. Make sure.
>>>>>>> apxs:Error: the Apache module mod_so is compiled into.
>>>>>>> apxs:Error: your server binary '/usr/local/sbin/httpd'..
>>>>>>> (it should return "prefork")
>>>>>>> So I think there is some bug in checking the mountpoint options,
>>>>>>> where the check is made on the "parent" of the nullfs instead of the
>>>>>>> nullfs target mountpoint.
>>>>>>> It is on 6.4-RELEASE i386 GENERIC. I did not test it on another release.
>>>>>>> This is the list of related mount points:
>>>>>>> /dev/mirror/gm0s2d on /vol0 (ufs, local, noexec, nosuid, soft-updates)
>>>>>>> /vol0/jail/.nullfs/rain on /vol0/jail/rain_new (nullfs, local)
>>>>>>> /usr/ports on /vol0/jail/rain_new/usr/ports (nullfs, local)
>>>>>>> devfs on /vol0/jail/rain_new/dev (devfs, local)
>>>>>>> If I change the /vol0 options to (ufs, local, soft-updates), the
>>>>>>> above error is gone and apxs / compilation works fine.
>>>>>>> Can somebody look at this problem?
>>>>>> Can you please provide output from ktrace / truss for the issue?
>>>>> I did
>>>>> # ktrace /usr/local/sbin/apxs -q MPM_NAME
>>>>> The output is here http://freebsd.quip.cz/ld-elf/ktrace.out
>>>>> Let me know if you need something else.
>>>>> Thank you for your interest!
>>>> The problem is still there in FreeBSD 8.1-RELEASE amd64 GENERIC (and in
>>>> 7.x).
>>>> Can somebody say if this is a bug or an expected "feature"?
>>> I think this is the expected behavior as nullfs is simply re-exposing /vol0
>>> and it shouldn't be able to create a more privileged mount than the underlying
>>> mount I think (e.g. a read/write nullfs mount of a read-only filesystem would
>>> not make the underlying files read/write).  It can be used to provide less
>>> privilege (e.g. a readonly nullfs mount of a read/write filesystem does not
>>> allow writes via the nullfs layer).
>>> That said, I'm not sure exactly where the permission check is failing.
>>> execve() only checks MNT_NOEXEC on the "upper" vnode's mountpoint (i.e. the
>>> nullfs mountpoint) and the VOP_ACCESS(.., V_EXEC) check does not look at
>>> MNT_NOEXEC either.
>>> I do think there might be bugs in that a nullfs mount that specifies noexec or
>>> nosuid might not enforce the noexec or nosuid bits if the underlying mount
>>> point does not have them set (from what I can see).
>> Thank you for your explanation. Then it is strange that there is a bug
>> that allows execution on an originally non-executable mountpoint.
>> It should be mentioned in the bugs section of the mount_nullfs man page.
>> It would be useful if the 'mount' output showed inherited options for nullfs.
>> If parent is:
>> /dev/mirror/gm0s2d on /vol0 (ufs, local, noexec, nosuid, soft-updates)
>> Then nullfs line will be:
>> /vol0/jail/.nullfs/rain on /vol0/jail/rain_new (nullfs, local, noexec,
>> nosuid)
>> instead of just
>> /vol0/jail/.nullfs/rain on /vol0/jail/rain_new (nullfs, local)
>> Then I could understand what the expected behavior is, but our current
>> state is half working: I can execute scripts and binaries and run a jail
>> on it, but I can't execute "apxs -q MPM_NAME" and a few others.
> Hmm, so I was a bit mistaken.  The kernel is not failing to exec the binary.
> Instead, rtld is reporting the error here:
> static Obj_Entry *
> do_load_object(int fd, const char *name, char *path, struct stat *sbp,
>    int flags)
> {
>      Obj_Entry *obj;
>      struct statfs fs;
>      /*
>       * but first, make sure that environment variables haven't been
>       * used to circumvent the noexec flag on a filesystem.
>       */
>      if (dangerous_ld_env) {
>          if (fstatfs(fd, &fs) != 0) {
>              _rtld_error("Cannot fstatfs \"%s\"", path);
>              return NULL;
>          }
>          if (fs.f_flags & MNT_NOEXEC) {
>              _rtld_error("Cannot execute objects on %s\n", fs.f_mntonname);
>              return NULL;
>          }
>      }
> I wonder if the fstatfs is falling down to the original mount rather than
> being caught by nullfs.
> Hmm, nullfs' statfs method returns the flags for the underlying mount, not
> the flags for the nullfs mount.  This is possibly broken, but it is the
> behavior nullfs has always had and the behavior it still has on other BSDs.
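
To make the mismatch described above easier to follow, here is a minimal
sketch in C. It is a toy model, not kernel or rtld code: the names
sketch_mount, sketch_statfs, sketch_execve_allowed, sketch_rtld_allowed and
the SKETCH_NOEXEC value are all hypothetical and exist only for this
illustration. It assumes, as stated in the quoted explanation, that execve()
checks noexec only on the upper (nullfs) mount, while statfs on nullfs
reports the flags of the underlying mount, which is what the rtld check sees
when dangerous_ld_env is set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical flag value for this sketch; the real MNT_NOEXEC is in <sys/mount.h>. */
#define SKETCH_NOEXEC 0x1u

/* Toy description of a mount; not a kernel structure. */
struct sketch_mount {
    const char *mntonname;            /* where the filesystem is mounted */
    unsigned flags;                   /* options given for this mount */
    const struct sketch_mount *lower; /* underlying mount for nullfs, else NULL */
};

/* Assumption from the thread: statfs on nullfs reports the underlying mount. */
const struct sketch_mount *
sketch_statfs(const struct sketch_mount *m)
{
    return m->lower != NULL ? m->lower : m;
}

/* Assumption from the thread: execve() looks only at the upper mount's flags. */
bool
sketch_execve_allowed(const struct sketch_mount *m)
{
    return (m->flags & SKETCH_NOEXEC) == 0;
}

/* Mirrors the quoted rtld excerpt: the check runs only if dangerous_ld_env is set. */
bool
sketch_rtld_allowed(const struct sketch_mount *m, bool dangerous_ld_env)
{
    if (!dangerous_ld_env)
        return true;
    return (sketch_statfs(m)->flags & SKETCH_NOEXEC) == 0;
}

/* Replays the mounts from the thread and prints each check's verdict. */
void
sketch_replay_thread(void)
{
    const struct sketch_mount vol0 = { "/vol0", SKETCH_NOEXEC, NULL };
    const struct sketch_mount jail = { "/vol0/jail/rain_new", 0, &vol0 };

    assert(sketch_statfs(&jail) == &vol0);
    printf("execve allowed: %d\n", sketch_execve_allowed(&jail));
    printf("rtld allowed, plain environment: %d\n",
        sketch_rtld_allowed(&jail, false));
    printf("rtld allowed, dangerous_ld_env set: %d\n",
        sketch_rtld_allowed(&jail, true));
}
```

Under these assumptions the model permits ordinary execution on the nullfs
mount and refuses only in the rtld path with dangerous_ld_env set, which
would be consistent with the "half working" state reported earlier in the
thread. Whether apxs actually triggers that path is not confirmed here.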

I am sorry, I am not a programmer, so the code doesn't tell me much.
Does it mean "we must leave it in the current state" (for compatibility with
other BSDs), or can it be fixed in the future?

I can't tell whether it would be better to disable all exec operations if
the parent mount is noexec, or to allow all exec operations. I just think
that the current state is broken if something can be executed and something

And again, thank you for your time, explanation and interest in this 

Miroslav Lachman

