HEADSUP: ibcs2 and svr4 compat headed for history
Matthew Dillon
dillon at apollo.backplane.com
Sun Jun 27 23:54:14 PDT 2004
Because of a desire to maintain / have / create compatibility with
other operating systems, including remaining compatible with FreeBSD-4
and adding FreeBSD-5 compatibility (possibly), as well as Linux,
and of course other architectures that might be used far less.... I
have for the last year been thinking very carefully about the issue
of the compatibility code we have in the kernel.
The problem that I see is not so much that the compatibility code
exists, but that it exists in the kernel. I believe that the solution
is to move it to userland and thus unburden the kernel from having to
deal with it.  In userland it (A) can be maintained more easily,
(B) avoids the security issues involved with it being in the kernel,
and (C) is far more portable.
I fully intend to undertake this project for DragonFly, especially
because as we move to a messaged syscall interface we need to maintain
compatibility with the non-messaged interface, and I want that to be
a function set that runs in userland. i.e. for DragonFly when someone
calls the 'native' read(), it wouldn't actually be a libc function
but would instead be an intermediate user-level function vector whose
code space is managed by the kernel, almost like a mmap'd library
(or exactly like an mmap()'d library, but with a vector table).
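To make the vector-table idea concrete, here is a minimal sketch in C. It is only an illustration, not DragonFly code: the message layout (struct sysmsg), the opcodes, native_sendmsg(), and the legacy call numbers are all hypothetical names invented for this example, and the "kernel" is a stand-in function that fakes its results.

```c
#include <stddef.h>

/* Hypothetical message format for the 'native' messaged interface. */
struct sysmsg {
    int  opcode;
    long args[3];
    long result;
};

/* Hypothetical native opcodes (not real syscall numbers). */
enum { MSG_READ = 100, MSG_GETPID = 101 };

/* Stand-in for the real kernel entry point; it only fakes results. */
static long native_sendmsg(struct sysmsg *m)
{
    switch (m->opcode) {
    case MSG_READ:   m->result = m->args[2]; break; /* pretend every byte was read */
    case MSG_GETPID: m->result = 1234;       break; /* made-up pid */
    default:         m->result = -1;         break;
    }
    return m->result;
}

/* Old-style entry points that would live in the intermediate layer. */
static long compat_read(long fd, long buf, long nbytes)
{
    struct sysmsg m = { MSG_READ, { fd, buf, nbytes }, 0 };
    return native_sendmsg(&m);
}

static long compat_getpid(long a0, long a1, long a2)
{
    struct sysmsg m = { MSG_GETPID, { 0, 0, 0 }, 0 };
    (void)a0; (void)a1; (void)a2;
    return native_sendmsg(&m);
}

typedef long (*compat_fn)(long, long, long);

/* The vector table, indexed by (hypothetical) legacy call number. */
static const compat_fn compat_vector[] = { compat_read, compat_getpid };

/* What the emulated entry mechanism would land on. */
long compat_dispatch(unsigned int num, long a0, long a1, long a2)
{
    if (num >= sizeof(compat_vector) / sizeof(compat_vector[0]))
        return -1; /* unknown legacy call */
    return compat_vector[num](a0, a1, a2);
}
```

The point of the sketch is that the translation from the old calling form to the messaged form is ordinary userland code; only native_sendmsg() would ever need to cross into the kernel.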
It would be great if we could come up with a joint methodology, because
once such an abstraction is operational all the compatibility code that
falls under it, being userland code, would be highly portable to any
operating system running the abstraction.
I would recommend that instead of ripping this stuff out of FreeBSD-5
willy nilly, leave it in for now and let's spend our energies on the
development of an intermediate compatibility layer, abstraction, and
API.
The actual kernel work required to implement such a layer is not all
that complex -- really all the kernel has to do is take an INT 0xN
and throw it back in userland's face (or even just make the INT 0xN vector
an LDT vector that runs in userland's protection ring and never even
enters the kernel).
In regards to where these functions would reside... well, I was thinking
that we would reserve a chunk of VM either just below the kernel start,
or just above the kernel start which would contain the intermediate layer.
The actual address is almost irrelevant because the entry mechanism is,
of course, the system call entry mechanism being emulated. It would
be pure read-only code, with no writable data other than the stack,
whose purpose is simply to translate system calls into the 'native' form.
Another aspect of this abstraction is that it would be possible to
change the kernel's own native entry interface, argument format, and so
forth, and yet still maintain compatibility with 'older' userland
programs by having an intermediate layer that glues a userland program
targeted to version X of the kernel to version Y of the kernel which
is actually running. (This is why DFly needs it). One would also be
able to abstract out optimizations, such as providing non-ring-crossing
timestamp functions that utilize memory mapped I/O or other things...
these types of functions would be placed in the proposed intermediate
(run in user mode) layer.
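As a rough sketch of the version X to version Y glue, assume (purely for illustration) that version X had a seek call with a 32-bit offset and version Y takes a 64-bit one. The names glue_seek_v1() and native_seek_v2() are hypothetical, and the native call is a stand-in that just tracks a fake file position.

```c
#include <stdint.h>

/* Fake file position kept by the stand-in below. */
static int64_t fake_pos = 0;

/*
 * Stand-in for a hypothetical version-Y native call with a 64-bit
 * offset.  It treats every seek as relative, ignoring fd and whence.
 */
static int64_t native_seek_v2(int fd, int64_t offset, int whence)
{
    (void)fd;
    (void)whence;
    fake_pos += offset;
    return fake_pos;
}

/*
 * Version-X entry point: 32-bit offset in, 32-bit result out.
 * The glue widens the argument, calls the native form, and reports
 * an error if the result cannot be represented in the old format.
 */
int32_t glue_seek_v1(int fd, int32_t offset, int whence)
{
    int64_t r = native_seek_v2(fd, (int64_t)offset, whence);

    if (r > INT32_MAX || r < INT32_MIN)
        return -1; /* result does not fit the version-X interface */
    return (int32_t)r;
}
```

An old binary keeps calling the 32-bit form and never learns that the kernel's own interface changed underneath it.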
The intermediate layer would also have a direct access mechanism.
That is, userland programs which are aware of the layer could query
to get a vector base and call through a vector array into the layer
directly.  The intermediate layer would then optimize those calls that
do not require entry into the kernel and pass the rest on to the kernel.
The userland program would not know the difference, which is the whole
point of the exercise.
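The direct access mechanism could look something like the following sketch. Everything here is an assumption made for the example: layer_query_vector_base(), the slot numbers, the shared timestamp variable (standing in for a kernel-maintained read-only mapping), and kernel_trap() (standing in for a real ring crossing, which it merely counts).

```c
#include <stdint.h>

/* Hypothetical vector slots exported by the layer. */
enum { VEC_GETTIME = 0, VEC_WRITE = 1, VEC_COUNT = 2 };

typedef long (*layer_fn)(long, long, long);

/* Stands in for a timestamp the kernel keeps updated in a mapped page. */
static volatile uint64_t shared_time_page = 42;

/* Counts simulated kernel entries so the difference is observable. */
static int kernel_entries;

/* Stand-in for a real trap into the kernel. */
static long kernel_trap(int num, long a0, long a1, long a2)
{
    (void)num; (void)a0; (void)a1;
    kernel_entries++;
    return a2; /* pretend the whole request succeeded */
}

/* Served entirely in user mode: no ring crossing. */
static long layer_gettime(long a0, long a1, long a2)
{
    (void)a0; (void)a1; (void)a2;
    return (long)shared_time_page;
}

/* Cannot be served locally, so it is passed on to the kernel. */
static long layer_write(long fd, long buf, long nbytes)
{
    return kernel_trap(VEC_WRITE, fd, buf, nbytes);
}

static const layer_fn layer_vectors[VEC_COUNT] = {
    layer_gettime,
    layer_write,
};

/* What a layer-aware program would call to obtain the vector base. */
const layer_fn *layer_query_vector_base(void)
{
    return layer_vectors;
}

/* Test hook for this sketch only. */
int layer_kernel_entries(void)
{
    return kernel_entries;
}
```

A layer-aware program would fetch the base once and then call v[VEC_GETTIME](0, 0, 0) or v[VEC_WRITE](fd, buf, n) the same way, without knowing which of the two actually entered the kernel.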
So, as you can see, there is great potential flexibility in such a
design. So much so, in fact, that the ability to move things like
SysV and IBCS2 out of the kernel becomes a mere side effect of a larger
purpose. It would be a huge advance over the crufty syscall methodology
that all UNIXes today employ.
-Matt