Where is FreeBSD going?

Mon Jan 12 13:44:31 PST 2004

On Mon, 12 Jan 2004, Matthew Dillon wrote:

> 
> :> Agreed. Like I've said, the main problem I see is complexity. It 
> :> wouldn't matter as much if there were 5-10 people with deep knowledge of 
> :> SMPng, but with 1 or 2 hackers working on it, the chance that everything 
> :> will be ever fixed is quite small.
> :> 
> :IMO, the easiest way to start the SMP work (from a FreeBSD monolithic
> :approach), is to flatten as much of the VFS/VM code as possible into
> :a continuation scheme...  That is something that I could have done 5yrs
> :ago in a few weeks, and then keep the networking system as it is.
> :There would be shims deployed that would still support the sleep/wakeup
> :scheme, so that the non-networking could and the new flat interface could
> :be debugged...  (It is NOT a good idea to bug the networking guys until
> :the new scheme would be debugged.)
> :
> :At that point, there would be a code with explicit context carried around,
> :and no nesting or stack context.  This would have a small benefit of avoiding
> :multiple deeply nested kernel stacks...
> :
> :Given the very flat scheme, each subsystem could be recoded into a
> :message passing or simple continuation scheme (whatever is appropriate.)
> :The system would be naturally able to be reworked -- without the
> :hidden dependencies of the stack.  VFS/VM layering problems then
> :become resolvable.
> :
> :This is NOT a total solution, but should be the beginning of a thinking
> :exercise that seems to lead into the correct direction.  (Don't
> :criticize this based upon the completeness of my prescription, but
> :on what can eventually be developed!!!)
> 
>     I have been trying to figure out how to implement asynch system
>     calls in DFly, which is a very similar problem to the one posed by 
>     the VFS stack.

I know that Matt knows all this but..

The thing about async syscalls is that, by definition, the context
needs to be split. Something goes back to the caller and something
stays behind to complete the operation. The second "something" can be a
saved message or a full saved context. DFly would use the first
method and FreeBSD uses the second. (KSE threads are based upon
asynchronous system calls.) In the second case you need a way for the
program to cope with the fact that syscalls return without having done
what they were asked to do; this is what the KSE threading library and
API do.

> 
>     I don't think we can use a pure continuation scheme, but I do
>     think the required context can be minimized enough to fit in
>     a structure.  In DFly, the natural structure to hold the 
>     contextual information is the message structure that was used
>     to initiate the operation in the first place.

In order to make the state minimal, you need to know what state is
important to keep in every situation. In FreeBSD there is not
enough knowledge about this, so we keep the entire thread state. If the
requests were encapsulated in messages then that would help, but you
would still need to keep other state available. For example, if you sleep
while doing the 3rd part (out of 4) of a large read (one that the kernel
has broken up due to allocation discontinuities on the disk, say), then
you still need to keep track of that, and the original message probably
doesn't have the storage for that context.
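
To make the bookkeeping concrete, here is a minimal sketch of a read
that has been split into parts and can be resumed without a saved
stack. Every name in it (read_progress, read_step, the ready[] array
standing in for "this part's I/O has completed") is made up for
illustration; none of it is FreeBSD or DFly code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical progress record for a read split into several parts. */
struct read_progress {
    size_t nparts;     /* total parts the request was broken into */
    size_t next_part;  /* first part not yet completed */
    size_t bytes_done; /* bytes transferred so far */
};

enum step_result { STEP_DONE, STEP_WOULD_BLOCK };

/*
 * Advance the read until it finishes or reaches a part that is not ready.
 * ready[i] != 0 is a stand-in for "the I/O for part i has completed".
 * On STEP_WOULD_BLOCK all state needed to resume lives in *rp, so the
 * caller can return and call read_step() again later.
 */
enum step_result
read_step(struct read_progress *rp, const int *ready, const size_t *part_len)
{
    assert(rp->next_part <= rp->nparts);
    while (rp->next_part < rp->nparts) {
        if (!ready[rp->next_part])
            return STEP_WOULD_BLOCK;
        rp->bytes_done += part_len[rp->next_part];
        rp->next_part++;
    }
    return STEP_DONE;
}
```

The point is only that next_part and bytes_done have to live somewhere
once the stack is gone; if the original request message has no room for
them, something else has to hold them.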

> 
>     So, in regards to async system calls, the message structure
>     contains an additional union that lays out contextual storage
>     requirements for each system call.

Yes, but you have to design your system for that from scratch.
(DFly is doing it with a retroactive "scratch" :-)
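
For readers who want to picture the union Matt describes, a rough
sketch might look like the following. This is an assumption about the
general shape only, not DFly's actual message structure; sc_msg, sc_op
and the two init helpers are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical operation codes; a real system would have one per syscall. */
enum sc_op { SC_NANOSLEEP, SC_READ };

/* Hypothetical request message that also carries the per-call context. */
struct sc_msg {
    enum sc_op op;  /* which system call this message represents */
    int error;      /* completion status, filled in when the call finishes */
    union {
        struct {
            struct timespec wakeup;  /* when the sleep should end */
        } nanosleep;
        struct {
            size_t resid;      /* bytes still to transfer */
            size_t next_part;  /* where to resume a split read */
        } read;
    } ctx;          /* sized by the largest per-call context */
};

/* Prepare a message for an asynchronous nanosleep. */
void
sc_msg_init_nanosleep(struct sc_msg *m, struct timespec wakeup)
{
    assert(m != NULL);
    memset(m, 0, sizeof(*m));
    m->op = SC_NANOSLEEP;
    m->ctx.nanosleep.wakeup = wakeup;
}

/* Prepare a message for an asynchronous read of len bytes. */
void
sc_msg_init_read(struct sc_msg *m, size_t len)
{
    assert(m != NULL);
    memset(m, 0, sizeof(*m));
    m->op = SC_READ;
    m->ctx.read.resid = len;
    m->ctx.read.next_part = 0;
}
```

Each union member has to be worked out call by call, which is why this
is much easier in a system designed around messages from the start.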

> 
>     For example, the contextual information required to
>     support nanosleep() would primarily be a timeout structure.
>     (This is in fact the only system call that can be run asynch
>     in DFly at the moment... I am using it as an experimental
>     base to try to refine the code requirements to reduce 
>     complexity).
> 
>     The blocking points for both system calls and VFS calls (which
>     are the real problem being solved here) tend to be related to
>     blocking on I/O events, locks, and mutexes.  In DFly I 
>     primarily have to worry about I/O events and locks and not so
>     much about mutexes.  Also, in DFly, We are serializing many major
>     subsystems and segregating high performance structures, such as PCB's,
>     by associating them with a single thread.  This fits very well
>     with the continuation scheme idea because we would prefer to
>     have only a few threads which handle multiple data structures
>     (to make best use of available cpus), and this means that we cannot
>     simply 'block' in such threads whenever we feel like it without
>     screwing up parallelism.

Right. It depends on whether you think that doing all that work is worth
it. If you are happy to save the running context and return another that
doesn't hold any locks etc., then you can make the existing code work.
But it has costs, of course.

> 
> 					-Matt
> 					Matthew Dillon 
> 					<dillon at backplane.com>
> 
> :Oh well -- I cannot think too much about this stuff, or I'll actually
> :get emotionally involved again.  I need to get a 'normal' job, not
> :working at home and need to interact with people instead of CRTs. :-).
> :(I give a sh*t about FreeBSD, and hope that WHATEVER problems that
> :truly exist are fully resolved.)  There is alot of blood sweat and
> :tears in that codebase, and being involved in the project should be
> :done with great respect.
> :
> :John