cvs commit: www/en/projects/ideas index.sgml

Fri Jul 28 08:52:34 UTC 2006

On Fri, 28 Jul 2006, Joel Dahl wrote:

>  Modified files:
>    en/projects/ideas    index.sgml
>  Log:
>  -  Extend the ktrace project with a new task. [1]

I would be somewhat cautious with this one -- right now, ktrace has a
fairly clean and simple design, and trying to turn it into a general-purpose
tracing facility talking to additional IPC classes, etc., risks adding
significant complexity.  One of the nice simplifying assumptions of ktrace as
it stands today is that writing directly to disk files limits the opportunity 
for deadlock despite synchronous/reliable logging.  ptrace() and related 
facilities include some basic deadlock prevention code to help avoid debugging 
cycles; IPC types lend themselves less well to this, so if ktrace is going to 
support synchronous (blocking, reliable) IPC, then caution will need to be 
exercised.  The result of moving in the proposed direction may be that we want 
to look at reintroducing asynchrony for ktrace over IPC, which I only just 
eliminated in ktrace following widespread complaints of unreliability and
record loss after asynchrony was added in 5.x.  I think we should add a note 
to this idea that while the idea is simple, the correct and safe 
implementation is quite tricky, and that this is a useful idea to explore but 
not necessarily something we will adopt.
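
To make the trade-off above concrete, here is a minimal user-space sketch in
Python.  It is not ktrace code: the SyncTracer and AsyncTracer classes, the
demo() helper, and the buffer sizes are all hypothetical names and values
chosen for illustration.  A bounded queue that nobody drains stands in for a
stalled IPC reader.  The synchronous tracer keeps every record but blocks its
caller once the buffer fills; the asynchronous tracer never blocks but drops
records.

```python
import queue


class SyncTracer:
    """Reliable tracer: emit() blocks while the buffer is full."""

    def __init__(self, capacity):
        self.buf = queue.Queue(maxsize=capacity)

    def emit(self, record, timeout=None):
        # Blocks until space is available; raises queue.Full on timeout.
        self.buf.put(record, timeout=timeout)


class AsyncTracer:
    """Non-blocking tracer: emit() drops the record if the buffer is full."""

    def __init__(self, capacity):
        self.buf = queue.Queue(maxsize=capacity)
        self.dropped = 0

    def emit(self, record):
        try:
            self.buf.put_nowait(record)
        except queue.Full:
            self.dropped += 1


def demo(capacity=4, records=10):
    """Return (records dropped by async tracer, index where sync tracer stalled)."""
    # No consumer ever drains either buffer, modelling a stalled reader.
    async_tracer = AsyncTracer(capacity)
    for i in range(records):
        async_tracer.emit(i)

    sync_tracer = SyncTracer(capacity)
    stalled_at = None
    for i in range(records):
        try:
            # The short timeout only lets the demo terminate; a truly
            # synchronous tracer would wait here indefinitely.
            sync_tracer.emit(i, timeout=0.01)
        except queue.Full:
            stalled_at = i
            break
    return async_tracer.dropped, stalled_at


if __name__ == "__main__":
    print(demo())
```

If the stalled reader were itself waiting on the traced process, the
synchronous case would be a deadlock rather than a delay, which is the cycle
that writing to a plain disk file largely avoids.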

BTW, a problem that has occurred a number of times in the past is that people
have approached us with implementations of ideas from the ideas list that, it
later transpired, we weren't actually interested in (sometimes at all).  I think
it might not be a bad idea to sprinkle the idea list with some additional 
cautionary language -- often ideas listed there are things to explore, not to 
adopt without very careful consideration.  For example, the "FPU subsystem 
overhaul", "Process checkpointing", "Pluggable disk scheduler", "Magic 
Symlinks", "NFS Lockd (kernel implementation)", and several others -- the task 
here often isn't to port/write the code, the task is to port/write and then 
perform a detailed and careful evaluation of the changes to decide whether 
they are a good idea, and then consider adopting the code only if the 
evaluation suggests it is a good idea and after significant refinement.

Some of the ideas on this list are distinctly "explore this direction as a 
computer scientist, not a code hacker" sorts of problems -- for example, the 
"Process checkpointing" task seems to suggest that if you can read the DFBSD 
repository and write some C code, you're set.  In fact, this is not remotely 
the case.  Checkpointing is a very difficult problem in computer science, with 
little consensus on how it should be done (and indeed whether it should be 
done at all) by general-purpose operating systems.  Not only that, but we 
would not adopt the DFBSD implementation as-is, as it solves a few of the easy 
problems, and none of the hard ones (i.e., security).  The requirements here 
aren't just the ability to write code, but an understanding of distributed 
systems, our application/execution model, a strong understanding of the 
performance and security requirements, and a willingness to look not just at
code but also at the extensive research literature on this topic.

I think people often grab ideas from the list thinking that if implemented as 
described, they will get committed, and this is not the case.  In many of the
more "scientific" cases, it's likely we'll look at the results and say,
"Oh, that was a bad idea", or maybe slightly more likely, "Oh, hmm, not so 
sure about that".  The existing cautionary language captures that there might 
be disagreements on the specifics, but fails to capture that there may be 
disagreements on the fundamental ideas themselves.  I like the ideas list idea 
a lot, and don't want to see it removed, but I also don't want people getting 
the false impression that this is a "todo" list.  Some items are todo items 
and obvious short-order commit candidates, others are out-there ideas that 
have potential and should be characterized as "high risk" when it comes to the 
results actually being used.  Maybe what we should be thinking about is 
classifying the ideas list items into rote items (things where the chances of
adoption of a decent implementation are high, subject to review) and researchy 
things (where the chances of adoption are low, not just because the chances of 
a good implementation are low, but because there are lots of open and very 
difficult questions involved).  This would help prevent misunderstandings, if 
nothing else.

Robert N M Watson
Computer Laboratory
University of Cambridge