cvs commit: src/sys/amd64/amd64 cpu_switch.S machdep.c

Thu Oct 20 01:27:17 PDT 2005

In message <20051020155911.C99720 at delplex.bde.org>, Bruce Evans writes:

>> One of the things you have to realize is that once you go down this
>> road you need a lot of code for all the conditionals.
>>
>> For instance you need to make sure that every new timestamp you
>> hand out is not prior to another one, no matter what is happening to
>> the clocks.
>
>Clocks are already incoherent in many ways:
>- the times returned by the get*() functions are incoherent with the ones
>   returned by the functions that read the hardware, because the latter
>   are always in advance of the former and the difference is sometimes
>   visible at the active resolution.

Sorry Bruce, but this is just FUD:  The entire point of the get*
family of functions is to provide "good enough" timestamps, very
fast, for code that knows it doesn't need better than roughly 1/hz
precision.
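
As an illustration of that trade-off, here is a minimal sketch of a tick-cached clock. It is a toy model, not the kernel's timecounter code: the names precise_ns(), cached_ns() and lag_ns() are hypothetical, HZ = 100 is an assumed tick rate, and ticks are assumed to fall on exact multiples of 1/hz. It only shows that a value cached at the last tick never lags the precise value by a full tick.

```c
#include <stdint.h>

/* Toy model only; none of these names exist in FreeBSD. */
#define HZ          100ULL               /* assumed tick rate */
#define NS_PER_SEC  1000000000ULL
#define TICK_NS     (NS_PER_SEC / HZ)    /* 1/hz expressed in nanoseconds */

/* "Precise" read: stands in for querying the hardware counter. */
static uint64_t precise_ns(uint64_t hw_now_ns)
{
    return hw_now_ns;
}

/* get*-style read: returns the value that was cached at the last tick. */
static uint64_t cached_ns(uint64_t hw_now_ns)
{
    return hw_now_ns - (hw_now_ns % TICK_NS);
}

/* How far the cheap cached value is behind the precise one. */
static uint64_t lag_ns(uint64_t hw_now_ns)
{
    return precise_ns(hw_now_ns) - cached_ns(hw_now_ns);
}
```

In this model lag_ns() is always smaller than TICK_NS, which is the "roughly 1/hz" bound mentioned above.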

>   POSIX tests of file times have
>   been reporting this incoherency since timecounters were implemented.
>   The tests use time() to determine the current time and stat() to
>   determine file times.  In the sequence:
>
>         t1 = time(...);
>         sleep(1);
>         touch(file);
>         stat(file);
>         t2 = mtime(file);
>
>   t2 should be > t1, but the bug lets t2 == t1 happen.

t2 == t1 is not illegal.

The morons who defined a non-extensible timestamp format obviously
didn't believe in Andy Moore, but given a sufficiently fast computer
the resolution of the standardized timestamps prevents t2 > t1 in
the above test code.
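
To make the arithmetic behind t2 == t1 concrete, here is a small sketch under stated assumptions. It assumes t1 comes from a precise clock read and that the file time comes from a tick-cached clock, both truncated to whole seconds; hz = 100 and a 5 ms tick phase are arbitrary choices, and whole_seconds(), tick_cached() and stamps_equal() are hypothetical names. It is a toy model of the test sequence quoted above, not the kernel or the POSIX test suite.

```c
#include <stdint.h>

#define NS_PER_SEC  1000000000ULL
#define TICK_NS     10000000ULL   /* assumed 1/hz with hz = 100 */
#define TICK_PHASE  5000000ULL    /* assumed: ticks fire 5 ms off the second boundary */

/* Truncate a nanosecond instant to a whole-second, time_t-style stamp. */
static uint64_t whole_seconds(uint64_t ns)
{
    return ns / NS_PER_SEC;
}

/* Value of the tick-cached clock at instant ns (requires ns >= TICK_PHASE). */
static uint64_t tick_cached(uint64_t ns)
{
    return ns - ((ns - TICK_PHASE) % TICK_NS);
}

/*
 * t1 is a precise read at start_ns; t2 is the cached clock read after
 * sleeping for sleep_ns.  Returns 1 when both truncate to the same second.
 */
static int stamps_equal(uint64_t start_ns, uint64_t sleep_ns)
{
    uint64_t t1 = whole_seconds(start_ns);
    uint64_t t2 = whole_seconds(tick_cached(start_ns + sleep_ns));

    return t1 == t2;
}
```

In this model, starting 1 ms after a second boundary and sleeping exactly one second gives equal stamps, because the last tick before the wakeup still lies in the earlier second. Starting mid-second gives stamps that differ by one.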

>- times are incoherent between threads unless the threads use their
>   own expensive locking to prevent this.  This is not very different
>   from timestamps being incoherent between CPUs unless the system uses
>   expensive locking to prevent it.

Only if the get* family of functions is used in places where they
shouldn't be.  I believe there is a sysctl which determines if it
is used for vfs timestamps.  The default can be changed if necessary.

>> So, instead of looking for "quick fixes", let's look at this with a
>> designer's or architect's view:
>>
>> On a busy system the scheduler works a hundred thousand times per
>> second, but on most systems nobody ever looks at the times(2) data.
>
>More like 1000 times a second.  Even stathz = 128 gives too many decisions
>per second for the 4BSD scheduler, so it is divided down to 16 per second.
>Processes blocking on i/o may cause many more than 128/sec calls to the
>scheduler, but there should be nothing much to decide then.

I'm regularly running into 5 digits in the Csw field in systat -vm.
I don't know what events you talk about, but they are clearly not
the same as the ones I'm talking about.

The problem here is context-switch time, and while we can argue if
this is really scheduler related or not, the fact that the scheduler
decides which thread to context-switch to should be enough to
avoid a silly discussion of semantics.

>So the current pessimizations from timecounter calls in mi_switch()
>are an end result of general pessimizations of swtch() starting in
>4.4BSD.  I rather like this part of the pessimizations...

It's so nice to have you back in action, Bruce :-)

-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk at FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe    
Never attribute to malice what can adequately be explained by incompetence.