i386 4/4 change

Mon Apr 9 12:32:30 UTC 2018

What is the current status of this?

Based on SVN history, it doesn't look https://reviews.freebsd.org/D14633 has been merged/commited yet.

I can try after I recover from disk crahes.
I expect I need few more days to restore.

Will this retire PAE option?

Thanks,
Hiro

On Sun, 1 Apr 2018 17:05:03 +1000 (EST)
Bruce Evans <brde at optusnet.com.au> wrote:

> 
> On Sun, 1 Apr 2018, Dimitry Andric wrote:
> 
> > On 31 Mar 2018, at 17:57, Bruce Evans <brde at optusnet.com.au> wrote:
> >>
> >> On Sat, 31 Mar 2018, Konstantin Belousov wrote:
> >>
> >>> the change to provide full 4G of address space for both kernel and
> >>> user on i386 is ready to land.  The motivation for the work was to both
> >>> mitigate Meltdown on i386, and to give more breazing space for still
> >>> used 32bit architecture.  The patch was tested by Peter Holm, and I am
> >>> satisfied with the code.
> >>>
> >>> If you use i386 with HEAD, I recommend you to apply the patch from
> >>> https://reviews.freebsd.org/D14633
> >>> and report any regressions before the commit, not after.  Unless
> >>> a significant issue is reported, I plan to commit the change somewhere
> >>> at Wed/Thu next week.
> >>>
> >>> Also I welcome patch comments and reviews.
> >>
> >> It crashes at boot time in getmemsize() unless booted with loader which
> >> I don't want to use.
> 
> > For me, it at least compiles and boots OK, but I'm one of those crazy
> > people who use the default boot loader. ;)
> 
> I found a quick fix and sent it to kib.  (2 crashes in vm86 code for memory
> sizing.  This is not called if loader is used && the system has smap.  Old
> systems don't have smap, so they crash even if loader is used.)
> 
> > I haven't yet run any performance tests, I'll try building world and a
> > few large ports tomorrow.  General operation from the command line does
> > not feel "sluggish" in any way, however.
> 
> Further performance tests:
> - reading /dev/zero using tinygrams is 6 times slower
> - read/write of a pipe using tinygrams is 25 times slower.  It also gives
>    unexpected values in wait statuses on exit, hopefully just because the
>    bug is in the test program is exposed by the changed timing (but later
>    it also gave SIGBUS errors).  This does a context switch or 2 for every
>    read/write.  It now runs 7 times slower using 2 4.GHz CPUs than in
>    FreeBSD-5 using 1 2.0 GHz CPU.  The faster CPUs and 2 of them used to
>    make it run 4 times faster.  It shows another slowdown since FreeBSD-5,
>    and much larger slowdowns since FreeBSD-1:
> 
>    1996 FreeBSD on P1  133MHz:   72k/s
>    1997 FreeBSD on P1  133MHz:   44k/s (after dyson's opts for large sizes)
>    1997 Linux   on P1  133MHz:   93k/s (simpler is faster for small sizes)
>    1999 FreeBSD on K6  266MHz:  129k/s
>    2018 FBSD-~5 on AthXP 2GHz:  696k/s
>    2018 FreeBSD on i7  2x4GHz: 2900k/s
>    2018 FBSD4+4 on i7  2x4GHz:  113k/s (faster than Linux on a P1 133MHz!!)
> 
> Netblast to localhost has much the same 6 times slowness as reading
> /dev/zero using tinygrams.  This is the slowdown for syscalls.
> Tinygrams are hard to avoid for UDP.  Even 1500 bytes is a tinygram
> for /dev/zero.  Without 4+4, localhost is very slow because it does
> a context switch or 2 for every packet (even with 2 CPUs when there is
> no need to switch).  Without 4+4 this used to cost much the same as the
> context switches for the pipe benchmark.  Now it costs relatively much
> less since (for netblast to localhost) all of the context switches are
> between kernel threads.
> 
> The pipe benchmark uses select() to avoid busy-waiting.  That was good
> for UP.  But for SMP with just 2 CPUs, it is better to busy-wait and
> poll in the reader and writer.
> 
> netblast already uses busy-waiting.  It used to be a bug that select()
> doesn't work on sockets, at least for UDP, so blasting using busy-waiting
> is the only possible method (timeouts are usually too coarse-grained to
> go as fast as blasting, and if they are fine-grained enough to go fast
> then they are not much better than busy-waiting with time wasted for
> setting up timeouts).  SMP makes this a feature.  It forces use of busy-
> waiting, which is best if you have a CPU free to run it and this method
> doesn't take to much power.
> 
> Context switches to task queues give similar slowness.  This won't be
> affected by 4+4 since task queues are in the kernel.  I don't like
> networking in userland since it has large syscall and context switch
> costs.  Increasing these by factors of 6 and 25 doesn't help.  It
> can only be better by combining i/o in a way that the kernel neglects
> to do or which is imposed by per-packet APIs.  Slowdown factors of 6
> or 25 require the combined i/o to be 6 or 25 larger to amortise the costs.
> 
> Bruce
> _______________________________________________
> freebsd-current at freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-current
> To unsubscribe, send any mail to "freebsd-current-unsubscribe at freebsd.org"