i386 4/4 change
ota at j.email.ne.jp
Mon Apr 9 12:32:30 UTC 2018
What is the current status of this?
Based on SVN history, it doesn't look like https://reviews.freebsd.org/D14633 has been merged/committed yet.
I can try it after I recover from disk crashes.
I expect I need a few more days to restore.
Will this retire the PAE option?
On Sun, 1 Apr 2018 17:05:03 +1000 (EST)
Bruce Evans <brde at optusnet.com.au> wrote:
> On Sun, 1 Apr 2018, Dimitry Andric wrote:
> > On 31 Mar 2018, at 17:57, Bruce Evans <brde at optusnet.com.au> wrote:
> >> On Sat, 31 Mar 2018, Konstantin Belousov wrote:
> >>> the change to provide a full 4G of address space for both kernel and
> >>> user on i386 is ready to land. The motivation for the work was both to
> >>> mitigate Meltdown on i386 and to give more breathing space to the
> >>> still-used 32-bit architecture. The patch was tested by Peter Holm,
> >>> and I am satisfied with the code.
> >>> If you use i386 with HEAD, I recommend that you apply the patch from
> >>> https://reviews.freebsd.org/D14633
> >>> and report any regressions before the commit, not after. Unless
> >>> a significant issue is reported, I plan to commit the change around
> >>> Wed/Thu next week.
> >>> I also welcome patch comments and reviews.
> >> It crashes at boot time in getmemsize() unless booted with the loader,
> >> which I don't want to use.
> > For me, it at least compiles and boots OK, but I'm one of those crazy
> > people who use the default boot loader. ;)
> I found a quick fix and sent it to kib. (There were 2 crashes in the vm86
> code for memory sizing. That code is not called if the loader is used &&
> the system has SMAP. Old systems don't have SMAP, so they crash even if
> the loader is used.)
> > I haven't yet run any performance tests; I'll try building world and a
> > few large ports tomorrow. General operation from the command line does
> > not feel "sluggish" in any way, however.
> Further performance tests:
> - reading /dev/zero using tinygrams is 6 times slower
> - read/write of a pipe using tinygrams is 25 times slower [a sketch of
> such a benchmark follows the table below]. It also gives unexpected
> values in wait statuses on exit, hopefully just because a bug in the
> test program is exposed by the changed timing (but later it also gave
> SIGBUS errors). This does a context switch or 2 for every read/write.
> It now runs 7 times slower using two 4.0 GHz CPUs than in FreeBSD-5
> using one 2.0 GHz CPU. The faster CPUs, and having 2 of them, used to
> make it run 4 times faster. It shows another slowdown since FreeBSD-5,
> and much larger slowdowns since FreeBSD-1:
> 1996 FreeBSD on P1 133MHz: 72k/s
> 1997 FreeBSD on P1 133MHz: 44k/s (after dyson's opts for large sizes)
> 1997 Linux on P1 133MHz: 93k/s (simpler is faster for small sizes)
> 1999 FreeBSD on K6 266MHz: 129k/s
> 2018 FBSD-~5 on AthXP 2GHz: 696k/s
> 2018 FreeBSD on i7 2x4GHz: 2900k/s
> 2018 FBSD4+4 on i7 2x4GHz: 113k/s (faster than Linux on a P1 133MHz!!)
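
For concreteness, a minimal sketch of a tinygram pipe ping-pong benchmark in
the spirit of the test described above (this is not the actual test program
from the thread; the 1-byte message size and the iteration count are
illustrative):

    /*
     * Bounce a 1-byte message between parent and child, forcing a
     * context switch (or two) per read/write, then report round trips
     * per second.
     */
    #include <sys/time.h>
    #include <err.h>
    #include <stdio.h>
    #include <unistd.h>

    int
    main(void)
    {
        int ptoc[2], ctop[2];       /* parent->child, child->parent */
        char c = 'x';
        struct timeval t0, t1;
        int i, n = 100000;          /* arbitrary iteration count */
        double secs;

        if (pipe(ptoc) == -1 || pipe(ctop) == -1)
            err(1, "pipe");
        switch (fork()) {
        case -1:
            err(1, "fork");
        case 0:                     /* child: echo each byte back */
            for (i = 0; i < n; i++)
                if (read(ptoc[0], &c, 1) != 1 ||
                    write(ctop[1], &c, 1) != 1)
                    err(1, "child i/o");
            _exit(0);
        default:                    /* parent: write, wait for the echo */
            gettimeofday(&t0, NULL);
            for (i = 0; i < n; i++)
                if (write(ptoc[1], &c, 1) != 1 ||
                    read(ctop[0], &c, 1) != 1)
                    err(1, "parent i/o");
            gettimeofday(&t1, NULL);
            secs = (t1.tv_sec - t0.tv_sec) +
                (t1.tv_usec - t0.tv_usec) / 1e6;
            printf("%d round trips in %.3f s (%.0f/s)\n",
                n, secs, n / secs);
        }
        return (0);
    }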
> Netblast to localhost shows much the same factor-of-6 slowdown as reading
> /dev/zero using tinygrams. This is the slowdown for syscalls.
> Tinygrams are hard to avoid for UDP. Even 1500 bytes is a tinygram
> for /dev/zero. Even without 4+4, localhost is very slow because it does
> a context switch or 2 for every packet (even with 2 CPUs, when there is
> no need to switch). Without 4+4, this used to cost much the same as the
> context switches for the pipe benchmark. Now it costs relatively much
> less, since (for netblast to localhost) all of the context switches are
> between kernel threads.
> The pipe benchmark uses select() to avoid busy-waiting. That was good
> for UP. But for SMP with just 2 CPUs, it is better to busy-wait and
> poll in the reader and writer.
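
In outline, the two reader strategies being contrasted might look like this,
assuming a pipe fd that has been set non-blocking with
fcntl(fd, F_SETFL, O_NONBLOCK); the function names are made up for
illustration:

    #include <sys/types.h>
    #include <sys/select.h>
    #include <errno.h>
    #include <unistd.h>

    /*
     * Sleep in the kernel until data arrives: good for UP, but each
     * wakeup costs a context switch.
     */
    static ssize_t
    read_with_select(int fd, char *buf, size_t len)
    {
        fd_set rset;

        FD_ZERO(&rset);
        FD_SET(fd, &rset);
        if (select(fd + 1, &rset, NULL, NULL, NULL) == -1)
            return (-1);
        return (read(fd, buf, len));
    }

    /*
     * Spin on EAGAIN: burns a CPU, but with a spare CPU it avoids a
     * context switch per tinygram.
     */
    static ssize_t
    read_with_busywait(int fd, char *buf, size_t len)
    {
        ssize_t n;

        while ((n = read(fd, buf, len)) == -1 && errno == EAGAIN)
            ;                       /* poll and try again */
        return (n);
    }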
> netblast already uses busy-waiting. It used to be a bug that select()
> doesn't work on sockets, at least for UDP, so blasting using busy-waiting
> is the only possible method (timeouts are usually too coarse-grained to
> go as fast as blasting, and if they are fine-grained enough to go fast
> then they are not much better than busy-waiting, with time wasted on
> setting up the timeouts). SMP makes this a feature: it forces the use
> of busy-waiting, which is best if you have a CPU free to run it and
> this method doesn't take too much power.
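
A minimal sketch of what such a busy-waiting blaster looks like (this is not
the actual netblast source; the discard port and 1-byte payload are
illustrative, and the sin_len field is BSD-specific):

    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>
    #include <err.h>
    #include <string.h>

    int
    main(void)
    {
        struct sockaddr_in sin;
        char payload[1] = { 0 };    /* 1-byte tinygram */
        int s;

        if ((s = socket(AF_INET, SOCK_DGRAM, 0)) == -1)
            err(1, "socket");
        memset(&sin, 0, sizeof(sin));
        sin.sin_len = sizeof(sin);
        sin.sin_family = AF_INET;
        sin.sin_port = htons(9);    /* discard service */
        sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

        /*
         * Busy-wait blasting: one syscall per packet, no select() or
         * timeout pacing; run until interrupted.
         */
        for (;;)
            (void)sendto(s, payload, sizeof(payload), 0,
                (struct sockaddr *)&sin, sizeof(sin));
    }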
> Context switches to task queues give similar slowness. This won't be
> affected by 4+4, since task queues are in the kernel. I don't like
> networking in userland, since it has large syscall and context switch
> costs. Increasing these by factors of 6 and 25 doesn't help. It
> can only do better by combining i/o in a way that the kernel neglects
> to do, or that per-packet APIs prevent the kernel from doing. Slowdown
> factors of 6 or 25 require the combined i/o to be 6 or 25 times larger
> to amortise the costs.