listening sockets as non sockets

Fri Mar 3 01:57:19 UTC 2017

On Fri, Mar 3, 2017 at 6:00 AM, Gleb Smirnoff <glebius at freebsd.org> wrote:
> On Sun, Feb 26, 2017 at 11:37:59PM +0800, Sepherosa Ziehau wrote:
> S> r314268 -> solisten
> S>
> S> 1KB:
> S> Performance (reqs/s)
> S> 77916.71 -> 26240.37
> S> Latency average
> S> 121ms -> 294ms
> ...
> S> So what I have seen is solisten's performance is 1/3 of r314268, and
> S> average latency doubles.
>
> I did similar testing, and my results are the following, for three
> consecutive runs:
>
>         solisten                head (r306199)
> req/s   63k,63k,63k             46k,47k,44k
> latency 213,214,208             232,233,223
>
> So I see neither a latency increase nor a req/s regression. I see
> the opposite.
>
> What is different about my test? First, both the head and the solisten
> installations are NetflixBSD. Head is based on r306199 and solisten
> is based on r314150 and cb79de4fd2912450c4ab808c017ae395fd636bd8 from
> my github.
>
> To my knowledge the parts of the stack that are different in NetflixBSD
> do not touch sonewconn(), accept4() and the other parts we are
> interested in. I also didn't notice any drastic changes in head between
> r306199 and r314150. So IMHO it is fair to attribute the difference to
> my change.
>
> The hardware is different. It is a Supermicro X9SRH-7F/7TF,
> Xeon(R) CPU E5-2697 v2 @ 2.70GHz, 256 GB RAM and Chelsio cxl(4)
> at 40 Gbit/s. I have two boxes of this configuration, one running head
> and the other solisten. The client box has the same CPU and mainboard,
> but has a lagg of two cxls, capping it at 80 Gbit/s. The bandwidth
> isn't important; what is important is that the lagg provides more
> parallelism on the sending side.
>
> nginx has multiple listening sockets, but we bombard only one, the
> AF_INET socket at *:80. nginx is configured with 64 worker processes
> and accept_mutex is on. So even with one socket, it seems I got some
> improvement.
>
> I ran your wrk at 498d70f6da5a201f109488eeaf31c8ba891dc163, and the
> command used on the sending side is:
>
> ./wrk -c 15000 -t 48 -d 120s --delay --latency --connreqs 1 http://host/file
>
> The only difference from your command is the thread count. My box has
> many more cores.

Well, as I mentioned to you, I hooked up two client boxes, each running
15000 concurrent connections, so 30K concurrent connections in total.
BTW, how much traffic can your client box generate?  Each of my
client boxes can generate 160K reqs/s at 1 req/conn for a 1KB web
object; maybe that's the difference?
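
For reference, the ratios implied by the figures quoted so far can be
checked with a small script.  This is only a sketch: every number is
copied from this thread, the variable names are made up for
illustration, and the unit of the latency row in the table above is not
stated, so only ratios are computed.

```python
# Figures copied from this thread; nothing here is newly measured.
sephe_reqs = {"r314268": 77916.71, "solisten": 26240.37}  # reqs/s, 1KB object
sephe_lat_ms = {"r314268": 121.0, "solisten": 294.0}      # average latency, ms

# Three consecutive runs each, as quoted in the table.
gleb_reqs_k = {"solisten": [63, 63, 63], "head": [46, 47, 44]}     # k req/s
gleb_lat = {"solisten": [213, 214, 208], "head": [232, 233, 223]}  # unit not stated


def mean(xs):
    """Arithmetic mean of a non-empty list."""
    return sum(xs) / len(xs)


# solisten relative to the baseline in each setup.
sephe_tput_ratio = sephe_reqs["solisten"] / sephe_reqs["r314268"]
sephe_lat_ratio = sephe_lat_ms["solisten"] / sephe_lat_ms["r314268"]
gleb_tput_ratio = mean(gleb_reqs_k["solisten"]) / mean(gleb_reqs_k["head"])
gleb_lat_ratio = mean(gleb_lat["solisten"]) / mean(gleb_lat["head"])

# Offered load in the two-client setup: 15000 connections per client box.
total_conns = 2 * 15000

print(f"two-client setup: throughput x{sephe_tput_ratio:.2f}, "
      f"latency x{sephe_lat_ratio:.2f}, {total_conns} connections")
print(f"one-client setup: throughput x{gleb_tput_ratio:.2f}, "
      f"latency x{gleb_lat_ratio:.2f}")
```

A ratio below 1 for throughput means solisten is slower than the
baseline; a ratio below 1 for latency means it is faster.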

>
> The file is of size 1657 bytes.
>
> Sephe, can you please get hwpmc dumps from your test on solisten/head,
> i.e. the test that shows solisten to be 3x slower?

Yeah, sure.  I need to restore FreeBSD on the server side; I just
wiped the disk to run the same benchmark on Linux.  I will get you the
information you want in a week or two.  Could you give me the exact
command I need to use to extract the hwpmc stats?

Thanks,
sephe

-- 
Tomorrow Will Never Die