From: Rick Macklem
Date: Fri, 2 Feb 2024 15:13:12 -0800
Subject: Re: Increasing TCP TSO size support
To: "Scheffenegger, Richard"
Cc: "freebsd-net@FreeBSD.org", FreeBSD Transport, rmacklem@freebsd.org, gallatin@freebsd.org, kp@freebsd.org
List-Id: Networking and TCP/IP with FreeBSD
List-Archive: https://lists.freebsd.org/archives/freebsd-net
In-Reply-To: <2c31ac44-b34b-469c-a6de-fdd927ec2f9e@freebsd.org>
References: <2c31ac44-b34b-469c-a6de-fdd927ec2f9e@freebsd.org>

On Fri, Feb 2, 2024 at 1:21 AM Scheffenegger, Richard <rscheff@freebsd.org> wrote:

> Hi,
>
> We have run a test for an RPC workload with 1MB IO sizes, and collected the
> tcp_default_output() len(gth) during the first pass in the output loop.
>
> In such a scenario, where the application frequently introduces small
> pauses (since the next large IO is only sent after the corresponding
> request from the client has been received and processed) between sending
> additional data, the current TSO maximum of 64kB (45*1448 in effect)
> requires multiple passes in the output routine to send all the
> allowable (cwnd limited) data.
>
> I'll try to get a data collection with better granularity above 90 000
> bytes - but even here the average strongly indicates that a majority of
> transmission opportunities are in the 512 kB area - probably also having to
> do with LRO and ACK thinning effects by the client.
>
> In other words, the tcp output has to run about 9 times with TSO to
> transmit all eligible data - increasing the FreeBSD supported maximum TSO
> size to what current hardware could handle (256kB..1MB) would reduce the
> CPU burden here.
>
>
> Is increasing the software supported TSO size to allow for what the NICs
> could nowadays do something anyone apart from us would be interested in (in
> particular, those who work with the drivers)?
>

Reposted after joining freebsd-net@...

A factor here is the if_hw_tsomaxsegcount limit. For example, a 1Mbyte NFS
write request or read reply will result in a 514 element mbuf chain. Each of
these (mostly 2K mbuf clusters) is a non-contiguous data segment. (I suspect
most NICs do not handle this many segments well, if at all.)

The NFS code does know how to use M_EXTPG mbufs (for NFS over TLS, for the
ktls), but I do not know what it would take to make these work for non-KTLS
TSO.
I do not know how the TSO loop in tcp_output handles M_EXTPG mbufs.
Does it assume each M_EXTPG mbuf is one contiguous data segment?
I do see that ip_output() will call mb_unmapped_to_ext() when the NIC does
not have IFCAP_MEXTPG set.
(If IFCAP_MEXTPG is set, do the pages need to be contiguous so that it can
become a single contiguous data segment for TSO or ???)

If TSO and the code beneath it (NIC and maybe mb_unmapped_to_ext() being
called) were to all work ok for M_EXTPG mbufs, it would be easy to enable
that for NFS (non-TLS case).

I do not want to hijack this thread, but do others know how TSO interacts
with M_EXTPG mbufs?

rick

> Best regards,
>
>   Richard
>
>
> tso size (transmissions < 1448 would not be accounted here at all)
>
>             # count
>
> <1000             0
> <2000            23
> <3000           111
> <4000            40
> <5000            30
> <7000            14
> <8000           134
> <9000           442
> <10000         9396
> <20000        46227
> <30000        25646
> <40000        33060
> <60000        23162
> <70000        24368
> <80000        19772
> <90000        40101
> >=90000    75384169
> Average: 578844.44
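
The arithmetic behind the figures in this thread can be checked with a short Python sketch. It is only a back-of-the-envelope model, not code from the FreeBSD stack. The function names (`output_passes`, `data_clusters`, `tail_share`) are hypothetical. Reading "256kB" and "1MB" as powers of two is an assumption, and so is treating each output pass as sending exactly one full TSO burst. The histogram values are copied from the quoted message.

```python
import math

# Figures taken from the quoted message.
MSS_PAYLOAD = 1448                       # TCP payload bytes per segment
TSO_SEGMENTS = 45                        # "45*1448 in effect"
TSO_BURST = TSO_SEGMENTS * MSS_PAYLOAD   # effective bytes per TSO burst
AVERAGE_ELIGIBLE = 578844.44             # reported average len on first pass


def output_passes(eligible_bytes, tso_limit=TSO_BURST):
    """Passes through the output loop needed to send eligible_bytes,
    assuming (simplification) each pass sends one full TSO burst."""
    return math.ceil(eligible_bytes / tso_limit)


def data_clusters(io_bytes, cluster_bytes=2048):
    """Number of mbuf clusters needed to hold io_bytes of payload."""
    return math.ceil(io_bytes / cluster_bytes)


# Histogram from the quoted message: (bucket label, count).
HISTOGRAM = [
    ("<1000", 0), ("<2000", 23), ("<3000", 111), ("<4000", 40),
    ("<5000", 30), ("<7000", 14), ("<8000", 134), ("<9000", 442),
    ("<10000", 9396), ("<20000", 46227), ("<30000", 25646),
    ("<40000", 33060), ("<60000", 23162), ("<70000", 24368),
    ("<80000", 19772), ("<90000", 40101), (">=90000", 75384169),
]


def tail_share(histogram):
    """Fraction of all samples that fall in the last (open-ended) bucket."""
    total = sum(count for _, count in histogram)
    return histogram[-1][1] / total


print("effective TSO burst:", TSO_BURST, "bytes")
print("passes at current limit:", output_passes(AVERAGE_ELIGIBLE))
# Assumption: kB/MB interpreted as 1024-based units.
print("passes at 256kB limit:", output_passes(AVERAGE_ELIGIBLE, 256 * 1024))
print("passes at 1MB limit:", output_passes(AVERAGE_ELIGIBLE, 1024 * 1024))
print("2K clusters for 1Mbyte of data:", data_clusters(1024 * 1024))
print("share of samples >= 90000 bytes:", round(tail_share(HISTOGRAM), 4))
```

With the reported average of 578844.44 bytes and the 45*1448 burst size, the sketch gives 9 passes, matching the "about 9 times" estimate. A 1Mbyte payload in 2K clusters needs 512 clusters. The remaining two elements of the 514 element chain are presumably non-data mbufs such as RPC headers, but that attribution is an assumption and is not stated in the thread.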

