Re: About the behavior of tuning of tcp_reass_maxseg

From: Cheng Cui <cc_at_freebsd.org>
Date: Wed, 04 Oct 2023 22:20:28 UTC
The *ratio* sounds good to me. But making it write tunable seems to be an
overshot, as `kern.ipc.nmbclusters` is already a write tunable and it
already has an event handler. If the *ratio* is write tunable, we may have
to add a separate event handler, similar to nmbclusters_change.

I searched around XNU, NetBSD, OpenBSD and DragonFlyBSD. None of them have
such an admin interface to setup the global limit.

On a second thought, I don't think the "auto-tuning" (actually overwriting)
by nmbclusters_change event back to "nmbclusters / 16" is desirable when
the "net.inet.tcp.reass.maxsegments" tunable value exists. That's why I
think we keep the *ratio* read tunable, and update accordingly like this:

        EVENTHANDLER_REGISTER(nmbclusters_change,
            tcp_reass_zone_change, NULL, EVENTHANDLER_PRI_ANY);
...
/* Initialize *update* TCP reassembly queue */
static void
tcp_reass_zone_change(void *tag)
{

        /* Set the zone limit and read back the effective value. */
        tcp_reass_maxseg = nmbclusters / 16;
        tcp_reass_maxseg = nmbclusters * *ratio*;
<< update here too on nmbclusters_change event
        tcp_reass_maxseg = uma_zone_set_max(tcp_reass_zone,
            tcp_reass_maxseg);
}

Best Regards,
Cheng Cui


On Tue, Oct 3, 2023 at 9:20 PM Zhenlei Huang <zlei@freebsd.org> wrote:

>
>
> On Oct 4, 2023, at 7:21 AM, Cheng Cui <cc@freebsd.org> wrote:
>
> Here are my short answers based on my understanding of the two
> relevant commits:
> commit 12e2e970515339729eccebdc751afb97fa0af6fa
> Author: Andre Oppermann <andre@FreeBSD.org>
> Date:   Tue Feb 24 15:27:41 2004 +0000
>
> commit 4f590175b7f3a4efecdd28ffd3306d30724d32c9
> Author: Paul Saab <ps@FreeBSD.org>
> Date:   Fri Apr 21 09:25:40 2006 +0000
>
>
>> So comes some questions.
>>
>> 1. Is the above behavior expected ?
>>
> Yes
>
>
> Then is it better to convert `net.inet.tcp.reass.maxsegments` into a
> *ratio* `net.inet.tcp.reass.maxsegments_ratio` ?
>
> We can have a default 1/16 (of nmbclusters), and it can be write tunable.
> Also we can limit it to maximum of 15/16 (of nmbclusters)
> so that we can always have free mbufs for connections.
>
> The auto-tuning can still be applied whenever nmbclusters changes.
>
>
> 2. Is it desirable to make `net.inet.tcp.reass.maxsegments` write tunable ?
>>
> No, as it is indirectly tuned by nmbclusters. If it is writable, the
> tcp_reass_zone has to be re-limited, which does not seem to be necessary.
>
> 3. Is it valid to set 0 to `net.inet.tcp.reass.maxsegments` ? IIUC that
>> will effectually disable TCP reassembly.
>>
> Yes, if you mean set 0 to the tunable `net.inet.tcp.reass.maxsegments` in
> /boot/loader.conf. I tested it with 0, and it looks like the system booted
> well. But disabling it is not recommended as it conflicts with the protocol
> principle.
>
> 4. Then can we introduce a constant `-1` to explicitly enable the 'auto
>> tuning' feature ?
>
> I don't think it is necessary.
>
> Best Regards,
> Cheng Cui
>
>
> On Thu, Sep 28, 2023 at 12:19 AM Zhenlei Huang <zlei@freebsd.org> wrote:
>
>> Hi,
>>
>> I'm recently working on SYSCTLs features. While testing the sysctl
>> knob `net.inet.tcp.reass.maxsegments` I got something weird. When
>> I adjust `kern.ipc.nmbclusters`, the `net.inet.tcp.reass.maxsegments`
>> got 'auto tuned' regardless what I have set (the kernel env) in
>> `/boot/loader.conf`.
>>
>> The relevant code:
>>
>> ```
>> static int tcp_reass_maxseg = 0;
>> SYSCTL_INT(_net_inet_tcp_reass, OID_AUTO, maxsegments, CTLFLAG_RDTUN,
>>     &tcp_reass_maxseg, 0,
>>     "Global maximum number of TCP Segments in Reassembly Queue");
>>
>> tcp_reass_global_init(void)
>> {
>>
>>         tcp_reass_maxseg = nmbclusters / 16;
>>         TUNABLE_INT_FETCH("net.inet.tcp.reass.maxsegments",
>>             &tcp_reass_maxseg);
>>         tcp_reass_zone = uma_zcreate("tcpreass", sizeof (struct
>> tseg_qent),
>>             NULL, NULL, NULL, NULL, UMA_ALIGN_PTR, 0);
>>         /* Set the zone limit and read back the effective value. */
>>         tcp_reass_maxseg = uma_zone_set_max(tcp_reass_zone,
>>             tcp_reass_maxseg);
>> ...
>>
>>         EVENTHANDLER_REGISTER(nmbclusters_change,
>>             tcp_reass_zone_change, NULL, EVENTHANDLER_PRI_ANY);
>>
>> }
>>
>> /* Initialize TCP reassembly queue */
>> static void
>> tcp_reass_zone_change(void *tag)
>> {
>>
>>         /* Set the zone limit and read back the effective value. */
>>         tcp_reass_maxseg = nmbclusters / 16;
>>         tcp_reass_maxseg = uma_zone_set_max(tcp_reass_zone,
>>             tcp_reass_maxseg);
>> }
>> ```
>>
>> The above code shows clearly the current behavior.
>>
>> What I expected is, kernel env has priority to the 'auto tuning', unless
>> the admin requested.
>>
>> So comes some questions.
>>
>> 1. Is the above behavior expected ?
>> 2. Is it desirable to make `net.inet.tcp.reass.maxsegments` write tunable
>> ?
>> 3. Is it valid to set 0 to `net.inet.tcp.reass.maxsegments` ? IIUC that
>> will effectually disable TCP reassembly.
>> 4. Then can we introduce a constant `-1` to explicitly enable the 'auto
>> tuning' feature ?
>>
>> Best regards,
>> Zhenlei
>>
>>
>>
>
>