Re: Request for Testing: TCP RACK

From: Nuno Teixeira <eduardo_at_freebsd.org>
Date: Wed, 10 Apr 2024 12:39:17 UTC
With the base stack I can complete a restic check successfully,
downloading/reading/checking all files from a "big" remote compressed
backup.
Changing to the RACK stack, the same check fails.

I run this command often because compression corruption has occurred in
the past, and this check is the equivalent of restoring the backup to
verify its integrity.

Maybe someone could do a restic test to check if this is reproducible.

Thanks,



<tuexen@freebsd.org> wrote (Wednesday, 10/04/2024 at 13:12):

>
>
> > On 10. Apr 2024, at 13:40, Nuno Teixeira <eduardo@freebsd.org> wrote:
> >
> > Hello all,
> >
> > On CURRENT (1500018), fetching torrents with net-p2p/qbittorrent, a ~2GB
> > download finished and the connection stayed UP until the end:
> >
> > ---
> > Apr 10 11:26:46 leg kernel: re0: watchdog timeout
> > Apr 10 11:26:46 leg kernel: re0: link state changed to DOWN
> > Apr 10 11:26:49 leg dhclient[58810]: New IP Address (re0): 192.168.1.67
> > Apr 10 11:26:49 leg dhclient[58814]: New Subnet Mask (re0): 255.255.255.0
> > Apr 10 11:26:49 leg dhclient[58818]: New Broadcast Address (re0): 192.168.1.255
> > Apr 10 11:26:49 leg kernel: re0: link state changed to UP
> > Apr 10 11:26:49 leg dhclient[58822]: New Routers (re0): 192.168.1.1
> > ---
> >
> > In past tests, I got more watchdog timeouts; the connection went down and
> > a reboot was needed to bring it back (`service netif restart` didn't work).
> >
> > Another way to reproduce this is to use sysutils/restic (a backup program)
> > to read/check all files from a remote server via sftp:
> >
> > `restic -r sftp:user@remote:restic-repo check --read-data` from a 60GB
> > compressed backup.
> >
> > ---
> > watchdog timeout x3 as above
> > ---
> >
> > restic check failure log at 15% progress:
> > ---
> > <snip>
> > Load(<data/52e2923dd6>, 17310001, 0) returned error, retrying after 1.7670599s: connection lost
> > Load(<data/d27a0abe0f>, 17456892, 0) returned error, retrying after 4.619104908s: connection lost
> > Load(<data/52e2923dd6>, 17310001, 0) returned error, retrying after 5.477648517s: connection lost
> > List(lock) returned error, retrying after 293.057766ms: connection lost
> > List(lock) returned error, retrying after 385.206693ms: connection lost
> > List(lock) returned error, retrying after 1.577594281s: connection lost
> > <snip>
> >
> > Connection continues UP.
> Hi,
>
> I'm not sure what issue you are reporting. Could you state
> what behavior you are experiencing with the base stack and with
> the RACK stack? In particular, what is the difference?
>
> Best regards
> Michael
> >
> > Cheers,
> >
> > <tuexen@freebsd.org> wrote (Thursday, 28/03/2024 at 15:53):
> >> On 28. Mar 2024, at 15:00, Nuno Teixeira <eduardo@freebsd.org> wrote:
> >>
> >> Hello all!
> >>
> >> Running rack at b7b78c1c169 ("Optimize HPTS...") and very happy with it
> >> on my laptop (amd64)!
> >>
> >> Thanks all!
> > Thanks for the feedback!
> >
> > Best regards
> > Michael
> >>
> >> Drew Gallatin <gallatin@freebsd.org> wrote (Thursday, 21/03/2024 at
> >> 12:58):
> >> The entire point is to *NOT* go through the overhead of scheduling
> something asynchronously, but to take advantage of the fact that a
> user/kernel transition is going to trash the cache anyway.
> >>
> >> In the common case of a system which has fewer than the threshold number
> >> of connections, we access the tcp_hpts_softclock function pointer, make
> >> one function call, and access hpts_that_need_softclock, and then return.
> >> So that's 2 variables and a function call.
> >>
> >> I think it would be preferable to avoid that call, and to move the
> >> declarations of tcp_hpts_softclock and hpts_that_need_softclock so that
> >> they are in the same cacheline.  Then we'd be hitting just a single line
> >> in the common case.  (I've made comments on the review to that effect.)
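
A minimal sketch of the co-location being suggested here, with hypothetical
struct and field names (this is an illustration only, not the committed
layout): keep the function pointer and the counter handle that the userret
fast path reads inside one cache-line-aligned object, so the below-threshold
case touches a single line.

#include <sys/param.h>
#include <sys/counter.h>

/* Hypothetical grouping, illustration only. */
struct hpts_userret_hot {
	void		(*softclock)(void);	/* stands in for tcp_hpts_softclock */
	counter_u64_t	need_softclock;		/* stands in for hpts_that_need_softclock */
} __aligned(CACHE_LINE_SIZE);

static struct hpts_userret_hot hpts_hot;

The userret-side check would then load hpts_hot.need_softclock and, only when
work is pending, call through hpts_hot.softclock, so both loads come from the
same cache line.
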
> >>
> >> Also, I wonder if the threshold could be raised by default, so that hpts
> >> is never called in this context unless we're to the point where we're
> >> scheduling thousands of runs of the hpts thread (and taking all those
> >> clock interrupts).
> >>
> >> Drew
> >>
> >> On Wed, Mar 20, 2024, at 8:17 PM, Konstantin Belousov wrote:
> >>> On Tue, Mar 19, 2024 at 06:19:52AM -0400, rrs wrote:
> >>>> Ok, I have created
> >>>>
> >>>> https://reviews.freebsd.org/D44420
> >>>>
> >>>> to address the issue. I also attach a short version of the patch that
> >>>> Nuno can try and validate that it works. Drew, you may want to try this
> >>>> and validate that the optimization does kick in, since right now I can
> >>>> only test that it does not on my local box :)
> >>> The patch still causes access to all CPUs' cachelines on each userret.
> >>> It would be much better to inc/check the threshold and only schedule the
> >>> call when exceeded.  Then the call can occur in some dedicated context,
> >>> like a per-CPU thread, instead of userret.
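
A rough sketch of that alternative, with stated assumptions: hpts_deferred_task,
hpts_deferred and hpts_note_queue_cnt are hypothetical names, and a system
taskqueue stands in here for the dedicated per-CPU context; the hpts fields,
conn_cnt_thresh and __tcp_run_hpts are the ones visible in the patch quoted
further down.

#include <sys/param.h>
#include <sys/taskqueue.h>

/* Hypothetical task; TASK_INIT(&hpts_deferred_task, 0, hpts_deferred, NULL) at load time. */
static struct task hpts_deferred_task;

static void
hpts_deferred(void *arg, int pending)
{
	/* Run the pacing work from a dedicated context instead of userret(). */
	__tcp_run_hpts();
}

/* Called from the hpts thread, where p_on_queue_cnt is already maintained. */
static void
hpts_note_queue_cnt(struct tcp_hpts_entry *hpts)
{
	if (hpts->p_on_queue_cnt > conn_cnt_thresh &&
	    hpts->hit_callout_thresh == 0) {
		hpts->hit_callout_thresh = 1;
		/* Threshold just exceeded: schedule the call once, rather
		 * than having every userret() touch shared state. */
		taskqueue_enqueue(taskqueue_thread, &hpts_deferred_task);
	} else if (hpts->p_on_queue_cnt <= conn_cnt_thresh &&
	    hpts->hit_callout_thresh == 1) {
		hpts->hit_callout_thresh = 0;
	}
}

This is only a one-shot rendering of the idea; a real version would need to
keep the dedicated context servicing hpts for as long as the connection count
stays above the threshold.
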
> >>>
> >>>>
> >>>>
> >>>> R
> >>>>
> >>>>
> >>>>
> >>>> On 3/18/24 3:42 PM, Drew Gallatin wrote:
> >>>>> No.  The goal is to run on every return to userspace for every
> thread.
> >>>>>
> >>>>> Drew
> >>>>>
> >>>>> On Mon, Mar 18, 2024, at 3:41 PM, Konstantin Belousov wrote:
> >>>>>> On Mon, Mar 18, 2024 at 03:13:11PM -0400, Drew Gallatin wrote:
> >>>>>>> I got the idea from
> >>>>>>> https://people.mpi-sws.org/~druschel/publications/soft-timers-tocs.pdf
> >>>>>>> The gist is that the TCP pacing stuff needs to run frequently, and
> >>>>>>> rather than run it out of a clock interrupt, it's more efficient to
> >>>>>>> run it out of a system call context at just the point where we return
> >>>>>>> to userspace and the cache is trashed anyway. The current
> >>>>>>> implementation is fine for our workload, but probably not ideal for a
> >>>>>>> generic system, especially one where something is banging on system
> >>>>>>> calls.
> >>>>>>>
> >>>>>>> ASTs could be the right tool for this, but I'm super unfamiliar with
> >>>>>>> them, and I can't find any docs on them.
> >>>>>>>
> >>>>>>> Would ast_register(0, ASTR_UNCOND, 0, func) be roughly equivalent to
> >>>>>>> what's happening here?
> >>>>>> This call would need some AST number added, and then it registers the
> >>>>>> AST to run on the next return to userspace, for the current thread.
> >>>>>>
> >>>>>> Would that be enough?
> >>>>>>>
> >>>>>>> Drew
> >>>>>>
> >>>>>>>
> >>>>>>> On Mon, Mar 18, 2024, at 2:33 PM, Konstantin Belousov wrote:
> >>>>>>>> On Mon, Mar 18, 2024 at 07:26:10AM -0500, Mike Karels wrote:
> >>>>>>>>> On 18 Mar 2024, at 7:04, tuexen@freebsd.org wrote:
> >>>>>>>>>
> >>>>>>>>>>> On 18. Mar 2024, at 12:42, Nuno Teixeira
> >>>>>> <eduardo@freebsd.org> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> Hello all!
> >>>>>>>>>>>
> >>>>>>>>>>> It works just fine!
> >>>>>>>>>>> System performance is OK.
> >>>>>>>>>>> Using patch on main-n268841-b0aaf8beb126(-dirty).
> >>>>>>>>>>>
> >>>>>>>>>>> ---
> >>>>>>>>>>> net.inet.tcp.functions_available:
> >>>>>>>>>>> Stack                           D Alias                            PCB count
> >>>>>>>>>>> freebsd                           freebsd                          0
> >>>>>>>>>>> rack                            * rack                             38
> >>>>>>>>>>> ---
> >>>>>>>>>>>
> >>>>>>>>>>> It would be so nice if we could have a sysctl tunable for this
> >>>>>>>>>>> patch, so we could do more tests without recompiling the kernel.
> >>>>>>>>>> Thanks for testing!
> >>>>>>>>>>
> >>>>>>>>>> @gallatin: can you come up with a patch that is acceptable for
> >>>>>>>>>> Netflix and allows mitigating the performance regression?
> >>>>>>>>>
> >>>>>>>>> Ideally, tcphpts could enable this automatically when it starts to
> >>>>>>>>> be used (enough?), but a sysctl could select auto/on/off.
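
A hedged sketch of such a knob, assuming a hypothetical variable and sysctl
name (nothing in the thread says this tunable exists); "auto" is keyed off the
hpts_that_need_softclock counter introduced by the patch quoted further down:

/* Hypothetical tri-state knob, illustration only: -1 = auto, 0 = off, 1 = on. */
static int hpts_userret_softclock = -1;
SYSCTL_INT(_net_inet_tcp_hpts, OID_AUTO, userret_softclock, CTLFLAG_RW,
    &hpts_userret_softclock, 0,
    "Run hpts softclock from userret: -1=auto, 0=off, 1=on");

static bool
hpts_userret_softclock_enabled(void)
{
	if (hpts_userret_softclock == 0)
		return (false);
	if (hpts_userret_softclock > 0)
		return (true);
	/* auto: enable only once hpts actually has connections to service */
	return (counter_u64_fetch(hpts_that_need_softclock) > 0);
}

This would sit in tcp_hpts.c next to the existing _net_inet_tcp_hpts sysctls,
with the userret path consulting hpts_userret_softclock_enabled() before doing
any work.
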
> >>>>>>>> There is already a well-known mechanism to request execution of a
> >>>>>>>> specific function on return to userspace, namely AST.  The
> >>>>>>>> difference from the current hack is that the execution is requested
> >>>>>>>> for one callback, in the context of the specific thread.
> >>>>>>>>
> >>>>>>>> Still, it might be worth a try to use it; what is the reason to hit
> >>>>>>>> a thread that does not do networking with TCP processing?
> >>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Mike
> >>>>>>>>>
> >>>>>>>>>> Best regards
> >>>>>>>>>> Michael
> >>>>>>>>>>>
> >>>>>>>>>>> Thanks all!
> >>>>>>>>>>> Really happy here :)
> >>>>>>>>>>>
> >>>>>>>>>>> Cheers,
> >>>>>>>>>>>
> >>>>>>>>>>> Nuno Teixeira <eduardo@freebsd.org> wrote (Sunday, 17/03/2024 at
> >>>>>>>>>>> 20:26):
> >>>>>>>>>>>>
> >>>>>>>>>>>> Hello,
> >>>>>>>>>>>>
> >>>>>>>>>>>>> I don't have the full context, but it seems like the complaint
> >>>>>>>>>>>>> is a performance regression in bonnie++ and perhaps other
> >>>>>>>>>>>>> things when tcp_hpts is loaded, even when it is not used.  Is
> >>>>>>>>>>>>> that correct?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> If so, I suspect it's because we drive the tcp_hpts_softclock()
> >>>>>>>>>>>>> routine from userret(), in order to avoid tons of timer
> >>>>>>>>>>>>> interrupts and context switches.  To test this theory, you
> >>>>>>>>>>>>> could apply a patch like:
> >>>>>>>>>>>>
> >>>>>>>>>>>> It's affecting overall system performance; bonnie was just a
> >>>>>>>>>>>> way to get some numbers to compare.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Tomorrow I will test the patch.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Thanks!
> >>>>>>>>>>>>
> >>>>>>>>>>>> --
> >>>>>>>>>>>> Nuno Teixeira
> >>>>>>>>>>>> FreeBSD Committer (ports)
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> --
> >>>>>>>>>>> Nuno Teixeira
> >>>>>>>>>>> FreeBSD Committer (ports)
> >>>>>>>>>
> >>>>>>>>
> >>>>>>
> >>>>>
> >>>
> >>>> diff --git a/sys/netinet/tcp_hpts.c b/sys/netinet/tcp_hpts.c
> >>>> index 8c4d2d41a3eb..eadbee19f69c 100644
> >>>> --- a/sys/netinet/tcp_hpts.c
> >>>> +++ b/sys/netinet/tcp_hpts.c
> >>>> @@ -216,6 +216,7 @@ struct tcp_hpts_entry {
> >>>> void *ie_cookie;
> >>>> uint16_t p_num; /* The hpts number one per cpu */
> >>>> uint16_t p_cpu; /* The hpts CPU */
> >>>> + uint8_t hit_callout_thresh;
> >>>> /* There is extra space in here */
> >>>> /* Cache line 0x100 */
> >>>> struct callout co __aligned(CACHE_LINE_SIZE);
> >>>> @@ -269,6 +270,11 @@ static struct hpts_domain_info {
> >>>> int cpu[MAXCPU];
> >>>> } hpts_domains[MAXMEMDOM];
> >>>>
> >>>> +counter_u64_t hpts_that_need_softclock;
> >>>> +SYSCTL_COUNTER_U64(_net_inet_tcp_hpts_stats, OID_AUTO, needsoftclock, CTLFLAG_RD,
> >>>> +    &hpts_that_need_softclock,
> >>>> +    "Number of hpts threads that need softclock");
> >>>> +
> >>>> counter_u64_t hpts_hopelessly_behind;
> >>>>
> >>>> SYSCTL_COUNTER_U64(_net_inet_tcp_hpts_stats, OID_AUTO, hopeless, CTLFLAG_RD,
> >>>> @@ -334,7 +340,7 @@ SYSCTL_INT(_net_inet_tcp_hpts, OID_AUTO, precision, CTLFLAG_RW,
> >>>>    &tcp_hpts_precision, 120,
> >>>>    "Value for PRE() precision of callout");
> >>>> SYSCTL_INT(_net_inet_tcp_hpts, OID_AUTO, cnt_thresh, CTLFLAG_RW,
> >>>> -    &conn_cnt_thresh, 0,
> >>>> +    &conn_cnt_thresh, DEFAULT_CONNECTION_THESHOLD,
> >>>>    "How many connections (below) make us use the callout based mechanism");
> >>>> SYSCTL_INT(_net_inet_tcp_hpts, OID_AUTO, logging, CTLFLAG_RW,
> >>>>    &hpts_does_tp_logging, 0,
> >>>> @@ -1548,6 +1554,9 @@ __tcp_run_hpts(void)
> >>>> struct tcp_hpts_entry *hpts;
> >>>> int ticks_ran;
> >>>>
> >>>> + if (counter_u64_fetch(hpts_that_need_softclock) == 0)
> >>>> + return;
> >>>> +
> >>>> hpts = tcp_choose_hpts_to_run();
> >>>>
> >>>> if (hpts->p_hpts_active) {
> >>>> @@ -1683,6 +1692,13 @@ tcp_hpts_thread(void *ctx)
> >>>> ticks_ran = tcp_hptsi(hpts, 1);
> >>>> tv.tv_sec = 0;
> >>>> tv.tv_usec = hpts->p_hpts_sleep_time * HPTS_TICKS_PER_SLOT;
> >>>> + if ((hpts->p_on_queue_cnt > conn_cnt_thresh) && (hpts->hit_callout_thresh == 0)) {
> >>>> + hpts->hit_callout_thresh = 1;
> >>>> + counter_u64_add(hpts_that_need_softclock, 1);
> >>>> + } else if ((hpts->p_on_queue_cnt <= conn_cnt_thresh) && (hpts->hit_callout_thresh == 1)) {
> >>>> + hpts->hit_callout_thresh = 0;
> >>>> + counter_u64_add(hpts_that_need_softclock, -1);
> >>>> + }
> >>>> if (hpts->p_on_queue_cnt >= conn_cnt_thresh) {
> >>>> if(hpts->p_direct_wake == 0) {
> >>>> /*
> >>>> @@ -1818,6 +1834,7 @@ tcp_hpts_mod_load(void)
> >>>> cpu_top = NULL;
> >>>> #endif
> >>>> tcp_pace.rp_num_hptss = ncpus;
> >>>> + hpts_that_need_softclock = counter_u64_alloc(M_WAITOK);
> >>>> hpts_hopelessly_behind = counter_u64_alloc(M_WAITOK);
> >>>> hpts_loops = counter_u64_alloc(M_WAITOK);
> >>>> back_tosleep = counter_u64_alloc(M_WAITOK);
> >>>> @@ -2042,6 +2059,7 @@ tcp_hpts_mod_unload(void)
> >>>> free(tcp_pace.grps, M_TCPHPTS);
> >>>> #endif
> >>>>
> >>>> + counter_u64_free(hpts_that_need_softclock);
> >>>> counter_u64_free(hpts_hopelessly_behind);
> >>>> counter_u64_free(hpts_loops);
> >>>> counter_u64_free(back_tosleep);
> >>>
> >>>
> >>
> >>
> >>
> >> --
> >> Nuno Teixeira
> >> FreeBSD Committer (ports)
> >
> >
> >
> > --
> > Nuno Teixeira
> > FreeBSD Committer (ports)
>
>

-- 
Nuno Teixeira
FreeBSD Committer (ports)