Re: Request for Testing: TCP RACK
- Reply: Nuno Teixeira : "Re: Request for Testing: TCP RACK"
- In reply to: tuexen@freebsd.org: "Re: Request for Testing: TCP RACK"
Date: Wed, 10 Apr 2024 12:39:17 UTC
With the base stack I can complete a restic check successfully, downloading/reading/checking all files from a "big" remote compressed backup. Changing to the RACK stack, it fails.

I run this command often because compression corruption occurred in the past, and this check is the equivalent of restoring the backup to verify its integrity.

Maybe someone could run a restic test to check whether this is reproducible.
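For anyone trying to reproduce this, a minimal sketch of the test sequence, assuming RACK is built as a loadable module; the repository path, user, and host below are placeholders:

    # Load the RACK stack (skip if compiled into the kernel) and make it
    # the default for new connections
    kldload tcp_rack
    sysctl net.inet.tcp.functions_default=rack

    # Confirm which stacks are registered; '*' marks the default
    sysctl net.inet.tcp.functions_available

    # Read and verify all data in the repository over sftp
    restic -r sftp:user@remote:restic-repo check --read-data

    # Repeat with the base stack for comparison
    sysctl net.inet.tcp.functions_default=freebsd

Note that connections established before the switch keep the stack they were created with; only new connections pick up the default.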
Thanks,

<tuexen@freebsd.org> wrote (Wednesday, 10/04/2024 at 13:12):
>
> > On 10. Apr 2024, at 13:40, Nuno Teixeira <eduardo@freebsd.org> wrote:
> >
> > Hello all,
> >
> > @ current 1500018 and fetching torrents with net-p2p/qbittorrent: it finished a ~2GB download and the connection stayed UP until the end:
> >
> > ---
> > Apr 10 11:26:46 leg kernel: re0: watchdog timeout
> > Apr 10 11:26:46 leg kernel: re0: link state changed to DOWN
> > Apr 10 11:26:49 leg dhclient[58810]: New IP Address (re0): 192.168.1.67
> > Apr 10 11:26:49 leg dhclient[58814]: New Subnet Mask (re0): 255.255.255.0
> > Apr 10 11:26:49 leg dhclient[58818]: New Broadcast Address (re0): 192.168.1.255
> > Apr 10 11:26:49 leg kernel: re0: link state changed to UP
> > Apr 10 11:26:49 leg dhclient[58822]: New Routers (re0): 192.168.1.1
> > ---
> >
> > In past tests I got more watchdog timeouts; the connection went down and a reboot was needed to bring it back (`service netif restart` didn't work).
> >
> > Another way to reproduce this is using sysutils/restic (a backup program) to read/check all files from a remote server via sftp:
> >
> > `restic -r sftp:user@remote:restic-repo check --read-data` from a 60GB compressed backup.
> >
> > ---
> > watchdog timeout x3 as above
> > ---
> >
> > restic check fail log @ 15% progress:
> > ---
> > <snip>
> > Load(<data/52e2923dd6>, 17310001, 0) returned error, retrying after 1.7670599s: connection lost
> > Load(<data/d27a0abe0f>, 17456892, 0) returned error, retrying after 4.619104908s: connection lost
> > Load(<data/52e2923dd6>, 17310001, 0) returned error, retrying after 5.477648517s: connection lost
> > List(lock) returned error, retrying after 293.057766ms: connection lost
> > List(lock) returned error, retrying after 385.206693ms: connection lost
> > List(lock) returned error, retrying after 1.577594281s: connection lost
> > <snip>
> >
> > Connection continues UP.
> Hi,
>
> I'm not sure what the issue is you are reporting. Could you state what behavior you are experiencing with the base stack and with the RACK stack? In particular, what is the difference?
>
> Best regards
> Michael
>
> > Cheers,
> >
> > <tuexen@freebsd.org> wrote (Thursday, 28/03/2024 at 15:53):
> >> On 28. Mar 2024, at 15:00, Nuno Teixeira <eduardo@freebsd.org> wrote:
> >>
> >> Hello all!
> >>
> >> Running rack @b7b78c1c169 "Optimize HPTS..." very happy on my laptop (amd64)!
> >>
> >> Thanks all!
> > Thanks for the feedback!
> >
> > Best regards
> > Michael
> >>
> >> Drew Gallatin <gallatin@freebsd.org> wrote (Thursday, 21/03/2024 at 12:58):
> >> The entire point is to *NOT* go through the overhead of scheduling something asynchronously, but to take advantage of the fact that a user/kernel transition is going to trash the cache anyway.
> >>
> >> In the common case of a system which has fewer than the threshold number of connections, we access the tcp_hpts_softclock function pointer, make one function call, and access hpts_that_need_softclock, and then return. So that's two variables and a function call.
> >>
> >> I think it would be preferable to avoid that call, and to move the declarations of tcp_hpts_softclock and hpts_that_need_softclock so that they are in the same cacheline. Then we'd be hitting just a single line in the common case. (I've made comments on the review to that effect.)
> >>
> >> Also, I wonder if the threshold could get higher by default, so that hpts is never called in this context unless we're to the point where we're scheduling thousands of runs of the hpts thread (and taking all those clock interrupts).
> >>
> >> Drew
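The threshold Drew mentions is the one behind the cnt_thresh sysctl that appears in the patch further down; a quick sketch of inspecting and experimenting with it (the value shown is illustrative, not a recommendation):

    # Connections below this count keep hpts on the callout-based mechanism
    sysctl net.inet.tcp.hpts.cnt_thresh

    # Raise it so the userret-driven path only engages under heavier load
    sysctl net.inet.tcp.hpts.cnt_thresh=1000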
> >> On Wed, Mar 20, 2024, at 8:17 PM, Konstantin Belousov wrote:
> >>> On Tue, Mar 19, 2024 at 06:19:52AM -0400, rrs wrote:
> >>>> Ok, I have created
> >>>>
> >>>> https://reviews.freebsd.org/D44420
> >>>>
> >>>> to address the issue. I also attach a short version of the patch that Nuno can try and validate it works. Drew, you may want to try this and validate that the optimization does kick in, since I can only test that it does not on my local box :)
> >>> The patch still causes access to all CPUs' cachelines on each userret.
> >>> It would be much better to inc/check the threshold and only schedule the call when exceeded. Then the call can occur in some dedicated context, like a per-CPU thread, instead of userret.
> >>>
> >>>> R
> >>>>
> >>>> On 3/18/24 3:42 PM, Drew Gallatin wrote:
> >>>>> No. The goal is to run on every return to userspace for every thread.
> >>>>>
> >>>>> Drew
> >>>>>
> >>>>> On Mon, Mar 18, 2024, at 3:41 PM, Konstantin Belousov wrote:
> >>>>>> On Mon, Mar 18, 2024 at 03:13:11PM -0400, Drew Gallatin wrote:
> >>>>>>> I got the idea from
> >>>>>>> https://people.mpi-sws.org/~druschel/publications/soft-timers-tocs.pdf
> >>>>>>> The gist is that the TCP pacing stuff needs to run frequently, and rather than run it out of a clock interrupt, it's more efficient to run it out of a system call context at just the point where we return to userspace and the cache is trashed anyway. The current implementation is fine for our workload, but probably not ideal for a generic system, especially one where something is banging on system calls.
> >>>>>>>
> >>>>>>> ASTs could be the right tool for this, but I'm super unfamiliar with them, and I can't find any docs on them.
> >>>>>>>
> >>>>>>> Would ast_register(0, ASTR_UNCOND, 0, func) be roughly equivalent to what's happening here?
> >>>>>> This call would need some AST number added, and then it registers the ast to run on next return to userspace, for the current thread.
> >>>>>>
> >>>>>> Is it enough?
> >>>>>>>
> >>>>>>> Drew
> >>>>>>>
> >>>>>>> On Mon, Mar 18, 2024, at 2:33 PM, Konstantin Belousov wrote:
> >>>>>>>> On Mon, Mar 18, 2024 at 07:26:10AM -0500, Mike Karels wrote:
> >>>>>>>>> On 18 Mar 2024, at 7:04, tuexen@freebsd.org wrote:
> >>>>>>>>>
> >>>>>>>>>>> On 18. Mar 2024, at 12:42, Nuno Teixeira <eduardo@freebsd.org> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> Hello all!
> >>>>>>>>>>>
> >>>>>>>>>>> It works just fine!
> >>>>>>>>>>> System performance is OK.
> >>>>>>>>>>> Using patch on main-n268841-b0aaf8beb126(-dirty).
> >>>>>>>>>>>
> >>>>>>>>>>> ---
> >>>>>>>>>>> net.inet.tcp.functions_available:
> >>>>>>>>>>> Stack    D Alias    PCB count
> >>>>>>>>>>> freebsd    freebsd    0
> >>>>>>>>>>> rack     * rack       38
> >>>>>>>>>>> ---
> >>>>>>>>>>>
> >>>>>>>>>>> It would be so nice to have a sysctl tunable for this patch so we could do more tests without recompiling the kernel.
> >>>>>>>>>> Thanks for testing!
> >>>>>>>>>>
> >>>>>>>>>> @gallatin: can you come up with a patch that is acceptable for Netflix and allows mitigating the performance regression?
> >>>>>>>>>
> >>>>>>>>> Ideally, tcphpts could enable this automatically when it starts to be used (enough?), but a sysctl could select auto/on/off.
> >>>>>>>> There is already a well-known mechanism to request execution of a specific function on return to userspace, namely AST. The difference with the current hack is that the execution is requested for one callback in the context of a specific thread.
> >>>>>>>>
> >>>>>>>> Still, it might be worth a try to use it; what is the reason to hit a thread that does not do networking with TCP processing?
> >>>>>>>>
> >>>>>>>>> Mike
> >>>>>>>>>
> >>>>>>>>>> Best regards
> >>>>>>>>>> Michael
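On the stack-selection side (as opposed to the patch toggle being discussed above), the choice can already be flipped without recompiling; a sketch of making it persistent across reboots, using the standard loader.conf/sysctl.conf mechanism and assuming the module build:

    # /boot/loader.conf -- load the RACK stack at boot
    tcp_rack_load="YES"

    # /etc/sysctl.conf -- make RACK the default stack for new connections
    net.inet.tcp.functions_default=rack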
> >>>>>>>>>>>
> >>>>>>>>>>> Thanks all!
> >>>>>>>>>>> Really happy here :)
> >>>>>>>>>>>
> >>>>>>>>>>> Cheers,
> >>>>>>>>>>>
> >>>>>>>>>>> Nuno Teixeira <eduardo@freebsd.org> wrote (Sunday, 17/03/2024 at 20:26):
> >>>>>>>>>>>>
> >>>>>>>>>>>> Hello,
> >>>>>>>>>>>>
> >>>>>>>>>>>>> I don't have the full context, but it seems like the complaint is a performance regression in bonnie++ and perhaps other things when tcp_hpts is loaded, even when it is not used. Is that correct?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> If so, I suspect it's because we drive the tcp_hpts_softclock() routine from userret(), in order to avoid tons of timer interrupts and context switches. To test this theory, you could apply a patch like:
> >>>>>>>>>>>>
> >>>>>>>>>>>> It's affecting overall system performance; bonnie was just a way to get some numbers to compare.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Tomorrow I will test the patch.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Thanks!
> >>>>>>>>>>>>
> >>>>>>>>>>>> --
> >>>>>>>>>>>> Nuno Teixeira
> >>>>>>>>>>>> FreeBSD Committer (ports)
> >>>>>>>>>>>
> >>>>>>>>>>> --
> >>>>>>>>>>> Nuno Teixeira
> >>>>>>>>>>> FreeBSD Committer (ports)
> >>>>
> >>>> diff --git a/sys/netinet/tcp_hpts.c b/sys/netinet/tcp_hpts.c
> >>>> index 8c4d2d41a3eb..eadbee19f69c 100644
> >>>> --- a/sys/netinet/tcp_hpts.c
> >>>> +++ b/sys/netinet/tcp_hpts.c
> >>>> @@ -216,6 +216,7 @@ struct tcp_hpts_entry {
> >>>>  	void *ie_cookie;
> >>>>  	uint16_t p_num;	/* The hpts number one per cpu */
> >>>>  	uint16_t p_cpu;	/* The hpts CPU */
> >>>> +	uint8_t hit_callout_thresh;
> >>>>  	/* There is extra space in here */
> >>>>  	/* Cache line 0x100 */
> >>>>  	struct callout co __aligned(CACHE_LINE_SIZE);
> >>>> @@ -269,6 +270,11 @@ static struct hpts_domain_info {
> >>>>  	int cpu[MAXCPU];
> >>>>  } hpts_domains[MAXMEMDOM];
> >>>>
> >>>> +counter_u64_t hpts_that_need_softclock;
> >>>> +SYSCTL_COUNTER_U64(_net_inet_tcp_hpts_stats, OID_AUTO, needsoftclock, CTLFLAG_RD,
> >>>> +    &hpts_that_need_softclock,
> >>>> +    "Number of hpts threads that need softclock");
> >>>> +
> >>>>  counter_u64_t hpts_hopelessly_behind;
> >>>>
> >>>>  SYSCTL_COUNTER_U64(_net_inet_tcp_hpts_stats, OID_AUTO, hopeless, CTLFLAG_RD,
> >>>> @@ -334,7 +340,7 @@ SYSCTL_INT(_net_inet_tcp_hpts, OID_AUTO, precision, CTLFLAG_RW,
> >>>>      &tcp_hpts_precision, 120,
> >>>>      "Value for PRE() precision of callout");
> >>>>  SYSCTL_INT(_net_inet_tcp_hpts, OID_AUTO, cnt_thresh, CTLFLAG_RW,
> >>>> -    &conn_cnt_thresh, 0,
> >>>> +    &conn_cnt_thresh, DEFAULT_CONNECTION_THESHOLD,
> >>>>      "How many connections (below) make us use the callout based mechanism");
> >>>>  SYSCTL_INT(_net_inet_tcp_hpts, OID_AUTO, logging, CTLFLAG_RW,
> >>>>      &hpts_does_tp_logging, 0,
> >>>> @@ -1548,6 +1554,9 @@ __tcp_run_hpts(void)
> >>>>  	struct tcp_hpts_entry *hpts;
> >>>>  	int ticks_ran;
> >>>>
> >>>> +	if (counter_u64_fetch(hpts_that_need_softclock) == 0)
> >>>> +		return;
> >>>> +
> >>>>  	hpts = tcp_choose_hpts_to_run();
> >>>>
> >>>>  	if (hpts->p_hpts_active) {
> >>>> @@ -1683,6 +1692,13 @@ tcp_hpts_thread(void *ctx)
> >>>>  	ticks_ran = tcp_hptsi(hpts, 1);
> >>>>  	tv.tv_sec = 0;
> >>>>  	tv.tv_usec = hpts->p_hpts_sleep_time * HPTS_TICKS_PER_SLOT;
> >>>> +	if ((hpts->p_on_queue_cnt > conn_cnt_thresh) && (hpts->hit_callout_thresh == 0)) {
> >>>> +		hpts->hit_callout_thresh = 1;
> >>>> +		counter_u64_add(hpts_that_need_softclock, 1);
> >>>> +	} else if ((hpts->p_on_queue_cnt <= conn_cnt_thresh) && (hpts->hit_callout_thresh == 1)) {
> >>>> +		hpts->hit_callout_thresh = 0;
> >>>> +		counter_u64_add(hpts_that_need_softclock, -1);
> >>>> +	}
> >>>>  	if (hpts->p_on_queue_cnt >= conn_cnt_thresh) {
> >>>>  		if(hpts->p_direct_wake == 0) {
> >>>>  			/*
> >>>> @@ -1818,6 +1834,7 @@ tcp_hpts_mod_load(void)
> >>>>  	cpu_top = NULL;
> >>>>  #endif
> >>>>  	tcp_pace.rp_num_hptss = ncpus;
> >>>> +	hpts_that_need_softclock = counter_u64_alloc(M_WAITOK);
> >>>>  	hpts_hopelessly_behind = counter_u64_alloc(M_WAITOK);
> >>>>  	hpts_loops = counter_u64_alloc(M_WAITOK);
> >>>>  	back_tosleep = counter_u64_alloc(M_WAITOK);
> >>>> @@ -2042,6 +2059,7 @@ tcp_hpts_mod_unload(void)
> >>>>  	free(tcp_pace.grps, M_TCPHPTS);
> >>>>  #endif
> >>>>
> >>>> +	counter_u64_free(hpts_that_need_softclock);
> >>>>  	counter_u64_free(hpts_hopelessly_behind);
> >>>>  	counter_u64_free(hpts_loops);
> >>>>  	counter_u64_free(back_tosleep);
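With the patch applied, the gating can be observed from userland; a sketch, assuming the new counter lands under net.inet.tcp.hpts.stats (the OID name is inferred from the SYSCTL_COUNTER_U64() parent node in the diff above):

    # Number of hpts threads currently above the connection threshold;
    # while this reads 0, __tcp_run_hpts() returns immediately on userret
    sysctl net.inet.tcp.hpts.stats.needsoftclock

    # The threshold each hpts thread's queue count is compared against
    sysctl net.inet.tcp.hpts.cnt_thresh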
> >>
> >> --
> >> Nuno Teixeira
> >> FreeBSD Committer (ports)
> >
> > --
> > Nuno Teixeira
> > FreeBSD Committer (ports)

--
Nuno Teixeira
FreeBSD Committer (ports)