Date: Fri, 20 Jan 2023 05:11:38 +0100
From: Peter
To: freebsd-stable@freebsd.org
Subject: close() taking minutes

Does this look familiar to anybody? ("truss -dD ls -l something")

 0.047760369 0.000017548 open("/etc/nsswitch.conf",O_RDONLY|O_CLOEXEC,0666) = 4 (0x4)
 0.047845099 0.000011290 fstat(4,{ mode=-rw-r--r-- ,inode=13886,size=272,blksize=4096 }) = 0 (0x0)
 0.047932166 0.000034182 read(4,"#\n# nsswitch.conf(5) - name ser"...,4096) = 272 (0x110)
 0.048007308 0.000010661 read(4,0x801848000,4096) = 0 (0x0)
45.827173802 45.779056767 close(4) = 0 (0x0)
             ^^^^^^^^^^^^
...
45.866272643 0.000020608 open("/etc/localtime",O_RDONLY,077) = 6 (0x6)
45.866398048 0.000017731 fstat(6,{ mode=-r--r--r-- ,inode=41572,size=2309,blksize=4096 }) = 0 (0x0)
45.866505596 0.000054084 read(6,"TZif2\0\0\0\0\0\0\0\0\0\0\0\0\0"...,41448) = 2309 (0x905)
49.511709215 3.645026276 close(6) = 0 (0x0)
             ^^^^^^^^^^^

(The longest I've seen was more than 5 minutes.)
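In case it helps to reproduce this without truss, here is a rough sketch that just times a single open/read/close of a given file (nothing specific is assumed about the file; any path on the affected pool should do):

/*
 * closetime.c -- rough repro sketch (my assumption of how to measure it,
 * not the tool used above): time one open/read/close of argv[1].
 *
 *   cc -o closetime closetime.c
 *   ./closetime /path/on/the/im/pool
 */
#include <err.h>
#include <fcntl.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

static double
elapsed(const struct timespec *a, const struct timespec *b)
{
	return (b->tv_sec - a->tv_sec) + (b->tv_nsec - a->tv_nsec) / 1e9;
}

int
main(int argc, char **argv)
{
	struct timespec t0, t1, t2;
	char buf[4096];
	int fd;

	if (argc != 2)
		errx(1, "usage: %s file", argv[0]);

	clock_gettime(CLOCK_MONOTONIC, &t0);
	if ((fd = open(argv[1], O_RDONLY)) == -1)
		err(1, "open");
	while (read(fd, buf, sizeof(buf)) > 0)
		;			/* read whole file, like ls reading config files */
	clock_gettime(CLOCK_MONOTONIC, &t1);
	if (close(fd) == -1)		/* this is the call that stalls in the truss output */
		err(1, "close");
	clock_gettime(CLOCK_MONOTONIC, &t2);

	printf("open+read: %.6f s  close: %.6f s\n",
	    elapsed(&t0, &t1), elapsed(&t1, &t2));
	return (0);
}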
There is otherwise nothing wrong with the machine; the only unusual load is a postgres table merge over some 10 million rows. The effect appears only while that merge is running, and it is reproducible. The disk concerned is a mirror of two consumer SATA SSDs. "gstat -pod" doesn't look bad: 1-2 items in the queue.

ZFS says it knows nothing of these delays; it just shows two consumer SSDs usually busy with their own garbage collection:

# zpool iostat -vlq im 3
                   capacity     operations    bandwidth     total_wait    disk_wait     syncq_wait   asyncq_wait   scrub  trim    syncq_read   syncq_write   asyncq_read   asyncq_write  scrubq_read   trimq_write
pool              alloc   free   read  write   read  write   read  write   read  write   read  write   read  write   wait   wait   pend  activ   pend  activ   pend  activ   pend  activ   pend  activ   pend  activ
----------------  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----
im                30.6G  14.9G      0  4.84K      0   111M      -    2ms      -  670us      -      -      -    2ms      -      -      0      0      0      0      0      0     47      2      0      0      0      0
  mirror-0        30.6G  14.9G      0  4.84K      0   111M      -    2ms      -  670us      -      -      -    2ms      -      -      0      0      0      0      0      0     47      2      0      0      0      0
    ada5p9.elip1      -      -      0  1.65K      0  55.7M      -    3ms      -  927us      -      -      -    3ms      -      -      0      0      0      0      0      0      0      0      0      0      0      0
    ada1p9.elip1      -      -      0  3.19K      0  55.8M      -    2ms      -  536us      -      -      -    1ms      -      -      0      0      0      0      0      0     47      2      0      0      0      0
----------------  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----

                   capacity     operations    bandwidth     total_wait    disk_wait     syncq_wait   asyncq_wait   scrub  trim    syncq_read   syncq_write   asyncq_read   asyncq_write  scrubq_read   trimq_write
pool              alloc   free   read  write   read  write   read  write   read  write   read  write   read  write   wait   wait   pend  activ   pend  activ   pend  activ   pend  activ   pend  activ   pend  activ
----------------  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----
im                30.6G  14.9G      0  5.69K      0   103M      -    3ms      -  544us      -      -      -    2ms      -      -      0      0      0      0      0      0     84      2      0      0      0      0
  mirror-0        30.6G  14.9G      0  5.69K      0   103M      -    3ms      -  544us      -      -      -    2ms      -      -      0      0      0      0      0      0     98      4      0      0      0      0
    ada5p9.elip1      -      -      0  2.05K      0  51.4M      -    5ms      -  837us      -      -      -    4ms      -      -      0      0      0      0      0      0     93      2      0      0      0      0
    ada1p9.elip1      -      -      0  3.64K      0  51.6M      -    1ms      -  379us      -      -      -    1ms      -      -      0      0      0      0      0      0      0      0      0      0      0      0
----------------  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----

So where does it happen? (This is 13.1-RELEASE-p5.) The root, /usr and /var are on UFS, on the same(!) disk (unmirrored), and have no problem. The next 100-some filesystems are on the "im" pool, and that is where the problem shows up, so it is somehow related to ZFS. (The truss output above was taken in a jail, which lives on that ZFS pool.)

I have had other workloads saturate these disks with no such effect; postgres seems to do something different here. It does not really look like a congestion problem either, because I can have two or three tasks stuck in this at once (and all of them will get stuck), yet they continue at very different times. It looks more like some lock-wait thing.

Any clues, anybody?
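P.S.: To poke at the lock-wait suspicion, a rough sketch like the one below could be pointed at a file on the "im" pool: it forks a few workers that each loop over open/read/close and report any close() taking longer than a second, so one can see whether the stalls across processes line up or end at unrelated times. The worker count and the one-second threshold are arbitrary choices of mine.

/*
 * closewait.c -- rough sketch, not from the original report: run a few
 * concurrent open/read/close loops and log slow closes per worker.
 *
 *   cc -o closewait closewait.c
 *   ./closewait /path/on/the/im/pool
 */
#include <sys/wait.h>
#include <err.h>
#include <fcntl.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

#define NWORKERS  3	/* arbitrary number of concurrent processes */
#define SLOW_SECS 1.0	/* arbitrary threshold for "slow" close() */

static double
now(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (ts.tv_sec + ts.tv_nsec / 1e9);
}

static void
worker(int id, const char *path)
{
	char buf[4096];
	double t0, t1;
	int fd;

	for (;;) {
		if ((fd = open(path, O_RDONLY)) == -1)
			err(1, "open");
		while (read(fd, buf, sizeof(buf)) > 0)
			;
		t0 = now();
		if (close(fd) == -1)
			err(1, "close");
		t1 = now();
		if (t1 - t0 > SLOW_SECS)
			printf("worker %d: close() took %.1f s, finished at t=%.1f\n",
			    id, t1 - t0, t1);
		sleep(1);
	}
}

int
main(int argc, char **argv)
{
	int i;

	if (argc != 2)
		errx(1, "usage: %s file", argv[0]);
	for (i = 0; i < NWORKERS; i++)
		if (fork() == 0)
			worker(i, argv[1]);	/* children never return */
	for (;;)
		wait(NULL);
}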