close() taking minutes

From: Peter <pmc_at_citylink.dinoex.sub.org>
Date: Fri, 20 Jan 2023 04:11:38 UTC
Does this look familiar to anybody? ("truss -dD ls -l something")

0.047760369 0.000017548 open("/etc/nsswitch.conf",O_RDONLY|O_CLOEXEC,0666) = 4 (0x4)
0.047845099 0.000011290 fstat(4,{ mode=-rw-r--r-- ,inode=13886,size=272,blksize=4096 }) = 0 (0x0)
0.047932166 0.000034182 read(4,"#\n# nsswitch.conf(5) - name ser"...,4096) = 272 (0x110)
0.048007308 0.000010661 read(4,0x801848000,4096) = 0 (0x0)
45.827173802 45.779056767 close(4)               = 0 (0x0)
             ^^^^^^^^^^^^
...
45.866272643 0.000020608 open("/etc/localtime",O_RDONLY,077) = 6 (0x6)
45.866398048 0.000017731 fstat(6,{ mode=-r--r--r-- ,inode=41572,size=2309,blksize=4096 }) = 0 (0x0)
45.866505596 0.000054084 read(6,"TZif2\0\0\0\0\0\0\0\0\0\0\0\0\0"...,41448) = 2309 (0x905)
49.511709215 3.645026276 close(6)                = 0 (0x0)
             ^^^^^^^^^^^

(The longest I've seen was more than 5 minutes)
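
To quantify this a bit beyond single truss runs, I have been thinking of catching such outliers system-wide with dtrace; a rough sketch, assuming the syscall provider is available (times come out in nanoseconds, so a 45 s close should stand out immediately):

# dtrace -n '
    syscall::close:entry { self->ts = timestamp; }
    syscall::close:return /self->ts/ {
        @lat[execname] = quantize(timestamp - self->ts);
        self->ts = 0;
    }'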

There is otherwise nothing wrong with the machine; I am only running
a postgres table merge over some 10 million rows, and only while that
runs does the effect happen, reproducibly.

The disk concerned is a mirror of two consumer SATA SSDs.

"gstat -pod" doesn't look bad, 1-2 items in the queue.

ZFS says it knows nothing of these delays; it just shows two
consumer SSDs usually busy with their own garbage collection:

# zpool iostat -vlq im 3
                    capacity     operations     bandwidth    total_wait     disk_wait    syncq_wait    asyncq_wait  scrub   trim   syncq_read    syncq_write   asyncq_read  asyncq_write   scrubq_read   trimq_write
pool              alloc   free   read  write   read  write   read  write   read  write   read  write   read  write   wait   wait   pend  activ   pend  activ   pend  activ   pend  activ   pend  activ   pend  activ
----------------  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----
im                30.6G  14.9G      0  4.84K      0   111M      -    2ms      -  670us      -      -      -    2ms      -      -      0      0      0      0      0      0     47      2      0      0      0      0
  mirror-0        30.6G  14.9G      0  4.84K      0   111M      -    2ms      -  670us      -      -      -    2ms      -      -      0      0      0      0      0      0     47      2      0      0      0      0
    ada5p9.elip1      -      -      0  1.65K      0  55.7M      -    3ms      -  927us      -      -      -    3ms      -      -      0      0      0      0      0      0      0      0      0      0      0      0
    ada1p9.elip1      -      -      0  3.19K      0  55.8M      -    2ms      -  536us      -      -      -    1ms      -      -      0      0      0      0      0      0     47      2      0      0      0      0
----------------  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----
                    capacity     operations     bandwidth    total_wait     disk_wait    syncq_wait    asyncq_wait  scrub   trim   syncq_read    syncq_write   asyncq_read  asyncq_write   scrubq_read   trimq_write
pool              alloc   free   read  write   read  write   read  write   read  write   read  write   read  write   wait   wait   pend  activ   pend  activ   pend  activ   pend  activ   pend  activ   pend  activ
----------------  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----
im                30.6G  14.9G      0  5.69K      0   103M      -    3ms      -  544us      -      -      -    2ms      -      -      0      0      0      0      0      0     84      2      0      0      0      0
  mirror-0        30.6G  14.9G      0  5.69K      0   103M      -    3ms      -  544us      -      -      -    2ms      -      -      0      0      0      0      0      0     98      4      0      0      0      0
    ada5p9.elip1      -      -      0  2.05K      0  51.4M      -    5ms      -  837us      -      -      -    4ms      -      -      0      0      0      0      0      0     93      2      0      0      0      0
    ada1p9.elip1      -      -      0  3.64K      0  51.6M      -    1ms      -  379us      -      -      -    1ms      -      -      0      0      0      0      0      0      0      0      0      0      0      0
----------------  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----  -----
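
One more thing I considered checking is how long each transaction group
sync takes, since the per-vdev latencies above would not show a stalled
txg; a sketch, assuming the fbt probes for the zfs module are usable here:

# dtrace -n '
    fbt::spa_sync:entry { self->ts = timestamp; }
    fbt::spa_sync:return /self->ts/ {
        printf("txg sync took %d ms\n", (timestamp - self->ts) / 1000000);
        self->ts = 0;
    }'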

So where does it happen? (13.1-RELEASE-p5)

The root, /usr and /var are on UFS, on the same(!) disk (unmirrored),
and have no problem.

The next 100-some filesystems are on the "im" pool, and that is where
the problem appears, so it is somehow related to ZFS. (The above truss
output was from a jail, which lives on ZFS.)

I have had other things saturating these disks with no such effect. It
seems postgres does something different here.

It also does not really look like a congestion problem, because I can
have two or three tasks stuck in this (and all of them will be), and
they continue at very different times. It looks rather like some
lock-wait thing.
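
Next time something hangs in close() I will try to grab its wait channel
and kernel stack, which should show whether it is sleeping on some ZFS
lock; roughly like this (ls is just the example process, <pid> to be
filled in with whatever is actually stuck):

# ps -axo pid,state,wchan,comm | grep ls
# procstat -kk <pid>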

Any clues, anybody?