Re: zfs and git upload-pack

From: Philipp Takacs <satanist+freebsd_at_bureaucracy.de>
Date: Sun, 07 Aug 2022 19:13:48 UTC
On Sun, 7 Aug 2022 11:12:20 -0700
David Christensen <dpchrist@holgerdanske.com> wrote:

> On 8/7/22 10:57, Philipp Takacs wrote:
> > On Sun, 7 Aug 2022 09:54:41 -0700
> > David Christensen <dpchrist@holgerdanske.com> wrote:
> >   
> >> On 8/7/22 01:28, Philipp wrote:  
> >>> Hi all
> >>>
> >>> I host a quite uncommon git repository mostly out of binary
> >>> files. I have the problem every time this repo is cloned the host
> >>> allocate memory and going to swap. This leads to the host being
> >>> unusable and need to force rebooted.
> >>>
> >>> The repo is stored on a zfs and nullmounted in a jail to run the
> >>> git service over ssh. The host is a FreeBSD 13.1 with 4GB RAM and
> >>> 4GB swap.
> >>>
> >>> What I have noticed is that the biggest memory consumtion is from
> >>> mmap() a pack file. For the given repo this has the size of 6,7G.
> >>> I suspect this file is mapped in memory but not correctly
> >>> handled/unmaped (by the kernel) when not enough memory is
> >>> available.
> >>>
> >>> I have tested some options to solve/workaround this issue:
> >>>
> >>> * limit the zfs ARC size in loader.conf
> >>> * zfs set primarycache none for the dataset
> >>> * limit datasize, memoryuse and vmemoryuse via login.conf
> >>> * limit git packedGitLimit
> >>>
> >>> None of them have solved the issue.
> >>>
> >>> On my last test I have noticed that the memory is not freed after
> >>> the git commands are stopped. To free the memory I had to unmount
> >>> the zfs dataset:
> >>>
> >>> satanist@confusion:~ $ freecolor -om
> >>>                total       used       free     shared    buffers
> >>> cached Mem:          3923       3881         41          0
> >>> 0 0 Swap:         4096        456       3639
> >>> satanist@confusion:~ $ sudo zfs umount
> >>> zroot/data/git/bigrepository satanist@confusion:~ $ freecolor -om
> >>>                total       used       free     shared    buffers
> >>> cached Mem:          3923        278       3645          0
> >>> 0 0 Swap:         4096        456       3639
> >>>
> >>> Any idea how to debug this or what else I could try solve this?
> >>>
> >>> Philipp  
> >>
> >>
> >> Add memory; the more, the better.  
> > 
> > Can you borrow me your credit card?
> > 
> > Jokes aside. this repo gets cloned a few times a month. Currently
> > the Host dies because one client try to clone this repo. This
> > should be work with 4GB memory.  
> 
> 
> Please post console sessions that demonstrate cloning without failure 
> and cloning with failure.

Not sure what you mean. This is a server, a client connect with a
git client over ssh and use git-upload-pack to receive the content of
the repo. The communication of the git client and git-upload-pack works
with stdin/stdout. I can give the logs of my git authorization handler
(inside jail):

Sat Aug  6 12:01:27 UTC 2022 username: Command is git-upload-pack
'/bigrepository'
Sat Aug  6 12:01:27 UTC 2022 username: pull from repo /bigrepository
Sat Aug  6 12:01:27 UTC 2022 username: Trying to read from
/bigrepository (Path /git/bigrepository/bigrepository.git)
Sat Aug  6 14:35:23 UTC 2022 username: Pull accepted

The last line mean the clone was finished[0]. But at this time
everything else on the host was unusable. Here the corresponding content
of /var/log/messages:

Aug  6 12:25:07 confusion kernel: sonewconn: pcb 0xfffff801292ea9b0
(10.188.42.86:25 (proto 6)): Listen queue overflow: 8 already in queue
awaiting acceptance (1 occurrences)
Aug  6 12:26:41 confusion sshd[50953]: error:
Fssh_kex_exchange_identification: Connection closed by remote host
Aug  6 12:28:37 confusion kernel: sonewconn: pcb 0xfffff801292ea9b0
(10.188.42.86:25 (proto 6)): Listen queue overflow: 8 already in queue
awaiting acceptance (1 occurrences)
Aug  6 12:40:45 confusion sshd[62976]: fatal: Timeout before
authentication for 103.188.176.3 port 45910
Aug  6 13:02:03 confusion kernel: sonewconn: pcb 0xfffff801292ea9b0
(10.188.42.86:25 (proto 6)): Listen queue overflow: 8 already in queue
awaiting acceptance (2 occurrences)
Aug  6 13:02:03 confusion syslogd: last message repeated 2 times
Aug  6 13:08:29 confusion kernel: sonewconn: pcb 0xfffff801292ea9b0
(10.188.42.86:25 (proto 6)): Listen queue overflow: 8 already in queue
awaiting acceptance (1 occurrences)
Aug  6 13:10:30 confusion syslogd: last message repeated 1 times
Aug  6 13:15:39 confusion syslogd: last message repeated 1 times
Aug  6 13:23:25 confusion syslogd: last message repeated 2 times
Aug  6 13:33:37 confusion kernel: sonewconn: pcb 0xfffff801292ea9b0
(10.188.42.86:25 (proto 6)): Listen queue overflow: 8 already in queue
awaiting acceptance (2 occurrences)
Aug  6 13:36:08 confusion kernel: sonewconn: pcb 0xfffff801292ea9b0
(10.188.42.86:25 (proto 6)): Listen queue overflow: 8 already in queue
awaiting acceptance (1 occurrences)
Aug  6 13:37:59 confusion kernel: sonewconn: pcb 0xfffff801292ea9b0
(10.188.42.86:25 (proto 6)): Listen queue overflow: 8 already in queue
awaiting acceptance (2 occurrences)
Aug  6 14:00:51 confusion kernel: sonewconn: pcb 0xfffff801292ea9b0
(10.188.42.86:25 (proto 6)): Listen queue overflow: 8 already in queue
awaiting acceptance (1 occurrences)
Aug  6 14:00:51 confusion syslogd: last message repeated 3 times
Aug  6 14:02:09 confusion syslogd: last message repeated 1 times
Aug  6 14:10:38 confusion syslogd: last message repeated 1 times
Aug  6 14:10:38 confusion kernel: sonewconn: pcb 0xfffff8007b48a9b0
([2a01:138:9000:bc42::15e4:2d4d]:25 (proto 6)): Listen queue overflow:
8 already in queue awaiting acceptance (1 occurrence
s)
Aug  6 14:12:18 confusion kernel: sonewconn: pcb 0xfffff801292ea9b0
(10.188.42.86:25 (proto 6)): Listen queue overflow: 8 already in queue
awaiting acceptance (3 occurrences)
Aug  6 14:12:18 confusion kernel: sonewconn: pcb 0xfffff8007b48a9b0
([2a01:138:9000:bc42::15e4:2d4d]:25 (proto 6)): Listen queue overflow:
8 already in queue awaiting acceptance (2 occurrence
s)
Aug  6 14:15:04 confusion kernel: sonewconn: pcb 0xfffff801292ea9b0
(10.188.42.86:25 (proto 6)): Listen queue overflow: 8 already in queue
awaiting acceptance (1 occurrences)
Aug  6 14:16:35 confusion kernel: sonewconn: pcb 0xfffff801292ea9b0
(10.188.42.86:25 (proto 6)): Listen queue overflow: 8 already in queue
awaiting acceptance (2 occurrences)
Aug  6 14:17:40 confusion kernel: sonewconn: pcb 0xfffff801292ea9b0
(10.188.42.86:25 (proto 6)): Listen queue overflow: 8 already in queue
awaiting acceptance (1 occurrences)
Aug  6 14:20:39 confusion syslogd: last message repeated 1 times
Aug  6 14:20:39 confusion kernel: sonewconn: pcb 0xfffff8007b48a9b0
([2a01:138:9000:bc42::15e4:2d4d]:25 (proto 6)): Listen queue overflow:
8 already in queue awaiting acceptance (1 occurrence
s)
Aug  6 14:22:12 confusion kernel: sonewconn: pcb 0xfffff801292ea9b0
(10.188.42.86:25 (proto 6)): Listen queue overflow: 8 already in queue
awaiting acceptance (1 occurrences)
Aug  6 14:24:20 confusion syslogd: last message repeated 1 times
Aug  6 14:27:44 confusion syslogd: last message repeated 1 times
Aug  6 14:28:46 confusion kernel: sonewconn: pcb 0xfffff801292ea9b0
(10.188.42.86:25 (proto 6)): Listen queue overflow: 8 already in queue
awaiting acceptance (2 occurrences)
Aug  6 14:30:15 confusion login[8129]: ROOT LOGIN (root) ON ttyu2
Aug  6 14:33:00 confusion kernel: sonewconn: pcb 0xfffff801292ea9b0
(10.188.42.86:25 (proto 6)): Listen queue overflow: 8 already in queue
awaiting acceptance (1 occurrences)
Aug  6 14:34:16 confusion kernel: pid 13305 (java), jid 5, uid 10001,
was killed: failed to reclaim memory
Aug  6 14:34:28 confusion sshd[70018]: warning: can't get client
address: Connection reset by peer
Aug  6 14:36:48 confusion kernel: pid 28013 (slapd), jid 1, uid 389,
was killed: failed to reclaim memory
Aug  6 14:37:06 confusion sshd[74213]: error:
Fssh_kex_exchange_identification: Connection closed by remote host
Aug  6 14:37:13 confusion sshd[80386]: error: beginning MaxStartups
throttling
Aug  6 14:37:14 confusion kernel: pid 56727 (smtpd), jid 4, uid 257,
was killed: failed to reclaim memory
Aug  6 14:34:00 confusion sshd[68901]: fatal: Timeout before
authentication for 84.147.51.239 port 58360
Aug  6 14:37:59 confusion login[87884]: ROOT LOGIN (root) ON ttyu2
Aug  6 14:38:00 confusion sshd[80386]: error: beginning MaxStartups
throttling
Aug  6 14:40:27 confusion syslogd: kernel boot file is
/boot/kernel/kernel
Aug  6 14:40:27 confusion kernel: ---<<BOOT>>---

Between 12:00 and 14:00 the server started to be slow and running
ssh/mosh session stopped working. Starting new sessions over ssh was
not possible. The root login at 14:38 was me over ipmi try to somehow
get the the host working again. But I could login and only run top then
this session was also unusable. I have then restarted the server over
ipmi.

Philipp

[0] https://github.com/cbdevnet/fugit/blob/master/fugit#L94