Re: zfs and git upload-pack

From: Philipp <satanist+freebsd_at_bureaucracy.de>
Date: Fri, 19 Aug 2022 08:59:08 UTC
[2022-08-09 17:40] David Christensen <dpchrist@holgerdanske.com>
> On 8/9/22 06:16, Philipp wrote:
> > [2022-08-07 20:52] David Christensen
> >> On 8/7/22 12:13, Philipp Takacs wrote:
> >>> On Sun, 7 Aug 2022 11:12:20 -0700 David Christensen
> >>>> On 8/7/22 10:57, Philipp Takacs wrote:
> >>>>> On Sun, 7 Aug 2022 09:54:41 -0700 David Christensen
> >>>>>> On 8/7/22 01:28, Philipp wrote:
> >>>>>>> Hi all
> >>>>>>>
> > >>>>>>> I host a rather uncommon git repository consisting mostly of
> > >>>>>>> binary files. The problem is that every time this repo is cloned,
> > >>>>>>> the host allocates memory and starts swapping. This leaves the
> > >>>>>>> host unusable until it is force-rebooted.
> >>>>>>>
> > >>>>>>> The repo is stored on ZFS and nullfs-mounted into a jail to run
> > >>>>>>> the git service over ssh. The host is FreeBSD 13.1 with 4GB RAM
> > >>>>>>> and 4GB swap.
> >>>>>>>
> > >>>>>>> What I have noticed is that the biggest memory consumption comes
> > >>>>>>> from mmap()ing a pack file. For the given repo this file is 6.7G.
> > >>>>>>> I suspect the file is mapped into memory but not correctly
> > >>>>>>> handled/unmapped (by the kernel) when not enough memory is
> > >>>>>>> available.
> >>>>>>>
> > >>>>>>> I have tested some options to solve or work around this issue:
> >>>>>>>
> >>>>>>> * limit the zfs ARC size in loader.conf
> >>>>>>> * zfs set primarycache none for the dataset
> >>>>>>> * limit datasize, memoryuse and vmemoryuse via login.conf
> >>>>>>> * limit git packedGitLimit
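
For reference, the concrete knobs behind that list look roughly like
this (the values, the dataset name and the login class are examples,
not necessarily exactly what I used):

    # /boot/loader.conf -- cap the ARC
    vfs.zfs.arc_max="1G"

    # per dataset: keep file data out of the ARC
    zfs set primarycache=none zroot/git    # example dataset name

    # /etc/login.conf -- limits for the git user's login class
    git:\
            :datasize=2G:\
            :memoryuse=2G:\
            :vmemoryuse=3G:\
            :tc=default:
    # rebuild the db afterwards: cap_mkdb /etc/login.conf

    # limit how much pack data git keeps mmap()ed at once
    git config core.packedGitLimit 256m
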
>
> > I have [restored the options to previous values]. The behavior has now
> > changed: one clone was successful, and later clones stop with an error
> > (Cannot allocate memory). This is better, but still not good.
>
>
> If you reboot the server, does the clone work the first time and then
> always fail?

I don't have much time at the moment; I might test this next week.

> >> What happens if the clone is attempted by a different user on the same
> >> workstation?
> >>
> >>
> >> What happens if the clone is attempted from another workstation?
> > 
> > The same as described.
>
>
> That supports a hypothesis that the problem is the server.
>
>
> >> Please post client console sessions that demonstrate correct operation
> >> and failed operation.
>
> > successful:
> > 
> > satanist@hell tmp$ git clone -v ssh://bigrepo@git.bureaucracy.de:2222/bigrepo
> > Cloning into 'bigrepo'...
> > remote: Objekte aufzählen: 9661, fertig.
> > remote: Gesamt 9661 (Delta 0), Wiederverwendet 0 (Delta 0), Pack wiederverwendet 9661
> > Receiving objects: 100% (9661/9661), 6.73 GiB | 5.96 MiB/s, done.
> > Resolving deltas: 100% (3/3), done.
> > Updating files: 100% (6591/6591), done.
>
>
> Are those status messages in German?  I do not know German...

Ah sorry, I hadn't paid attention. I can translate:

aufzählen => counting
Gesamt => total
Wiederverwendet => reused

> > unsuccessful:
> > 
> > satanist@hell tmp$ git clone -v ssh://bigrepo@git.bureaucracy.de:2222/bigrepo
> > Cloning into 'bigrepo'...
> > remote: Enumerating objects: 9661, done.
> > error: git upload-pack: git-pack-objects died with error.
> > fatal: git upload-pack: aborting due to possible repository corruption on the remote side.
> > remote: fatal: packfile ./objects/pack/pack-6fee671a31a59454b539c88d674373d88ad67780.pack cannot be mapped: Cannot allocate memory
> > remote: aborting due to possible repository corruption on the remote side.
> > fatal: early EOF
> > fatal: index-pack failed
> > 
> > As mentioned earlier, the "Cannot allocate memory" error is new. The
> > old behavior was that the server was unusable until I restarted it.
> > I currently don't know exactly how this looks on the client, but there
> > is not much info in the output.
>
>
> Now the status messages are in English.  Do you know why the language 
> changed?

Because I lied a bit. The first clone was directly from the host; the
other one went through the jail responsible for git. The jail has
slightly different language settings, and the first clone doesn't go
through fugit. Also, in the jail the repo is read from a nullfs mount
which points to the same data. But the "Cannot allocate memory" error
happens independently of this.

> Have you run 'git fsck'?
>
> https://git-scm.com/docs/git-fsck

Not before now, but it currently behaves like git clone did before I
restored the defaults: free memory goes down to 85M and the server
starts swapping. I killed it before the server became unusable.

I have also tested this on a local copy created with scp: no error.
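
Roughly what I did there (paths are from memory, not exact):

    # copy the repo off the zfs/nullfs setup and fsck the copy
    scp -r git.bureaucracy.de:bigrepo /tmp/bigrepo
    cd /tmp/bigrepo && git fsck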

> Does the server have ECC memory?

No

> >>> This is a server, a client connect with a
> >>> git client over ssh and use git-upload-pack
> >>
> >>
> >> https://git-scm.com/docs/git-upload-pack
> > 
> > Yes, this program, but I posted here because I suspect this is a
> > FreeBSD issue, not an issue with git. The program basically mmap()s
> > some files, parses them, and writes parts of the content (selected
> > based on stdin) to stdout.
>
>
> I desire stable OS's, so I typically use an older production release. 
> My servers have:
>
> 2022-08-09 16:17:40 dpchrist@f3 ~
> $ freebsd-version ; uname -a
> 12.3-RELEASE-p5
> FreeBSD f3.tracy.holgerdanske.com 12.3-RELEASE-p5 FreeBSD 
> 12.3-RELEASE-p5 GENERIC  amd64
>
>
> Perhaps you should grab a blank SSD, do an install of 
> FreeBSD-12.3-RELEASE, install and set up the software you need without 
> using jails, restore the Git repository onto it, and test if that works.

I don't have the time to reinstall the complete server (LDAP, mail
server, HTTP, shell, ...), and I don't have the money to buy extra
hardware.

> >>> to receive the content of
> >>> the repo. The communication of the git client and git-upload-pack works
> >>> with stdin/stdout. I can give the logs of my git authorization handler
> >>> (inside jail):
> >>
> >> <snip>
> >>
> >> What file?
> > 
> > The file is called fugit.log and is created by fugit, an authorization
> > handler for git over ssh. I use fugit to manage authorization for
> > multiple git repositories. See https://github.com/cbdevnet/fugit/blob/master/fugit
>
>
> Is fugit validated and supported on your OS?

Not exactly; I have tested it myself and patched the issues I found.

> Is there a support community?

Yes, but it's small. I would guess 3 people including me.

> Has anyone else seen this problem?

No, but the problem also occurs if I clone the repo without fugit.

> Has anyone else seen this problem on Git?

I haven't found any reports.

> > So it looks like git-upload-pack and the corresponding IO cause a lot
> > of cache allocation, leaving no free memory for the rest of the
> > server's operation.
>
>
> As the clone works the first time, and then fails every time thereafter 
> with memory-related errors, perhaps there is a memory leak?

This is also my guess; the question is where it leaks and how to fix
it. I suspect some fs cache, because as I mentioned in my first mail,
after a clone free memory stays low until I unmount the zfs dataset.
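
If it helps, these are the counters I would watch to narrow that down
(stock FreeBSD sysctls):

    # ARC size vs. its configured cap, in bytes
    sysctl kstat.zfs.misc.arcstats.size vfs.zfs.arc_max

    # page queues (free/inactive/wired), in pages
    sysctl vm.stats.vm.v_free_count vm.stats.vm.v_inactive_count \
        vm.stats.vm.v_wire_count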


> > I don't know where to increase the verbosity. ssh just starts a
> > session and goes on. The logs of fugit are already posted completely.
> > git-upload-pack does not log.
>
>
> Perhaps we can explore that later.
>
>
> >> # zpool iostat -v 60
> > 
> > This looks quite normal. Here are parts of the output:
> > 
> > Normal operation without a git clone running:
> > 
> >                capacity     operations     bandwidth
> > pool        alloc   free   read  write   read  write
> > ----------  -----  -----  -----  -----  -----  -----
> > zroot       51.8G   408G      3     14  25.4K   150K
> >    mirror-0  51.8G   408G      3     14  25.4K   150K
> >      ada1p3      -      -      1      7  12.2K  74.8K
> >      ada0p3      -      -      1      7  13.2K  74.8K
> > ----------  -----  -----  -----  -----  -----  -----
> > 
> > During a clone:
> > 
> >                capacity     operations     bandwidth
> > pool        alloc   free   read  write   read  write
> > ----------  -----  -----  -----  -----  -----  -----
> > zroot       51.8G   408G     22     24  2.87M   226K
> >    mirror-0  51.8G   408G     22     24  2.87M   226K
> >      ada1p3      -      -     11     12  1.47M   113K
> >      ada0p3      -      -     11     12  1.40M   113K
> > ----------  -----  -----  -----  -----  -----  -----
> > 
> > As expected, reads go up during the clone, but not to a level where
> > I would be concerned about the load.
>
>
> I assume ada0 and ada1 are HDD's (?).  (If so, an SSD cache device 
> really helps; but that should be unrelated to this issue.)

These are SSDs.


> Does the "During a clone" output correspond to the first clone after a 
> reboot or to a second or subsequent clone?  You want to look at the 
> former; the latter may not show much because of caching.

I believe it was during a failing one.

I can check for the output of the first one after boot next week.

> >> Start the following command in another terminal on the server to monitor
> >> CPU and/or IO activity (press 'm' to switch between the two) (press 'q'
> >> to exit):
> >>
> >> # top -S -s 60
> > 
> > This also looks as expected. The git process (child of git-upload-pack)
> > uses CPU and memory and also creates IO. Here is some output from a
> > few seconds before git was killed (sorted by RES):
> > 
> > Mem: 598M Active, 426M Inact, 166M Laundry, 1223M Wired, 1020M Free
> > ARC: 543M Total, 185M MFU, 82M MRU, 16K Anon, 8618K Header, 267M Other
> >       98M Compressed, 240M Uncompressed, 2,45:1 Ratio
> > Swap: 4096M Total, 17M Used, 4079M Free
> >   Displaying CPU statistics.
> >    PID USERNAME    THR PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
> > 45719 satanist      1  30    0  1289M   986M pipdwt   2   0:26  17,77% git
> > 53388  10001       44  52    0  2755M   237M uwait    2   2:23   0,15% java
> > 
> > After the java process there are only processes with less than 100MB
> > reserved.
> > 
> > I don't know exactly, but it looks like RES and SIZE include both
> > allocated memory and memory-mapped files. In that case I would argue
> > there is sufficient memory available to drop, because it can be
> > re-read from disk.
>
>
> Check 'man top' for SIZE and RES field definitions just to be sure.  My 
> top(1) looks somewhat different.

top(1):
> SIZE is the total size of the process (text, data, and stack),
> RES is the current amount of resident memory

So I would interpret this as the mmap being included.
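
One way to check would be to list the vm map of the running git
process; the file-backed mapping of the pack file should show up
there:

    # <pid> is the git child doing the pack-objects work
    procstat -v <pid>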

> The difference between SIZE and RES is 303M for git and 2518M for java; 
> 2821M total.  This total is larger than any Mem, ARC (ZFS), or Swap 
> statistic.  My guess is that the overall statistics do not include 
> memory-mapped files (?).

No, this only means the memory was allocated but is not in physical
memory. For the java process this is probably some libraries which are
currently not used. For git, RES goes up constantly until RES and SIZE
are about the same, and then it tries to mmap the next chunk of the
file.


> See if your top(1) can display the "RSfd" field -- Resident File-Backed 
> Memory Size, which should include file mappings.

I haven't found an option to do so.

> "1020M Free" is shown.  Was the server hung when you ran top(1)?

No. Currently there is no hang during a clone.


> Does the server have a console?  Do you have access?  If you log in to 
> the console before testing and renice that process to a negative 
> priority, the console might keep working when everything else hangs.  If 
> you try the same trick over SSH, it could interact with SSH and Git.

The server has a console over IPMI. I can request a shell for IPMI
access from my provider. I can try to monitor it over this console
next week.
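
For the renice trick, I assume something like this from the console
shell before starting the clone (needs root):

    # keep the console shell responsive if the box starts thrashing
    renice -20 -p $$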

Philipp