Re: zfs and git upload-pack

From: David Christensen <dpchrist_at_holgerdanske.com>
Date: Wed, 10 Aug 2022 00:40:00 UTC
On 8/9/22 06:16, Philipp wrote:
> [2022-08-07 20:52] David Christensen
>> On 8/7/22 12:13, Philipp Takacs wrote:
>>> On Sun, 7 Aug 2022 11:12:20 -0700 David Christensen
>>>> On 8/7/22 10:57, Philipp Takacs wrote:
>>>>> On Sun, 7 Aug 2022 09:54:41 -0700 David Christensen
>>>>>> On 8/7/22 01:28, Philipp wrote:
>>>>>>> Hi all
>>>>>>>
>>>>>>> I host a quite uncommon git repository mostly out of binary
>>>>>>> files. I have the problem every time this repo is cloned the host
>>>>>>> allocate memory and going to swap. This leads to the host being
>>>>>>> unusable and need to force rebooted.
>>>>>>>
>>>>>>> The repo is stored on a zfs and nullmounted in a jail to run the
>>>>>>> git service over ssh. The host is a FreeBSD 13.1 with 4GB RAM and
>>>>>>> 4GB swap.
>>>>>>>
>>>>>>> What I have noticed is that the biggest memory consumtion is from
>>>>>>> mmap() a pack file. For the given repo this has the size of 6,7G.
>>>>>>> I suspect this file is mapped in memory but not correctly
>>>>>>> handled/unmaped (by the kernel) when not enough memory is
>>>>>>> available.
>>>>>>>
>>>>>>> I have tested some options to solve/workaround this issue:
>>>>>>>
>>>>>>> * limit the zfs ARC size in loader.conf
>>>>>>> * zfs set primarycache none for the dataset
>>>>>>> * limit datasize, memoryuse and vmemoryuse via login.conf
>>>>>>> * limit git packedGitLimit

> I have [restored the options to previous values]. Now the behavior has changed. Now one clone was
> succsessfull and at later clones stop with an error (Cannot allocate
> memory). This is better but still not good.


If you reboot the server, does the clone work the first them and then 
always fail?


>> What happens if the clone is attempted by a different user on the same
>> workstation?
>>
>>
>> What happens if the clone is attempted from another workstation?
> 
> The same as described.


That supports a hypothesis that the problem is the server.


>> Please post client console sessions that demonstrate correct operation
>> and failed operation.

> successfull:
> 
> satanist@hell tmp$ git clone -v ssh://bigrepo@git.bureaucracy.de:2222/bigrepo
> Cloning into 'bigrepo'...
> remote: Objekte aufzählen: 9661, fertig.
> remote: Gesamt 9661 (Delta 0), Wiederverwendet 0 (Delta 0), Pack wiederverwendet 9661
> Receiving objects: 100% (9661/9661), 6.73 GiB | 5.96 MiB/s, done.
> Resolving deltas: 100% (3/3), done.
> Updating files: 100% (6591/6591), done.


Are those status messages in German?  I do not know German...


> unsuccessfull:
> 
> satanist@hell tmp$ git clone -v ssh://bigrepo@git.bureaucracy.de:2222/bigrepo
> Cloning into 'bigrepo'...
> remote: Enumerating objects: 9661, done.                                                      Rerror: git upload-pack: git-pack-objects died with error.iB/s
> fatal: git upload-pack: aborting due to possible repository corruption on the remote side.
> remote: fatal: packfile ./objects/pack/pack-6fee671a31a59454b539c88d674373d88ad67780.pack cannot be mapped: Cannot allocate memory
> remote: aborting due to possible repository corruption on the remote side.
> fatal: early EOF
> fatal: index-pack failed
> 
> As mentioned earlier the "Cannot allocate memory" is new. The old
> behavior was that the server was unusable till I restarted the server.
> I currently don't know how this exactly looks on the client, but there
> is not mutch info in the output.


Now the status messages are in English.  Do you know why the language 
changed?


Have you run 'git fsck'?

https://git-scm.com/docs/git-fsck


Does the server have ECC memory?


>>> This is a server, a client connect with a
>>> git client over ssh and use git-upload-pack
>>
>>
>> https://git-scm.com/docs/git-upload-pack
> 
> Yes this programm, but I post hear because I susspect this is an freebsd
> issue not an issue with git. This programm basicly mmap() some files,
> parse them and write parts (based on stdin) of the content to stdout.


I desire stable OS's, so I typically use an older production release. 
My servers have:

2022-08-09 16:17:40 dpchrist@f3 ~
$ freebsd-version ; uname -a
12.3-RELEASE-p5
FreeBSD f3.tracy.holgerdanske.com 12.3-RELEASE-p5 FreeBSD 
12.3-RELEASE-p5 GENERIC  amd64


Perhaps you should grab a blank SSD, do an install of 
FreeBSD-12.3-RELEASE, install and set up the software you need without 
using jails, restore the Git repository onto it, and test if that works.


>>> to receive the content of
>>> the repo. The communication of the git client and git-upload-pack works
>>> with stdin/stdout. I can give the logs of my git authorization handler
>>> (inside jail):
>>
>> <snip>
>>
>> What file?
> 
> This file is called fugit.log and is created by a authorization handler
> for git over ssh called fugit. I use fugit to manage authorization for
> multible git repositories. See https://github.com/cbdevnet/fugit/blob/master/fugit


Is fugit is validated and supported on your OS?  Is there a support 
community?  Has anyone else seen this problem?


Has anyone else seen this problem on Git?


> So it looks like the git-upload-pack and the corresponding
> IO causes a lot of cache allocation. This leads to no free memory left
> for the rest of the operation of the server.


As the clone works the first time, and then fails every time thereafter 
with memory-related errors, perhaps there is a memory leak?


> I don't know where to increase the verbosity. ssh just starts a session
> and goes on. The logs of fugit are already posted completly. git-upload-pack
> does not log.


Perhaps we can explore that later.


>> # zpool iostat -v 60
> 
> This looks quite normal here parts of the output:
> 
> Normal operation without a git clone running:
> 
> pool        alloc   free   read  write   read  write
> ----------  -----  -----  -----  -----  -----  -----
> zroot       51.8G   408G      3     14  25.4K   150K
>    mirror-0  51.8G   408G      3     14  25.4K   150K
>      ada1p3      -      -      1      7  12.2K  74.8K
>      ada0p3      -      -      1      7  13.2K  74.8K
> ----------  -----  -----  -----  -----  -----  -----
> 
> During a clone:
> 
>                capacity     operations     bandwidth
> pool        alloc   free   read  write   read  write
> ----------  -----  -----  -----  -----  -----  -----
> zroot       51.8G   408G     22     24  2.87M   226K
>    mirror-0  51.8G   408G     22     24  2.87M   226K
>      ada1p3      -      -     11     12  1.47M   113K
>      ada0p3      -      -     11     12  1.40M   113K
> ----------  -----  -----  -----  -----  -----  -----
> 
> As expected the read goes up during the the clone. But not
> to a level I have conserne about the load.


I assume ada0 and ada1 are HDD's (?).  (If so, an SSD cache device 
really helps; but that should be unrelated to this issue.)


Does the "During a clone" output correspond to the first clone after a 
reboot or to a second or subsequent clone?  You want to look at the 
former; the latter may not show much because of caching.


>> Start the following command in another terminal on the server to monitor
>> CPU and/or IO activity (press 'm' to switch between the two) (press 'q'
>> to exit):
>>
>> # top -S -s 60
> 
> This also looks as expected. The git process (chiled of git-upload-pack)
> uses cpu and memory also creates IO. I have some output some secounds
> befor the git was killed (sorted by RES):
> 
> Mem: 598M Active, 426M Inact, 166M Laundry, 1223M Wired, 1020M Free
> ARC: 543M Total, 185M MFU, 82M MRU, 16K Anon, 8618K Header, 267M Other
>       98M Compressed, 240M Uncompressed, 2,45:1 Ratio
> Swap: 4096M Total, 17M Used, 4079M Free
>   Displaying CPU statistics.
>    PID USERNAME    THR PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
> 45719 satanist      1  30    0  1289M   986M pipdwt   2   0:26  17,77% git
> 53388  10001       44  52    0  2755M   237M uwait    2   2:23   0,15% java
> 
> After the java process there are only processes with less then 100MB
> reserved.
> 
> I don't know excactly, but it looks like RES and SIZE adds memory
> allocation and memory mapped files. In this case I would argue there is
> sufficient memory availible to drop, because it can be read from disk.


Check 'man top' for SIZE and RES field definitions just to be sure.  My 
top(1) looks somewhat different.


The difference between SIZE and RES is 303M for git and 2518M for java; 
2821M total.  This total is larger than any Mem, ARC (ZFS), or Swap 
statistic.  My guess is that the overall statistics do not include 
memory-mapped files (?).


See if your top(1) can display the "RSfd" field -- Resident File-Backed 
Memory Size, which should include file mappings.


"1020M Free" is shown.  Was the server hung when you ran top(1)?


Does the server have a console?  Do you have access?  If you log in to 
the console before testing and renice that process to a negative 
priority, the console might keep working when everything else hangs.  If 
you try the same trick over SSH, it could interact with SSH and Git.


David