Re: git: 184c63db3c94 - main - Fix clerical error in page alloc

From: Guido Falsi <madpilot_at_FreeBSD.org>
Date: Sat, 25 Dec 2021 17:11:50 UTC
On 25/12/21 12:23, FreeBSD User wrote:
> On Fri, 24 Dec 2021 08:51:39 GMT
> Doug Moore <dougm@FreeBSD.org> wrote:
> 
>> The branch main has been updated by dougm:
>>
>> URL:
>> https://cgit.FreeBSD.org/src/commit/?id=184c63db3c949d8ba766dc7b2bd2f082404e169d
>>
>> commit 184c63db3c949d8ba766dc7b2bd2f082404e169d
>> Author:     Doug Moore <dougm@FreeBSD.org>
>> AuthorDate: 2021-12-24 08:47:21 +0000
>> Commit:     Doug Moore <dougm@FreeBSD.org>
>> CommitDate: 2021-12-24 08:47:21 +0000
>>
>>      Fix clerical error in page alloc
>>      
>>      Fix a very recent change that introduced a page accounting error
>>      in case of a reservation being broken.
>>      Reviewed by:    alc
>>      Fixes:  fb38b29b5609 (page_alloc_br) vm_page: Remove extra test, dup code from page alloc
>>      Differential Revision: https://reviews.freebsd.org/D33645
>> ---
>>   sys/vm/vm_page.c | 4 ++--
>>   1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/sys/vm/vm_page.c b/sys/vm/vm_page.c
>> index c24da96f4312..03351b0ad3dd 100644
>> --- a/sys/vm/vm_page.c
>> +++ b/sys/vm/vm_page.c
>> @@ -2186,11 +2186,11 @@ vm_page_find_contig_domain(int domain, int req, u_long npages, vm_paddr_t low,
>>   	vm_page_t m_ret;
>>   
>>   	vmd = VM_DOMAIN(domain);
>> -	if (!vm_domain_allocate(vmd, req, npages))
>> -		return (NULL);
>>   #if VM_NRESERVLEVEL > 0
>>   again:
>>   #endif
>> +	if (!vm_domain_allocate(vmd, req, npages))
>> +		return (NULL);
>>   	/*
>>   	 * Try to allocate the pages from the free page queues.
>>   	 */
>>
> 
> It seems that our hosts running with this patch go "dead" after a
> while under load (poudriere): ssh and http/https are unreachable on
> both IPv4 and IPv6. The hosts are at a remote site, so there is no
> connection via ssh anymore, but they still respond to ping, and nmap
> shows several other services on the local network as reachable, just
> not ssh (22) or apache24 (http/https).
> Other hosts at an OS level from before this patch seem to be all right
> so far (i.e. FreeBSD 14.0-CURRENT #39 main-n251899-fa255ab1b895: Thu
> Dec 23 13:48:41 CET 2021 amd64).
> 
> I have to admit it's a wild guess that this patch is the culprit, but it
> is strange that two out of four hosts with this patch applied are now
> both unreachable on both ssh and http (latest www/apache24), while two
> other hosts still on the version shown above seem to operate normally
> on ssh/http.
> 
> I can investigate after the 26th of December at the earliest.


I'm also seeing strange behaviour related to memory and the VM
subsystem with recent (24th December) head. It was not happening with
head from mid-November.

I'm seeing strange issues with virtualbox on recent head too. It fails
to launch VMs, or VMs pause due to memory exhaustion, while the machine
has lots of free memory. Maybe the real issue is memory fragmentation,
though.

I'm now testing an update to a newer head that includes commit
0d5fac287294490ac488d74e598e019334610bdb (vm: alloc pages from reserv
before breaking it), which is definitely related and may fix this.

Anyway, there is definitely something going on with recent changes to
the VM subsystem that requires investigation.
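
For reference, here is a rough user-space sketch of the kind of
accounting slip the quoted diff addresses. It is not the kernel code:
toy_domain, toy_reserve() and toy_alloc() are made-up names, and the
failure path (hand the reserved pages back, then retry once a
reservation has been broken) is an assumption about the part of the
function that the diff does not show. The only thing taken from the
diff is whether the reservation happens before or after the again:
label.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a domain's free-page counter; not the kernel structure. */
struct toy_domain {
	long free_count;	/* pages the accounting believes are free */
};

/* Hypothetical analogue of vm_domain_allocate(): reserve npages if possible. */
bool
toy_reserve(struct toy_domain *d, long npages)
{
	if (d->free_count < npages)
		return (false);
	d->free_count -= npages;
	return (true);
}

/*
 * Allocate npages with exactly one retry.  The first attempt always fails
 * and returns its reservation to the counter (assumed failure path); the
 * retry always succeeds, as if a reservation had been broken in between.
 * reserve_after_label selects where the reservation sits relative to the
 * retry point: false mimics the removed lines, true mimics the added ones.
 */
bool
toy_alloc(struct toy_domain *d, long npages, bool reserve_after_label)
{
	int attempt = 0;

	assert(npages > 0);
	if (!reserve_after_label && !toy_reserve(d, npages))
		return (false);
again:
	if (reserve_after_label && !toy_reserve(d, npages))
		return (false);
	if (attempt++ > 0)
		return (true);		/* retry succeeded, pages handed out */
	d->free_count += npages;	/* first try failed: undo the reservation */
	goto again;			/* pretend a reservation was broken */
}
```

Starting from a counter of 100 and asking for 8 pages, the placement
before the label leaves the counter at 100 even though 8 pages were
handed out, because the retry never reserves again. With the
reservation after the label the counter ends at 92.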

I can reproduce this easily: just install virtualbox-ose, create a VM
with a non-tiny memory footprint and run it, or run more than one. It is
easier to reproduce if some other software is already running on the
machine (this is why I suspect some memory fragmentation issue).
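
A toy sketch of the fragmentation hypothesis, in plain user-space C:
the function names (toy_free_pages(), toy_longest_free_run()) and the
byte-per-page map are made up for illustration and have nothing to do
with the real vm_phys structures. It only shows that a large free page
count does not imply a large contiguous free run. Whether the failing
virtualbox allocations actually require contiguous pages is an
assumption here, not something established in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Count free pages in a toy page map: map[i] != 0 means page i is in use. */
size_t
toy_free_pages(const unsigned char *map, size_t npages)
{
	size_t i, nfree = 0;

	assert(map != NULL);
	for (i = 0; i < npages; i++)
		if (map[i] == 0)
			nfree++;
	return (nfree);
}

/* Length of the longest run of consecutive free pages in the map. */
size_t
toy_longest_free_run(const unsigned char *map, size_t npages)
{
	size_t i, run = 0, best = 0;

	assert(map != NULL);
	for (i = 0; i < npages; i++) {
		if (map[i] == 0) {
			if (++run > best)
				best = run;
		} else
			run = 0;	/* an in-use page ends the current run */
	}
	return (best);
}
```

With a map of 8 pages where every other page is in use, half of the
pages are free but the longest free run is a single page, so a request
for 2 contiguous pages could not be met. The same 4 free pages packed
together would satisfy it.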

-- 
Guido Falsi <madpilot@FreeBSD.org>