svn commit: r308691 - in head/sys: cddl/compat/opensolaris/sys cddl/contrib/opensolaris/uts/common/fs/zfs fs/tmpfs kern vm

Fri Nov 18 10:23:28 UTC 2016

On Thu, Nov 17, 2016 at 10:51:40AM -0600, Alan Cox wrote:
> On 11/16/2016 11:52, Ruslan Bukin wrote:
> > On Wed, Nov 16, 2016 at 04:59:39PM +0000, Ruslan Bukin wrote:
> >> On Wed, Nov 16, 2016 at 06:53:43PM +0200, Konstantin Belousov wrote:
> >>> On Wed, Nov 16, 2016 at 01:37:18PM +0000, Ruslan Bukin wrote:
> >>>> I have a panic with this on RISC-V. Any ideas ?
> >>> How did you check that the revision you replied to causes the problem?
> >>> Note that the backtrace below is not reasonable.
> >> I reverted this commit like that and rebuilt kernel:
> >> git show 2fa36073055134deb2df39c7ca46264cfc313d77 | patch -p1 -R
> >>
> >> So the problem is reproducible on dual-core with 32mb mdroot.
> >>
> > I just found another interesting behavior:
> > depending on amount of physical memory :
> > 700m - panic
> > 800m - works fine
> > 1024m - panic
> 
> I think that this behavior is not inconsistent with your report of the
> system crashing if you enabled two cores but not one.  Specifically,
> changing the number of active cores will slightly affect the amount of
> memory that is allocated during initialization.
> 
> There is nothing unusual in the sysctl output that you sent out.
> 
> I have two suggestions.  Try these in order.
> 
> 1. r308691 reduced the size of struct vm_object.  Try undoing the one
> snippet that reduced the vm object size and see if that makes a difference.
> 
> 
> @@ -118,7 +118,6 @@
>  	vm_ooffset_t backing_object_offset;/* Offset in backing object */
>  	TAILQ_ENTRY(vm_object) pager_object_list; /* list of all objects of this pager type */
>  	LIST_HEAD(, vm_reserv) rvq;	/* list of reservations */
> -	struct vm_radix cache;		/* (o + f) root of the cache page radix trie */
>  	void *handle;
>  	union {
>  		/*
> 
> 
> 2. I'd like to know if vm_page_scan_contig() is being called.
> 
> Finally, to simplify the situation a little, I would suggest that you
> disable superpage reservations in vmparam.h.  You have no need for them.
> 
> 

I did another merge from svn head and the problem disappeared for 700m and 1024m of physical memory, but now I am able to reproduce it with 900m of physical memory.

Restoring 'struct vm_radix cache' in struct vm_object does not change the behavior.
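
For reference, here is a minimal userland sketch of why the size of the structure was worth testing at all. It is not the kernel definition: the struct names (radix_root, obj_with_cache, obj_without_cache) and both helpers are hypothetical, and it assumes struct vm_radix holds a single pointer-sized root. Dropping such a member shrinks every object by one pointer, which can change how many objects fit in a fixed-size slab and so shift where later allocations land.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct vm_radix: assumed to be one root word. */
struct radix_root {
	uintptr_t rt_root;
};

/* Simplified layout with the member that r308691 removed. */
struct obj_with_cache {
	void *rvq_first;		/* stands in for the LIST_HEAD */
	struct radix_root cache;	/* the removed member */
	void *handle;
};

/* The same layout without it. */
struct obj_without_cache {
	void *rvq_first;
	void *handle;
};

/* Bytes saved per object by dropping the member. */
size_t
cache_member_savings(void)
{
	return (sizeof(struct obj_with_cache) -
	    sizeof(struct obj_without_cache));
}

/* How many objects of obj_size fit in one slab of slab_size bytes. */
size_t
objects_per_slab(size_t slab_size, size_t obj_size)
{
	assert(obj_size > 0);
	return (slab_size / obj_size);
}
```

Comparing objects_per_slab() for the two sizes shows whether a given slab size packs differently after the change. In my case it made no difference, as noted above.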

Adding a panic() call to vm_page_scan_contig() gives the original panic, so vm_page_scan_contig() is not called.
It looks like the size of the function changes and that exposes the original problem again.

Disabling superpage reservations changes the behavior and gives the same panic on a 1024m boot.
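
For anyone repeating the experiment, this is roughly what the change looks like. It is a sketch and not the exact hunk I used: it assumes the port's vmparam.h follows the usual convention of controlling reservations through VM_NRESERVLEVEL, where 0 means no superpage reservation levels.

```c
/*
 * Machine-dependent vmparam.h fragment (sketch, assumed layout).
 * Setting the reservation level count to 0 disables superpage
 * reservations.
 */
#ifndef VM_NRESERVLEVEL
#define	VM_NRESERVLEVEL		0
#endif
```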

Finally, if I comment out the ruxagg() call in rufetch() in kern_resource.c, I can no longer reproduce the problem with any amount of memory in any setup:

--- a/sys/kern/kern_resource.c
+++ b/sys/kern/kern_resource.c
@@ -1063,7 +1063,7 @@ rufetch(struct proc *p, struct rusage *ru)
        *ru = p->p_ru;
        if (p->p_numthreads > 0)  {
                FOREACH_THREAD_IN_PROC(p, td) {
-                       ruxagg(p, td);
+                       //ruxagg(p, td);
                        rucollect(ru, &td->td_ru);
                }
        }

I found this patch in my early RISC-V development directory, so it looks like the problem has persisted for the whole life of freebsd/riscv, but was hidden until now.

Ruslan