shared mem and panics when out of PV Entries

Terry Lambert tlambert2 at mindspring.com
Wed Mar 26 03:30:03 PST 2003


Andrew Kinney wrote:
> On 25 Mar 2003, at 17:56, Igor Sysoev wrote:
> > > So, what's the best approach to limiting memory shared via fork() or
> > > reducing PV Entry usage by that memory?  Is there something I can do
> > > with the kernel config or sysctl to accomplish this?
> >
> > No, as far as I know there's no way to do it.
> > The irony is that you do not need most of these PV entries because
> > you are not swapping.
> 
> My thoughts exactly.  I suppose not all that many people run
> well-used web servers with 4GB of RAM, so there wouldn't be any
> reason for this issue to come up on a regular basis.

You need the pv_entry_t's because there is one on each vm_page_t
for each virtual mapping of the page.

This is necessary to correctly mark things clean or dirty,
and to deal with copy-on-write.
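
For reference, the structure itself is small; roughly (paraphrased
from sys/i386/include/pmap.h of this vintage -- exact fields vary
by version):

	typedef struct pv_entry {
		pmap_t		pv_pmap;   /* pmap where mapping lies */
		vm_offset_t	pv_va;     /* virtual address of mapping */
		TAILQ_ENTRY(pv_entry)	pv_list;  /* chain per vm_page_t */
		TAILQ_ENTRY(pv_entry)	pv_plist; /* chain per pmap */
		vm_page_t	pv_ptem;   /* page table page for the pte */
	} *pv_entry_t;

When the pmap code needs to, say, mark a page clean, it walks that
page's pv_list and fixes up every pte mapping it.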

What is *actually* ironic is that, for the most part, it *may*
be possible to share these things, if they were made slightly
more complex and reference counted, and if you were willing to
split some of the copy-on-write code a little further between
the machine-dependent and machine-independent layers.
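
A purely hypothetical sketch of that idea (this is not in the
tree): fork() would bump a reference count instead of duplicating
the entry, and the first write fault on either side would split it:

	/*
	 * Hypothetical: a shareable pv entry, per the suggestion
	 * above.  N copy-on-write address spaces share one entry
	 * until a write fault forces a split.  Not real code.
	 */
	struct shared_pv_entry {
		vm_offset_t	pv_va;       /* common virtual address */
		u_int		pv_refcnt;   /* address spaces sharing it */
		/* per-pmap bookkeeping would have to move elsewhere */
	};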

Matt Dillon would be the person to talk to about this; I
could do it, but he'd do it faster.


> I'm going to expose my newbness here with respect to BSD
> memory management, but could the number of files served and
> filesystem caching have something to do with the PV Entry usage
> by Apache?  We've got around 1.2 million files served by this
> Apache.  Could it be that the extensive PV Entry usage has
> something to do with that?  Obviously, not all are accessed all the
> time, but it wouldn't take a very large percentage of them being
> accessed to cause issues if filesystem caching is in any way
> related to PV Entry usage by Apache.

When you fork, you copy the address space, which means you copy
the pv_entry_t's, so the answer is a tentative "yes".  But files
which are not open are not mapped, so unless you have a lot of
mmap's hanging around, this shouldn't be an issue with System V
shared memory.
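
A trivial userland illustration of the fork-copies-mappings point
(a sketch; point it at any readable file):

	#include <sys/types.h>
	#include <sys/mman.h>
	#include <sys/wait.h>
	#include <err.h>
	#include <fcntl.h>
	#include <stdio.h>
	#include <unistd.h>

	/*
	 * Map one page of a file, then fork.  Parent and child each
	 * end up with a virtual mapping of the same physical page,
	 * so once both have touched it, the kernel is tracking two
	 * pv entries for that page -- one per pmap it appears in.
	 */
	int
	main(int argc, char *argv[])
	{
		char *p;
		pid_t pid;
		int fd;

		if (argc != 2)
			errx(1, "usage: %s file", argv[0]);
		if ((fd = open(argv[1], O_RDONLY)) == -1)
			err(1, "open");
		p = mmap(NULL, getpagesize(), PROT_READ, MAP_PRIVATE,
		    fd, 0);
		if (p == MAP_FAILED)
			err(1, "mmap");
		if ((pid = fork()) == -1)
			err(1, "fork");
		/* The read below instantiates the mapping in this
		   process's pmap; that is what costs a pv entry. */
		printf("pid %d sees '%c'\n", (int)getpid(), p[0]);
		if (pid != 0)
			waitpid(pid, NULL, 0);
		return (0);
	}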


> We had keepalive set to the default of "on" (at least default for this
> install) with the default keepalive timeout of 15 seconds.
> 
> Dropping the keepalive timeout down to 3 seconds has
> dramatically reduced the number of Apache processes required to
> serve the load.  With the new settings, we're averaging 30 to 80
> Apache processes, which is much more manageable in terms of
> memory usage, though we weren't anywhere near running out of
> physical RAM prior to this.  We're servicing a little over 1000
> requests per minute, which by some standards isn't a huge amount.
> 
> We're still seeing quite heavy PV Entry usage, though.  The
> reduced number of Apache processes (by more than half) doesn't
> seem to have appreciably reduced PV Entry usage versus the
> previous settings, so I suspect I may have been wrong about
> memory sharing as the culprit for the PV Entry usage.  This
> observation may just be coincidence, but the average PV Entry
> usage seems to have gone up by a couple million entries since the
> changes to the Apache config.
> 
> Time will tell if the PV Entries are still getting hit hard enough to
> cause panics due to running out of them.  They're supposed to get
> forcibly recycled at 90% utilization from what I see in the kernel
> code, so if we never get above 90% utilization I guess I could
> consider the issue resolved.
> 
> What other things in Apache (besides memory sharing via PHP
> and/or mod_perl) could generate PV Entry usage on a massive
> scale?

Basically, you don't really care about pv_entry_t's, you care
about KVA space, and running out of it.

In a previous posting, you suggested increasing KVA_PAGES fixed
the problem, but caused a pthreads problem.

What you meant to say is that it caused a mailbox location problem
between the Linux threads kernel module and the user space Linux
threads library.  In other words, it's because you are using the
Linux threads implementation that you have this problem; FreeBSD's
own pthreads are not at fault.
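
(For the archives: KVA_PAGES is an i386 kernel config option
counted in 4MB page directory entries, so for example

	options KVA_PAGES=512

doubles kernel VA from the default 256 (1GB) to 2GB -- and shrinks
user VA by the same amount, which is what pulls the rug out from
under a mailbox address that was agreed upon at compile time.)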

Probably, the Linux threads kernel module should be modified to
provide the mailbox location, and the user space Linux threads
library modified to obtain it from the kernel module via sysctl,
so that the locations don't have to be agreed upon at compile
time by programs using the code.
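
For instance (a hypothetical sketch only -- the sysctl name and
variable here are made up, not the module's real interface), the
kernel side could export the location:

	/* Kernel module side (hypothetical). */
	#include <sys/param.h>
	#include <sys/kernel.h>
	#include <sys/sysctl.h>

	static u_long linux_mailbox_base;	/* set at module load */

	SYSCTL_DECL(_compat_linux);
	SYSCTL_ULONG(_compat_linux, OID_AUTO, threads_mailbox,
	    CTLFLAG_RD, &linux_mailbox_base, 0,
	    "linuxthreads mailbox base address");

and the library side could ask for it at startup:

	/* User space library side (hypothetical). */
	#include <sys/types.h>
	#include <sys/sysctl.h>

	u_long base;
	size_t len = sizeof(base);

	if (sysctlbyname("compat.linux.threads_mailbox", &base,
	    &len, NULL, 0) == -1)
		base = DEFAULT_MAILBOX_BASE;	/* hypothetical old
						   compiled-in value */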

In any case, the problem you are having is because the uma_zalloc()
(UMA) allocator is feeling KVA space pressure.

One way to move this pressure somewhere else, rather than dealing
with it in an area that panics on you because the code was never
properly retrofitted for the limitations of UMA, is to preallocate
the UMA region used for the "PV ENTRY" zone.

The way to do this is to modify /usr/src/sys/i386/i386/pmap.c
at about line 122, where it says:

	#define MINPV 2048

to say instead:

	#ifndef MINPV
	#define MINPV 2048	/* default, if not specified in config */
	#endif

Note that pmap.c already pulls in "opt_pmap.h".  To activate the
new option, you will need to add this line to
/usr/src/sys/conf/options.i386:

	MINPV	opt_pmap.h

With this in place, you will be able to adjust the initial minimum
allocations upward by saying:

	options MINPV=4096

(or whatever) in your kernel config file.

Note: you may want to "#if 0" out the #define in pmap.c altogether,
to reassure yourself that this is working; it's easy to make a mistake
in this part of the kernel.
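
For instance (a throwaway test hack, assuming the #ifndef block
shown above):

	#if 0
	#define MINPV 2048	/* default disabled while testing */
	#endif
	#ifndef MINPV
	#error "MINPV did not arrive from the config via opt_pmap.h"
	#endif

If the kernel still builds with your "options MINPV=4096" in place,
the option plumbing is known to be working.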

-- Terry

