shared mem and panics when out of PV Entries
Andrew Kinney
andykinney at advantagecom.net
Wed Mar 26 16:02:08 PST 2003
On 25 Mar 2003, at 19:28, Terry Lambert wrote:
> Basically, you don't really care about pv_entry_t's, you care
> about KVA space, and running out of it.
>
> In a previous posting, you suggested increasing KVA_PAGES fixed
> the problem, but caused a pthreads problem.
Will running out of KVA space indirectly cause the PV Entry zone to hit its
limit as shown in sysctl vm.zone? To my knowledge, I've never seen a panic
on this system directly resulting from running out of KVA space; they've
all been traced back to running out of available PV Entries.
I'm invariably hitting the panic in pmap_insert_entry(), and I only get the
panic when I run out of available PV Entries. I've seen nothing to indicate
that running out of KVA space is causing the panics, though I'm still
learning the ropes of the BSD memory management code and recognize that its
many interacting pieces could have unforeseen effects.
Regarding the other thread you mentioned, increasing KVA_PAGES was just a
way to squeeze a higher PV Entry limit out of the system, since it permits
a higher value for PMAP_SHPGPERPROC while still letting the system boot. I
haven't determined whether it "fixed the problem" because I had to revert
to an old kernel when MySQL wigged out on boot, apparently due to the
threading issue in 4.7 that shows up with increased KVA_PAGES. I never got
a chance to increase PMAP_SHPGPERPROC after increasing KVA_PAGES because
MySQL is an important service on this system and I had to get it back up
and running.
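For anyone following along, both knobs are kernel config options, so the
change is just a couple of lines in the kernel config plus a rebuild. The
values below are purely illustrative, not what I actually plan to use:

options         KVA_PAGES=512           # 2GB of KVA instead of the default 1GB
options         PMAP_SHPGPERPROC=400    # default is 200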
> What you meant to say is that it caused a Linux threads kernel
> module mailbox location problem for the user space Linux threads
> library. In other words, it's because you are using the Linux
> threads implementation, that you have this problem, not
> FreeBSD's pthreads.
I may have misspoken in the previous thread about pthreads having a problem
when KVA_PAGES is increased. I was referencing an earlier thread in which
the author stated that pthreads had a problem when KVA_PAGES was increased,
and I assumed he knew what he was talking about. At any rate, this was
apparently patched and committed to the RELENG_4 tree after 4.7-RELEASE. I
plan on grabbing RELENG_4_8 once it's officially released. That should give
me room to play with KVA_PAGES, if necessary, without breaking MySQL.
Also worth reiterating is that resource usage by Apache is the
source of the panics. The version I'm using is 1.3.27, so it doesn't
even make use of threading, at least not like Apache 2.0. I would
just switch to Apache 2.0, but it doesn't support all the modules we
need yet. Threads were only an issue with MySQL when
KVA_PAGES>256, which doesn't appear to be related to the
panics happening while KVA_PAGES=256.
> In any case, the problem you are having is because the uma_zalloc()
> (UMA) allocator is feeling KVA space pressure.
>
> One way to move this pressure somewhere else, rather than dealing with
> it in an area which results in a panic on you because the code was not
> properly retrofit for the limitations of UMA, is to decide to
> preallocate the UMA region used for the "PV ENTRY" zone.
I haven't read that section of the source yet, but I'll go do so now and
determine whether the changes you suggested would help in this case. I know
from some other posts that you're a strong advocate of mapping all physical
RAM into KVA right up front rather than messing around with mapping only a
subset of it. That approach seems to make sense, at least for large-memory
systems, if I understand the dynamics of the situation correctly.
> The way to do this is to modify /usr/src/sys/i386/i386/pmap.c
> at about line 122, where it says:
>
> #define MINPV 2048
>
If I read the code in pmap.c correctly, MINPV just guarantees that the
system will have at least *some* PV Entries available, by preallocating the
KVA (28 bytes per entry on my system) for the number of PV Entries
specified by MINPV. See the section of /usr/src/sys/i386/i386/pmap.c
labelled "init the pv free list". I'm not certain it makes a lot of sense
to preallocate KVA space for 11,113,502 PV Entries when we don't appear to
be completely KVA starved.
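For reference, here's roughly what that section does as I read it
(paraphrased from pmap_init(), not a verbatim quote, so take the details
with a grain of salt):

        /*
         * "init the pv free list": MINPV is only a floor on the
         * initial allocation backing the PV ENTRY zone.
         */
        initial_pvs = vm_page_array_size;
        if (initial_pvs < MINPV)
                initial_pvs = MINPV;
        pvzone = &pvzone_store;
        pvinit = (struct pv_entry *) kmem_alloc(kernel_map,
            initial_pvs * sizeof(struct pv_entry));
        zbootinit(pvzone, "PV ENTRY", sizeof(struct pv_entry),
            pvinit, initial_pvs);

For scale, preallocating 11,113,502 entries at 28 bytes apiece would tie up
roughly 297MB of KVA at boot just for that free list.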
As I understand it (and as you seem to have suggested), increasing MINPV
would only be useful if other KVA consumers (buffers, cache, mbuf clusters,
etc.) were eating up the KVA before we could get enough PV Entries on the
"free" list. I don't believe that's what's happening here.
Here are the sysctls that seem pertinent:
vm.zone_kmem_kvaspace: 350126080
vm.kvm_size: 1065353216
vm.kvm_free: 58720256
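For readability, converting those to megabytes (1MB = 2^20 bytes):

vm.zone_kmem_kvaspace: 350126080 bytes  ~= 334MB
vm.kvm_size:           1065353216 bytes ~= 1016MB (just under the 1GB of
                                            KVA that KVA_PAGES=256 provides)
vm.kvm_free:           58720256 bytes    = 56MB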
vm.zone_kmem_kvaspace indicates (if I understand it correctly) that
kmem_alloc() allocated about 334MB of KVA at boot. vm.kvm_free indicates
that KVM is only pressured after the system has been running awhile. The
sysctls above were read about 90 minutes after a reboot, during off-peak
hours. At that time, 199MB was allocated to buffers, 49MB to cache, and
353MB was wired. During peak usage, we typically have 199MB in buffers,
~150MB in cache, and 500MB to 700MB wired. If I understand things
correctly, that means we're peaking around the 1GB KVM mark, and memory
used by cache is probably being recycled to free up KVM for other uses when
necessary.
However, I don't believe we're putting so much pressure on KVA/KVM as to
run out of 28-byte chunks for new PV Entries. Assuming, once again, that I
understand things correctly, if we were putting that much pressure on
KVA/KVM, cache would drop much closer to zero while the system tried to
make room for those 28-byte PV Entries. Even during peak usage, just prior
to a panic, the system still shows over 100MB of cache. I have a 'systat
-vm' from a few seconds before one of the panics that showed over 200MB of
KVM free.
So, I don't think the memory allocation in KVA/KVM associated
with PV Entries is the culprit of our panics. Here's a copy of one of
the panics and the trace I did on it.
Fatal trap 12: page fault while in kernel mode
mp_lock = 01000002; cpuid = 1; lapic.id = 00000000
fault virtual address = 0x4
fault code = supervisor write, page not present
instruction pointer = 0x8:0xc02292bd
stack pointer = 0x10:0xed008e0c
frame pointer = 0x10:0xed008e1c
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, def32 1, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 61903 (httpd)
interrupt mask = net tty bio cam <- SMP: XXX
trap number = 12
panic: page fault
mp_lock = 01000002; cpuid = 1; lapic.id = 00000000
boot() called on cpu#1
Instruction pointer trace:
# nm -n /kernel | grep c02292bd
# nm -n /kernel | grep c02292b
# nm -n /kernel | grep c02292
c022929c t pmap_insert_entry
exact line number of instruction:
----------------------------------
(kgdb) l *pmap_insert_entry+0x21
0xc02292bd is in pmap_insert_entry
(/usr/src/sys/i386/i386/pmap.c:1636).
1631 int s;
1632 pv_entry_t pv;
1633
1634 s = splvm();
1635 pv = get_pv_entry();
1636 pv->pv_va = va;
1637 pv->pv_pmap = pmap;
1638 pv->pv_ptem = mpte;
1639
1640 TAILQ_INSERT_TAIL(&pmap->pm_pvlist, pv, pv_plist);
The instruction pointer is always the same on these panics, and it is
almost invariably an httpd process that is current at the time of the
panic.
My interpretation is that the real failure is on line 1635 of pmap.c, in
get_pv_entry().
Here's the code for get_pv_entry():
static pv_entry_t
get_pv_entry(void)
{
        pv_entry_count++;
        if (pv_entry_high_water &&
            (pv_entry_count > pv_entry_high_water) &&
            (pmap_pagedaemon_waken == 0)) {
                pmap_pagedaemon_waken = 1;
                wakeup(&vm_pages_needed);
        }
        return zalloci(pvzone);
}
Now, unless it happens somewhere else, there is no bounds checking on
pv_entry_count in that function. So when pv_entry_count exceeds the limit
on PV Entries (pv_entry_max, as set in pmap_init2() in pmap.c), the zone
has nothing left to hand out, zalloci() presumably returns NULL, and the
write through that pointer on line 1636 is what produces the "page not
present" fault (note the fault virtual address of 0x4, a small offset from
NULL).
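For reference, pv_entry_max and the high water mark come from pmap_init2(),
which as I read it does roughly the following (paraphrased, so the exact
4.7 code may differ slightly):

        /* paraphrased from pmap_init2() in /usr/src/sys/i386/i386/pmap.c */
        pv_entry_max = PMAP_SHPGPERPROC * maxproc + vm_page_array_size;
        pv_entry_high_water = 9 * (pv_entry_max / 10);
        zinitna(pvzone, &pvzone_obj, NULL, 0, pv_entry_max,
            ZONE_INTERRUPT, 1);

That's why raising PMAP_SHPGPERPROC (with enough KVA to back it) is the
usual way to raise the limit.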
I suppose that if nobody wants to tackle the underlying issue, a quick and
dirty way to handle it would be to add bounds checking on pv_entry_count in
get_pv_entry() and, when pv_entry_count is out of bounds, produce a panic
with a more informative message (see the sketch below). At least with a
useful panic message the problem would be readily identified on other
systems, and you guys would have a better idea of how many other people run
into this issue.
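Something along these lines is what I have in mind (untested, just to
illustrate the idea; whether a panic is the right response there is
obviously debatable):

static pv_entry_t
get_pv_entry(void)
{
        pv_entry_count++;
        if (pv_entry_high_water &&
            (pv_entry_count > pv_entry_high_water) &&
            (pmap_pagedaemon_waken == 0)) {
                pmap_pagedaemon_waken = 1;
                wakeup(&vm_pages_needed);
        }
        /*
         * Untested sketch: fail with a descriptive message instead of
         * letting the caller dereference whatever (presumably NULL)
         * zalloci() hands back once the zone is exhausted.
         */
        if (pv_entry_count > pv_entry_max)
                panic("get_pv_entry: pv_entry_count (%d) exceeded "
                    "pv_entry_max (%d); consider raising PMAP_SHPGPERPROC",
                    pv_entry_count, pv_entry_max);
        return zalloci(pvzone);
}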
Now, that's my synopsis of the problem, though I'm still a newbie with
regard to the BSD memory management system. Based on the information I've
given you, do you still think this panic was caused by running out of
KVA/KVM? If I'm wrong, I'd love to know it so I can revise my understanding
of what's causing the panic.
For now, I've worked around the problem by limiting the number of Apache
processes allowed to run, based on my calculations of how many PV Entries
each child process requires. But it's painful to have all that RAM and not
be able to put it to use because of an issue in the memory management code
that shows up on large-memory systems (>2GB). IMHO, Apache shouldn't be
able to crash an OS before it ever starts using swap.
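To give a feel for the arithmetic (numbers purely illustrative, not our
actual figures): every process that maps a page needs its own pv_entry for
that page, so shared memory multiplies fast. For example:

        500 Apache children * 25,600 shared pages (a 100MB shm segment)
            = 12,800,000 pv_entries
        12,800,000 * 28 bytes ~= 342MB of KVA

which would already be past the roughly 11.1 million entry limit on this
system before counting any private pages.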
The only reason the problem doesn't show up on systems with typical amounts
of RAM (2GB or less) is that if those systems ran Apache the way we do,
they would run out of physical memory and spiral to a crash as swap filled
up, long before the PV Entry limit ever came into play.
Sincerely,
Andrew Kinney
President and
Chief Technology Officer
Advantagecom Networks, Inc.
http://www.advantagecom.net