i386: vm.pmap kernel local race condition

Sat Feb 16 18:25:33 UTC 2013

On Thu, Feb 14, 2013 at 7:55 AM, Eugene Grosbein <eugen at grosbein.pp.ru> wrote:

> Hi!
>
> I've got a FreeBSD 8.3-STABLE/i386 server that can be reliably panicked
> using just the 'squid -k rotatelog' command. It seems the system suffers
> from the problem described here:
>
> http://cxsecurity.com/issue/WLB-2010090156
>
> I could not find any FreeBSD Security Advisory containing a fix.
>
> My server has 4G of physical RAM (about 3.2G available) and runs
> squid (about 110M VSS) with 500 ntlm_auth subprocesses.
> A smaller number of ntlm_auth helpers sometimes results in a squid crash,
> as it sometimes has several hundred requests per second to authorize
> and does not tolerate running out of free ntlm_auth helpers.
>
> "squid -k rotatelog" at midnight results in crash:
>
> Feb 14 00:03:00 irl savecore: reboot after panic: get_pv_entry: increase
> vm.pmap.shpgperproc
> Feb 14 00:03:00 irl savecore: writing core to vmcore.1
>
> Btw, I have coredump.
>
> vm.pmap.shpgperproc has its default value (200) here, as do vm.v_free_min,
> vm.v_free_reserved, vm.v_free_target, and KVA_PAGES.
>
> These crashes are pretty regular:
>
> # last|fgrep reboot
> reboot           ~                         Thu Feb 14 00:03
> reboot           ~                         Wed Feb 13 19:08
> reboot           ~                         Wed Feb 13 10:40
> reboot           ~                         Wed Feb 13 00:04
> reboot           ~                         Tue Feb 12 00:09
> reboot           ~                         Mon Feb 11 00:03
> reboot           ~                         Sun Feb 10 00:03
> reboot           ~                         Thu Feb  7 00:03
> reboot           ~                         Wed Feb  6 10:52
> reboot           ~                         Sun Feb  3 00:03
> reboot           ~                         Sat Feb  2 00:03
>
> Can this be considered a security problem?
> Can it be fixed without switching to amd64?
> I have only remote access to this production server, no serial console.
>
>
Regardless of what that web site says, this is not really a race
condition.  Instead, you're exhausting a resource in the kernel because of
the characteristics of your workload.  The kernel tries to handle this
gracefully, but in extreme cases, the kernel can't keep up with the
demand.  Have you simply tried doing as the panic message suggests, i.e.,
increase vm.pmap.shpgperproc?  Alternatively, you can increase
vm.pmap.pv_entry_max to more directly accomplish the same.
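
As a rough sketch of how the two tunables relate: the formula below is an
assumption based on the i386 pmap initialization code of that era, not
something stated in this thread, and the helper name and all numbers are
hypothetical. Check the pmap source for your release before relying on it.

```shell
#!/bin/sh
# Sketch only: estimate the pv entry limit the kernel derives from
# vm.pmap.shpgperproc. Assumed formula (verify against your pmap.c):
#   pv_entry_max = shpgperproc * maxproc + physical page count
pv_entry_max_estimate() {
    # $1 = vm.pmap.shpgperproc
    # $2 = kern.maxproc
    # $3 = vm.stats.vm.v_page_count
    echo $(( $1 * $2 + $3 ))
}

# Illustrative inputs only (not taken from the reporter's machine):
pv_entry_max_estimate 200 6000 800000   # default shpgperproc
pv_entry_max_estimate 400 6000 800000   # doubled shpgperproc

# On the affected machine the real inputs could be read with sysctl(8):
#   sysctl -n vm.pmap.shpgperproc kern.maxproc vm.stats.vm.v_page_count
# and a larger value (400 is only an example) set in /boot/loader.conf:
#   vm.pmap.shpgperproc="400"
```

Under that assumption, raising vm.pmap.shpgperproc scales the limit with
the process limit, while setting vm.pmap.pv_entry_max fixes the limit
directly.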

That said, if possible, you should do as Adrian suggests and change your
Squid configuration to not use 500 helper processes.  That will allow a lot
more of your machine's physical memory to go to caching data rather than
to bookkeeping data structures in the kernel.

Regards,
Alan