i386 PAE kernel works fine on 10-stable

Alfred Perlstein bright at mu.org
Tue Dec 16 00:33:05 UTC 2014


> On Dec 15, 2014, at 3:42 PM, Peter Wemm <peter at wemm.org> wrote:
> 
>> On Sunday, December 14, 2014 10:53:14 AM Alfred Perlstein wrote:
>>> On Dec 14, 2014, at 10:12 AM, Ian Lepore wrote:
>>>> On Sun, 2014-12-14 at 10:09 -0800, Alfred Perlstein wrote:
>>>>> On Dec 14, 2014, at 9:47 AM, Ian Lepore wrote:
>>>>> This is an out of the blue FYI post to let people know that despite all
>>>>> the misinformation you'll run across if you search for information on
>>>>> FreeBSD PAE support, it (still) works just fine.  I've been using it
>>>>> (for reasons related to our build system and products at $work) since
>>>>> 2006, and I can say unequivocally that it works fine on 6.x, 8.x, and
>>>>> now 10.x (and presumably on the odd-numbered releases too but I've never
>>>>> tried those).
>>>>> 
>>>>> In my most recent testing with 10-stable, I found it was compatible with
>>>>> drm2 and radeonkms drivers and I was able to run Xorg and gnome just
>>>>> fine.  All my devices, and apps, and even the linuxulator worked just
>>>>> fine.
>>>>> 
>>>>> One thing that changed somewhere between 8.4 and 10.1 is that I had to
>>>>> add a kernel tuning option to my kernel config:
>>>>> 
>>>>> options KVA_PAGES=768        # Default is 512
>>>>> 
>>>>> I suspect that the most frequent use of PAE is on laptops that have 4gb
>>>>> and the default tuning is adequate for that.  My desktop machine has
>>>>> 12gb and I needed to bump up that value to avoid errors related to being
>>>>> unable to create new kernel stacks.
>>>> 
>>>> There already is a #define that is bifurcated based on PAE in pmap.h:
>>>> 
>>>> #ifndef KVA_PAGES
>>>> #ifdef PAE
>>>> #define KVA_PAGES       512
>>>> #else
>>>> #define KVA_PAGES       256
>>>> #endif
>>>> #endif
>>>> 
>>>> Do you think it will harm things to apply your suggested default to this
>>>> file?
>>> I would have to defer to someone who actually understands what that
>>> parameter is tuning.  It was purely speculation on my part that the
>>> current default is adequate for less memory than I have, and I don't
>>> know what the downside might be of setting it too high.
>> 
>> KVA_PAGES is the number of page table pages reserved for kernel address space:
>> 
>> * Size of Kernel address space.  This is the number of page table pages
>> * (4MB each) to use for the kernel.  256 pages == 1 Gigabyte.
>> * This **MUST** be a multiple of 4 (eg: 252, 256, 260, etc).
>> * For PAE, the page table page unit size is 2MB.  This means that 512 pages
>> * is 1 Gigabyte.  Double everything.  It must be a multiple of 8 for PAE.
>> 
>> It appears that our default for PAE leaves 1GB of kernel address space to
>> play with?  That's an interesting default.  I wonder if it really makes
>> sense for PAE, since the assumption is that you'll have >4GB of RAM in the
>> box; wiring down 1.5GB for the kernel would seem to make sense…  It
>> probably makes sense to ask Peter or Alan about this.
> 
> It's always been a 1GB/3GB split.  It was never a problem until certain 
> scaling defaults were changed to scale solely based on physical ram without 
> regard for kva limits.

Hmm, the original patch I gave for that only changed scaling for machines with 64-bit pointers.  Why was the 32-bit code changed as well?

> 
> With the current settings and layout of the userland address space between the 
> zero-memory hole, the reservation for maxdsiz, followed by the ld-elf.so.1 
> space and shared libraries, there's just enough room to mmap a 2GB file and 
> have a tiny bit of wiggle room left.
> 
> With changing the kernel/user split to 1.5/2.5 then userland is more 
> restricted and is typically around the 1.8/1.9GB range.
> 
> You can get a large memory PAE system to boot with default settings by 
> seriously scaling things down like kern.maxusers, mbufs limits, etc.
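
[Editor's note: the kind of scaling-down Peter describes is done through
loader tunables.  A hypothetical /boot/loader.conf sketch follows; the
tunable names are standard FreeBSD, but the particular values are purely
illustrative and not taken from this thread.]

```
# /boot/loader.conf -- illustrative values only, not from this thread
kern.maxusers=64            # shrink the autotuned kernel tables
kern.ipc.nmbclusters=32768  # cap the mbuf cluster pool
```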
> 
> However, we have run ref11-i386 and ref10-i386 in the cluster for 18+ months 
> with a 1.5/2.5 split and even then we've run out of kva and we've hit a few 
> pmap panics and things that appear to be fallout of bounce buffer problems.
> 
> While yes, you can make it work, I am personally not convinced that it is 
> reliable.
> 
> My last i386 PAE machine died earlier this year with a busted scsi backplane 
> for the drives.  It went to the great server crusher.

Oh, I made the dumb assumption that PAE was 4/4, basically not split.  OK, thanks.

> 
>> Also wondering how bad it would be to make these runtime tunables.  I see
>> they trickle down quite a bit into the system (hopefully not sizing any
>> static arrays), but I haven't dug down that far.
> 
> They cause extensive compile time macro expansion variations that are exported 
> to assembler code via genassym.  KVA_PAGES is not a good candidate for a 
> runtime tunable unless you like the pain of i386/locore.s and friends.

Ouch. Ok. 

-Alfred. 

