32GB limit per swap device?

Matthew Dillon dillon at apollo.backplane.com
Tue Aug 23 07:21:01 UTC 2011


    Two additional pieces of information.

    The original limitation was more related to DEV_BSIZE calculations for
    the buf/bio, which is now 64 bits and thus no longer a limit, though you
    probably need some preemptive casts to ensure the multiplication is
    done in 64 bits.  There was also another intermediate calculation
    overflow in the swap radix-tree code which had to be fixed to be able
    to use the full range... I think w/simple casts.  I haven't looked it
    up but there should be a few commits in the DFly codebase that can
    be referenced.
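
    A minimal sketch of the kind of preemptive cast meant here.  This is
    not the actual kernel code: the function names and SKETCH_PAGE_SIZE
    are made up for illustration, and the page number is chosen only
    because it makes the 32-bit product wrap.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u   /* assumed page size in bytes (hypothetical name) */

/*
 * Overflow-prone: both operands are 32 bits wide, so the product is
 * computed modulo 2^32 and only afterwards widened to 64 bits.
 */
static uint64_t page_to_offset_narrow(uint32_t pageno)
{
    uint32_t product = pageno * SKETCH_PAGE_SIZE;
    return product;
}

/*
 * Fixed: cast one operand first so the multiplication itself is
 * carried out in 64 bits.
 */
static uint64_t page_to_offset_wide(uint32_t pageno)
{
    return (uint64_t)pageno * SKETCH_PAGE_SIZE;
}
```

    With pageno = 0x800000 the narrow version wraps to 0, while the wide
    version returns the intended byte offset.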

    Second item:  The main physical memory use is not the radix tree bitmap
    for the swap code, but instead the auxiliary data structure used
    to store the swapblk information which is associated with the vm_object
    structure.  This structure contains a short array of swap block
    assignments (as a memory optimization to reduce header overhead) and
    it is these fields which you really want to keep 32-bits (unless you
    want the ~1MB per ~1GB of swap to become ~2MB per ~1GB of swap in
    physical memory overhead).  The block number is in page-sized chunks
    so the practical limit is still ~4TB, with a further caveat below.
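
    To make the size difference concrete, here is a rough sketch of such
    an auxiliary entry.  The struct names, the base_page field and the
    slot count of 16 are all assumptions for illustration and are not
    taken from the real swapblk code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_SLOTS 16   /* hypothetical number of block slots per entry */

/* One entry covers SKETCH_SLOTS consecutive pages of a vm_object. */
struct swblk_sketch32 {
    uint64_t base_page;              /* first object page covered by this entry */
    uint32_t blocks[SKETCH_SLOTS];   /* swap block numbers, 32 bits each */
};

struct swblk_sketch64 {
    uint64_t base_page;
    uint64_t blocks[SKETCH_SLOTS];   /* same slots widened to 64 bits */
};

/* Bytes of slot storage per entry for each variant. */
static size_t slot_bytes32(void)
{
    return sizeof(((struct swblk_sketch32 *)0)->blocks);
}

static size_t slot_bytes64(void)
{
    return sizeof(((struct swblk_sketch64 *)0)->blocks);
}
```

    Widening the slots doubles the array, which is where nearly all of
    the per-entry memory goes.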

    The further caveat is that the actual limitation for the radix tree
    is 0x40000000 blocks, which is 1/4 the full range or ~1TB, so the
    actual limitation for the (fixed) original radix tree code is ~1TB
    rather than ~4TB.  This restricted range is due to some shift << >>
    operators used in the radix tree code that I didn't want to make more
    complicated.

    So, my recommendation is to fix the intermediate calculations and keep
    the swapblk related blockno fields 32 bits.

    The preallocation for the vm_object's auxiliary structure must be large
    enough to actually be able to fill up swap and assign all the swap blocks.
    This is what eats the physical memory (4 bytes per 4K page = a 1:1024
    ratio of memory to swap).  The radix tree bitmap itself winds up eating
    only around 2 bits per swap block in total overhead.  So the auxiliary
    structure is the main culprit.  You definitely want to keep those block
    number fields in the aux structure 32 bits.

    The practical limit of ~1TB of swap requires ~1GB of preallocated
    physical memory with a 32 bit block number field.  That would become
    ~2GB of preallocated memory if 64 bit block numbers were used instead,
    for no gain other than wasting physical memory.  Ok, nobody is likely
    to actually need that much swap, but people might be surprised:  there
    are a lot of modern-day uses for swap space that don't involve heavy
    paging of anonymous memory.
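
    The overhead figures above can be checked with a little arithmetic.
    This is only a back-of-the-envelope sketch: it assumes 4K pages,
    counts nothing but the block number fields and the ~2 bits per block
    of radix bitmap, and uses helper names invented for the example.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096ULL   /* assumed page size in bytes */

/* Number of page-sized swap blocks in swap_bytes of swap. */
static uint64_t swap_blocks(uint64_t swap_bytes)
{
    return swap_bytes / SKETCH_PAGE_SIZE;
}

/* Memory needed for one block number field per swap block. */
static uint64_t aux_bytes(uint64_t swap_bytes, uint64_t field_bytes)
{
    return swap_blocks(swap_bytes) * field_bytes;
}

/* Approximate radix bitmap cost at ~2 bits per swap block. */
static uint64_t bitmap_bytes(uint64_t swap_bytes)
{
    return swap_blocks(swap_bytes) * 2 / 8;
}
```

    For 1TB of swap this gives 1GB of block number fields at 4 bytes
    each, 2GB at 8 bytes each, and only about 64MB for the bitmap.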

						-Matt


More information about the freebsd-stable mailing list