32GB limit per swap device?
Kostik Belousov
kostikbel at gmail.com
Sat Aug 20 17:41:56 UTC 2011
On Sat, Aug 20, 2011 at 12:33:29PM -0500, Alan Cox wrote:
> On Thu, Aug 18, 2011 at 3:16 AM, Alexander V. Chernikov <melifaro at ipfw.ru> wrote:
>
> > On 10.08.2011 19:16, perryh at pluto.rain.com wrote:
> >
> >> Chuck Swiger<cswiger at mac.com> wrote:
> >>
> >> On Aug 9, 2011, at 7:26 AM, Daniel Kalchev wrote:
> >>>
> >>>> I am trying to set up 64GB partitions for swap for a system that
> >>>> has 64GB of RAM (with the idea to dump kernel core etc). But, on
> >>>> 8-stable as of today I get:
> >>>>
> >>>> WARNING: reducing size to maximum of 67108864 blocks per swap unit
> >>>>
> >>>> Is there workaround for this limitation?
> >>>>
> >>>
> > Another interesting question:
> >
> > swap pager operates in page blocks (PAGE_SIZE=4k on common arch).
> >
> > Block device size is passed to swaponsomething() in number of _disk_ blocks
> > (e.g. in DEV_BSIZE=512). After that, the maximum objects check of the kernel
> > b-lists (on top of which the swap pager is built) is enforced.
> >
> > The (possible) problem is that the real object count we will operate on is
> > not the value passed to swaponsomething(), since the check is done in the
> > wrong units.
> >
> > We should check the b-list limit on the (X * DEV_BSIZE / PAGE_SIZE) value,
> > which is roughly (X / 8), so we should be able to address 32*8=256G.
> >
> > The code should look like this:
> >
> > Index: vm/swap_pager.c
> > ===================================================================
> > --- vm/swap_pager.c (revision 223877)
> > +++ vm/swap_pager.c (working copy)
> > @@ -2129,6 +2129,15 @@ swaponsomething(struct vnode *vp, void *id, u_long
> > u_long mblocks;
> >
> > /*
> > + * nblks is in DEV_BSIZE'd chunks, convert to PAGE_SIZE'd chunks.
> > + * First chop nblks off to page-align it, then convert.
> > + *
> > + * sw->sw_nblks is in page-sized chunks now too.
> > + */
> > + nblks &= ~(ctodb(1) - 1);
> > + nblks = dbtoc(nblks);
> > +
> > + /*
> > * If we go beyond this, we get overflows in the radix
> > * tree bitmap code.
> > */
> > @@ -2138,14 +2147,6 @@ swaponsomething(struct vnode *vp, void *id, u_long
> > mblocks);
> > nblks = mblocks;
> > }
> > - /*
> > - * nblks is in DEV_BSIZE'd chunks, convert to PAGE_SIZE'd chunks.
> > - * First chop nblks off to page-align it, then convert.
> > - *
> > - * sw->sw_nblks is in page-sized chunks now too.
> > - */
> > - nblks &= ~(ctodb(1) - 1);
> > - nblks = dbtoc(nblks);
> >
> > sp = malloc(sizeof *sp, M_VMPGDATA, M_WAITOK | M_ZERO);
> > sp->sw_vp = vp;
> >
> >
> > (move pages recalculation before b-list check)
> >
> >
> > Can someone comment on this?
> >
> >
> I believe that you are correct. Have you tried testing this change on a
> large swap device?
I probably agree too, but I am in the process of re-reading the swap code,
and I do not quite believe in the limit.
When the initial code was committed, our daddr_t was 32-bit; I checked
the RELENG_4 sources. The current code uses int64_t for daddr_t. My impression
right now is that we only utilize the low 32 bits of daddr_t.
Especially interesting is the following typedef:
typedef uint32_t u_daddr_t; /* unsigned disk address */
which (correctly) means that the typical mask (u_daddr_t)-1 is 0xffffffff.
I wonder whether we could just use the full 64 bits and de facto remove the
limitation on the swap partition size.
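
To make the arithmetic concrete, here is a small userland sketch, not kernel
code. It assumes DEV_BSIZE=512 and PAGE_SIZE=4096 as mentioned in the quoted
text, and takes the 67108864-block limit from the warning. The type and
function names (u_daddr32_t, daddr64_t, swap_limit_bytes, all_ones_mask32,
truncate_block32) are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for this sketch; only the widths come from the mail. */
typedef uint32_t u_daddr32_t;	/* same width as the quoted u_daddr_t */
typedef int64_t  daddr64_t;	/* same width as the current daddr_t */

/* Bytes addressable when max_blocks units of unit_bytes each are allowed. */
static uint64_t
swap_limit_bytes(uint64_t max_blocks, uint64_t unit_bytes)
{
	assert(unit_bytes != 0);
	return (max_blocks * unit_bytes);
}

/* The "typical mask": all ones in the 32-bit unsigned disk address type. */
static uint64_t
all_ones_mask32(void)
{
	return ((uint64_t)(u_daddr32_t)-1);
}

/* What is left of a 64-bit block number after applying that 32-bit mask. */
static uint64_t
truncate_block32(daddr64_t blk)
{
	return ((uint64_t)blk & all_ones_mask32());
}
```

With the limit from the warning, swap_limit_bytes(67108864, 512) is 32 GiB
and swap_limit_bytes(67108864, 4096) is 256 GiB, which matches the 32*8=256G
estimate quoted above. truncate_block32() shows that any block number above
0xffffffff loses its upper bits if only a 32-bit mask is applied.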