GPT vs MBR for swap devices

Warner Losh imp at bsdimp.com
Tue Jun 19 04:05:31 UTC 2018


On Mon, Jun 18, 2018, 9:44 PM bob prohaska <fbsd at www.zefox.net> wrote:

> On Mon, Jun 18, 2018 at 06:31:40PM -0700, Mark Millard wrote:
> > On 2018-Jun-18, at 5:55 PM, bob prohaska <fbsd at www.zefox.net> wrote:
> >
> > > On Mon, Jun 18, 2018 at 04:42:21PM -0700, Mark Millard wrote:
> > >>
> > >>
> > >> On 2018-Jun-18, at 4:04 PM, bob prohaska <fbsd at www.zefox.net> wrote:
> > >>
> > >>> On Sat, Jun 16, 2018 at 04:03:06PM -0700, Mark Millard wrote:
> > >>>>
> > >>>> Since the "multiple swap partitions across multiple
> > >>>> devices" context (my description) is what has problems,
> > >>>> it would be interesting to see swapinfo information
> > >>>> from around the time frame of the failures: how much is
> > >>>> used vs. available on each swap partition? Is only one
> > >>>> being (significantly) used? The small one (1 GiByte)?
> > >>>>
> > >>> There are some preliminary observations at
> > >>>
> > >>>
> > >>> http://www.zefox.net/~fbsd/rpi3/swaptests/newtests/1gbusbflash_1gbsdflash_swapinfo/1gbusbflash_1gbsdflash_swapinfo.log
> > >>>
> > >>> If you search for 09:44: (the time of the OOM kills) it looks like
> > >>> both swap partitions are equally used, but only 8% full.
> > >>>
> > >>> At this point I'm wondering if the gstat interval (presently 10 seconds)
> > >>> might well be shortened and the ten second sleep eliminated. On the runs
> > >>> that succeed swap usage changes little in twenty seconds, but the failures
> > >>> seem to culminate rather briskly.
> > >>
> > >> One thing I find interesting somewhat before the OOM activity is
> > >> the 12355 ms/w and 12318 ms/w on da0 and da0d that go along
> > >> with having 46 or 33 L(q) and large %busy figures in the same
> > >> lines --and 0 w/s on every line:
> > >>
> > >> Mon Jun 18 09:42:05 PDT 2018
> > >> Device          1K-blocks     Used    Avail Capacity
> > >> /dev/da0b         1048576     3412  1045164     0%
> > >> /dev/mmcsd0s3b    1048576     3508  1045068     0%
> > >> Total             2097152     6920  2090232     0%
> > >> dT: 10.043s  w: 10.000s
> > >> L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w    d/s   kBps   ms/d   %busy Name
> > >>    0      0      0      0    0.0      0      9   10.8      0      0    0.0    0.1  mmcsd0
> > >>   46      0      0      0    0.0      0     16  12355      0      0    0.0   85.9  da0
> > >>    0      0      0      0    0.0      0      9   10.8      0      0    0.0    0.1  mmcsd0s3
> > >>    0      0      0      0    0.0      0      9   10.8      0      0    0.0    0.1  mmcsd0s3a
> > >>   33      0      0      0    0.0      0     22  12318      0      0    0.0  114.1  da0d
> > >> Mon Jun 18 09:42:25 PDT 2018
> > >> Device          1K-blocks     Used    Avail Capacity
> > >> /dev/da0b         1048576     3412  1045164     0%
> > >> /dev/mmcsd0s3b    1048576     3508  1045068     0%
> > >> Total             2097152     6920  2090232     0%
> > >>
> > >>
> > >> The kBps figures for the writes are not very big above.
> > >>
> > >
> > > If it takes 12 seconds to write, I can understand the swapper getting impatient....
> > > However, the delay is on /usr, not swap.
> > >
> > > In the subsequent 1 GB USB flash-alone test case at
> > >
> > > http://www.zefox.net/~fbsd/rpi3/swaptests/newtests/1gbusbflash_swapinfo/1gbusbflash_swapinfo.log
> > > the worst-case seems to be at time 13:45:00
> > >
> > > dT: 13.298s  w: 10.000s
> > > L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w    d/s   kBps   ms/d   %busy Name
> > >    0      0      0      0    0.0      0      5    5.5      0      0    0.0    0.1  mmcsd0
> > >    9     84      0      0    0.0     84   1237   59.6      0      0    0.0   94.1  da0
> > >    0      0      0      0    0.0      0      5    5.5      0      0    0.0    0.1  mmcsd0s3
> > >    0      0      0      0    0.0      0      5    5.6      0      0    0.0    0.1  mmcsd0s3a
> > >    5     80      0      0    0.0     80   1235   47.2      0      0    0.0   94.1  da0b
> > >    4      0      0      0    0.0      0      1   88.1      0      0    0.0    0.7  da0d
> > > Mon Jun 18 13:45:00 PDT 2018
> > > Device          1K-blocks     Used    Avail Capacity
> > > /dev/da0b         1048576    22872  1025704     2%
> > >
> > > 1.2 MB/s writing to swap seems not too shabby, hardly reason to kill a process.
> >
> > That is kBps instead of ms/w.
> >
> > I see a ms/w (and ms/r) that is fairly large (but notably
> > smaller than the ms/w of over 12000):
> >
> > Mon Jun 18 13:12:58 PDT 2018
> > Device          1K-blocks     Used    Avail Capacity
> > /dev/da0b         1048576        0  1048576     0%
> > dT: 10.400s  w: 10.000s
> >  L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w    d/s   kBps   ms/d   %busy Name
> >     0      4      0      0    0.0      4     66    3.4      0      0    0.0    1.3  mmcsd0
> >     8     18      1     32   1991     17    938   2529      0      0    0.0   88.1  da0
> >     0      4      0      0    0.0      4     63    3.5      0      0    0.0    1.3  mmcsd0s3
> >     0      4      0      0    0.0      4     63    3.5      0      0    0.0    1.3  mmcsd0s3a
> >     6     11      1     32   1991     10    938   3207      0      0    0.0   94.7  da0d
> > Mon Jun 18 13:13:19 PDT 2018
> > Device          1K-blocks     Used    Avail Capacity
> > /dev/da0b         1048576        0  1048576     0%
> >
> >
Yes, but again, it's on /usr, not swap.


Doesn't really matter to the swap pager. If the average latency is 12s
over a 10s sampling window, the device is slower than the system can
tolerate, since there is a 30s timeout.
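
As an aside, two quick checks when latency gets this bad are the kernel
log and the page daemon's patience before it starts killing. A rough
sketch (version-dependent; the 120 below is only an example value, not a
tuned recommendation):

    # Stalled swap I/O normally shows up in the kernel log as
    # "swap_pager: indefinite wait buffer" warnings; worth grepping for:
    dmesg | grep -i 'indefinite wait buffer'

    # vm.pageout_oom_seq is how many failed pageout passes are allowed
    # before OOM kills start; raising it buys patience, it does not make
    # the media any faster.
    sysctl vm.pageout_oom_seq
    sysctl vm.pageout_oom_seq=120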

> One could argue that there are other write delays, not seen here, that do
> affect swap. To forestall that objection I'll get rid of the ten second
> sleep in the script when the present test run finishes.
>
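
For reference, the sampling loop being described is presumably something
close to this (a rough reconstruction, not the actual script):

    while :; do
        date
        swapinfo
        gstat -bI 10s    # one 10-second batch sample, then exit
        sleep 10         # the pause to be dropped
    done

Dropping the sleep halves the gap between samples, which should help catch
how quickly a failure develops.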

Latencies north of 300ms are problematic. 12000ms would be unusable or
worse.
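
A quick way to spot such samples in the captured logs is to filter on the
ms/w column of the gstat rows, roughly like this (the 300 ms threshold and
the file name are only examples):

    # Print provider name and ms/w (fields 13 and 8 of a 13-field gstat
    # data row) whenever the average write latency in a sample tops 300 ms.
    awk 'NF == 13 && $8+0 > 300 { print $13, "ms/w =", $8 }' \
        1gbusbflash_1gbsdflash_swapinfo.log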

> > Going in a different direction, I believe that you have
> > reported needing more than 1 GiByte of swap space so the
> > 1048576 "1K-blocks" would not be expected to be sufficient.
> > So the specific failing point may well be odd but the build
> > would not be expected to finish without an OOM for this
> > context if I understand right.
> >
> Yes, the actual swap requirement seems to be slightly over 1.4 GB
> at the peak based on other tests. I fully expected a failure, but
> at a much higher swap utilization.
>
>
> > > Thus far I'm baffled. Any suggestions?
> >
> > Can you get a failure without involving da0, the drive that is
> > sometimes showing these huge ms/w (and ms/r) figures? (This question
> > presumes having sufficient swap space, so, say, 1.5 GiByte or more
> > total.)
> >
> If you mean not using da0, no; it holds /usr. If you mean not swapping
> to da0, yes it's been done. Having 3 GB swap on microSD works.
> Which suggests an experiment: use 1 GB SD swap and 1.3 GB mechanical
> USB swap. That's easy to try.
>

I'll still have to graph the numbers, but these huge latencies will make it
non-viable...
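
Something along these lines should pull one provider's write latency out of
the log into a two-column file for plotting (sketch only; the provider and
file names are examples):

    # Remember the time from each "Mon Jun 18 HH:MM:SS PDT 2018" stamp,
    # then emit "time ms/w" for the chosen provider.
    awk '/PDT 2018/ { t = $4 }
         NF == 13 && $13 == "da0b" { print t, $8 }' \
        1gbusbflash_swapinfo.log > da0b-latency.dat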



> > Having the partition(s) each be sufficiently sized but for which
> > the total would not produce the notice for too large of a swap
> > space was my original "additional" suggestion. I still want to
> > see what such does as a variation of a failing context.
>
> I'm afraid you've lost me here. With two partitions, one USB and
> the other SD of one GB each, OOM kills happen at 8% utilization,
> spread evenly across both. Does the size of the partition affect
> its speed? Capacity does not seem to be the problem.
>

OOM happens when we can't get memory fast enough. Having lots of swap
space is useless if it is too slow.
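
A back-of-the-envelope illustration, treating the 09:42 sample quoted above
as representative: da0 completed roughly 16 kB/s of writes while averaging
over 12 seconds per write, i.e. about four 4 KiB pages per second actually
leaving RAM. At that rate, paging out even 100 MB of dirty pages would take
on the order of 6000 seconds, far slower than the build dirties memory, so
the page daemon runs out of patience long before the swap space runs out.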

> > it would seem to be a good idea to avoid da0 and its sometimes
> > large ms/w and ms/r figures.
> >
>
> I think the next experiment will be to use 1 GB of SD swap and
> 1.3 GB of mechanical USB swap. We know the SD swap is fast enough,
> and we know the USB mechanical swap is fast enough. If that
> combination works, maybe the trouble is congestion on da0. If the combo
> fails as before I'll be tempted to think it's USB or the swapper.
>

My money is still on super crappy NAND creating such a terrible bottleneck
that we trigger OOM.

Warner

> Thanks for reading!
>
>
> bob prohaska

