RPI3 swap experiments, was Re: GPT vs MBR for swap devices

Wed Jun 27 05:40:35 UTC 2018

On Tue, Jun 26, 2018 at 07:09:09PM -0700, Mark Millard wrote:
> 
> 
> On 2018-Jun-26, at 3:28 PM, bob prohaska <fbsd at www.zefox.net> wrote:
> 
> > On Tue, Jun 26, 2018 at 01:15:54PM -0700, Mark Millard wrote:
> >> On 2018-Jun-26, at 8:18 AM, bob prohaska <fbsd at www.zefox.net> wrote:
> >> 
> >>> On Tue, Jun 26, 2018 at 07:37:59AM -0700, Mark Millard wrote:
> >>>> 
> >>>> 
> >>>> . . .
> >>>> 
> >>>> As I remember, Bob P. did reproduce drive errors even without
> >>>> the problem drive being used for swapping. This too suggests
> >>>> (A) as separate activity.
> >>>> 
> >>> Indeed, it is a requirement. If the suspect device is used for swapping,
> >>> OOMA kills prevent the test from progressing to the point of failure.
> >>> 
> >> 
> >> Looking back at http://www.zefox.net/~fbsd/rpi3/swaptests/
> >> and information about /dev/da0 drive errors, it does not
> >> appear that a combination with:
> >> 
> >> A) sufficient swap (> 1.5 GiByte total?) but no use of swap on
> >>   any partition on /dev/da0
> >> and:
> >> B) use of /dev/da0 for /usr/ and /var/
> >> and:
> >> C) Records from the console showing errors (or notes
> >>   indicating lack of such errors).
> >> 
> >> exists. So I was remembering incorrectly.
> >> 
> >> I'm not claiming such a combination is the best direction for
> >> the next tests, but absent such tests there is no
> >> compare/contrast to know if /dev/da0 would still get errors
> >> despite the system having sufficient swap present on other
> >> drives. Thus, I would not go so far as "is a requirement" on
> >> the evidence available.
> >> 
> > 
> > I just didn't bother to record successful runs. I'm logging one now.
> > 
> >> We do have evidence for the case of the system having insufficient
> >> swap space: in that context the current status seems to be "is
> >> sufficient but might not be necessary" for /dev/da0 to get
> >> drive errors.
> >> 
> > Not sure I understand here. Basically there seem to be three cases:
> > Enough swap not on da0, -j4 buildworld completes.
> > Any swap on da0, -j4 buildworld is killed by OOMA
> > Not enough swap not on da0, -j4 buildworld crashes the machine eventually.
                    ^^^^^^^^^^
OK, here's my error. The third case should have been
"not enough swap on mmcsd0". 

> > 
> > Are there other combinations I've overlooked? The first two don't seem 
> > worth repeating, at least not often.
> 
> "buildworld completes with /dev/da0 errors" vs. "buildworld completes
> without /dev/da0 errors" (for: enough swap not on /dev/da0 with no
> swap on /dev/da0 ).
> 
> That is a little simplistic, as there can be multiple retries
> before FreeBSD gives up. Normally no retries are needed. Going
> from rare single retries, to frequent multiple retries without
> giving up, to sometimes giving up: all of that is abnormal, as I
> understand it. But there are degrees of abnormal.
> 
> And, yes, I have had past examples of significant drive reports
> during buildworld that let buildworld appear to complete. (Not
> that I trusted the result or the drive involved after that, at
> least not as the drive was powered/connected at the time.)
> 
> For "any swap on da0" and "not enough swap not on da0" (with
> no swap on da0) I'd add to your descriptions: "with /dev/da0
> errors" (again simplistic).

The only case where I've seen crashes and /dev/da0 errors is with
insufficient swap on mmcsd0.  I've come to ignore OOMA kills as 
too familiar to be interesting. 
> 
> This goes along with my suggestion to split the /dev/da0
> error investigation from the investigations of OOMA behavior
> and crashing-the-machine: avoiding any confounding.
> 
From what I've seen, OOMA isn't associated with da0 errors and crashes.
To see the latter, OOMA must be avoided.

Thanks for reading,

bob prohaska