RPI3 swap experiments, was Re: GPT vs MBR for swap devices

Wed Jun 27 06:30:58 UTC 2018

On 2018-Jun-26, at 10:40 PM, bob prohaska <fbsd at www.zefox.net> wrote:

> On Tue, Jun 26, 2018 at 07:09:09PM -0700, Mark Millard wrote:
>> 
>> 
>> On 2018-Jun-26, at 3:28 PM, bob prohaska <fbsd at www.zefox.net> wrote:
>> 
>>> On Tue, Jun 26, 2018 at 01:15:54PM -0700, Mark Millard wrote:
>>>> On 2018-Jun-26, at 8:18 AM, bob prohaska <fbsd at www.zefox.net> wrote:
>>>> 
>>>>> On Tue, Jun 26, 2018 at 07:37:59AM -0700, Mark Millard wrote:
>>>>>> 
>>>>>> 
>>>>>> . . .
>>>>>> 
>>>>>> As I remember, Bob P. did reproduce drive errors even without
>>>>>> the problem drive being used for swapping. This too suggests
>>>>>> (A) as separate activity.
>>>>>> 
>>>>> Indeed, it is a requirement. If the suspect device is used for swapping
>>>>> OOMA kills prevent the test from progressing to the point of failure.
>>>>> 
>>>> 
>>>> Looking back at http://www.zefox.net/~fbsd/rpi3/swaptests/
>>>> and information about /dev/da0 drive errors it does not
>>>> appear that a combination with:
>>>> 
>>>> A) sufficient swap (> 1.5 GiByte total?) but no use of swap on
>>>>  any partition on /dev/da0
>>>> and:
>>>> B) use of /dev/da0 for /usr/ and /var/
>>>> and:
>>>> C) Records from the console showing errors (or notes
>>>>  indicating lack of such errors).
>>>> 
>>>> exists. So I was remembering incorrectly.
>>>> 
>>>> I'm not claiming such a combination is the best direction for
>>>> the next tests, but absent such tests there is no
>>>> compare/contrast to know if /dev/da0 would still get errors
>>>> despite the system having sufficient swap present on other
>>>> drives. Thus, I would not go so far as "is a requirement" on
>>>> the evidence available.
>>>> 
>>> 
>>> I just didn't bother to record successful runs. I'm logging one now.
>>> 
>>>> We do have evidence for the system having insufficient swap
>>>> space: this context seems to have the current status "is
>>>> sufficient but might not be necessary" for /dev/da0
>>>> getting drive errors.
>>>> 
>>> Not sure I understand here. Basically there seem to be three cases:
>>> Enough swap not on da0, -j4 buildworld completes.
>>> Any swap on da0, -j4 buildworld is killed by OOMA
>>> Not enough swap not on da0, -j4 buildworld crashes the machine eventually.
>                    ^^^^^^^^^^
> OK, here's my error. The third case should have been
> "not enough swap on mmcsd0". 
> 
> 
>>> 
>>> Are there other combinations I've overlooked? The first two don't seem 
>>> worth repeating, at least not often.
>> 
>> "buildworld completes with /dev/da0 errors" vs. "buildworld completes
>> without /dev/da0 errors" (for: enough swap not on /dev/da0 with no
>> swap on /dev/da0 ).
>> 
>> That is a little simplistic, as there can be multiple retries
>> before FreeBSD gives up. Normal is no retries needed. Going
>> from rare single retries to frequent multiple retries but no
>> giving-up to it giving up sometimes is all abnormal as I
>> understand. But there are degrees of abnormal.
>> 
>> And, yes, I have had past examples of significant drive reports
>> during buildworld that let buildworld appear to complete. (Not
>> that I trusted the result or the drive involved after such, at
>> least as the drive was powered/connected at the time.)
>> 
>> For "any swap on da0" and "not enough swap not on da0" (with
>> no swap on da0) I'd add to your descriptions: "with /dev/da0
>> errors" (again simplistic).
> 
> The only case where I've seen crashes and /dev/da0 errors is with
> insufficient swap on mmcsd0.  I've come to ignore OOMA kills as 
> too familiar to be interesting. 

"crashes and /dev/da0 errors":

A) Any examples of /dev/da0 errors without crashes?
B) Any examples of crashes without /dev/da0 errors?
C) All examples that do either also do the other
   (so both always go together)?

(I'm having trouble parsing a specific meaning for
the reference. I did not go back through all the logs
again to identify the combinations recorded.)

For (A), have you tried any examples of:

sufficient swap on mmcsd0 (or other such) with no swap
on da0 (but /usr and /var on /dev/da0)?

If yes, did you check whether there were /dev/da0 errors
logged? What, if any, /dev/da0 errors were logged?
None?

For (B), have you tried any examples of:

insufficient swap on (say) mmcsd0, no use at all of the
/dev/da0 drive that has reported errors, and
/usr/ and /var/ not on mmcsd0 (or whatever was used
for swap) either? Did some drive end up reporting
errors? Which? Did the system still crash as well?

Have such test-context combinations been tried?

Without any logs to look at for such alternatives, I
cannot compare/contrast them with the other
examples.
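
One way to keep track of which combinations have logs is a
simple table of test contexts. The sketch below is only an
illustration: the factor names (swap_sufficient, swap_on_da0,
usr_var_on_da0) and the outcome fields are hypothetical labels,
not anything taken from the existing logs, and every outcome
starts out as unknown until a run is actually recorded.

```python
from itertools import product

# Hypothetical factor names, chosen only for illustration.
FACTORS = {
    "swap_sufficient": (True, False),  # enough total swap for -j4 buildworld?
    "swap_on_da0": (True, False),      # any swap partition on /dev/da0?
    "usr_var_on_da0": (True, False),   # /usr and /var on /dev/da0?
}


def test_matrix():
    """Return every combination of the factors as a list of dicts."""
    names = list(FACTORS)
    return [dict(zip(names, values)) for values in product(*FACTORS.values())]


def blank_record(combo):
    """Attach outcome fields; None means 'no log recorded yet'."""
    return {**combo, "da0_errors": None, "crashed": None, "ooma_kills": None}


records = [blank_record(c) for c in test_matrix()]

if __name__ == "__main__":
    for r in records:
        print(r)
```

Filling in da0_errors, crashed, and ooma_kills per run would
show directly which of (A), (B), and (C) have evidence and
which contexts are still untested.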

>> 
>> This goes along with my suggestion to split the /dev/da0
>> error investigation from the investigations of OOMA behavior
>> and crashing-the-machine: avoiding any confounding.
>> 
> From what I've seen, OOMA isn't associated with da0 errors and crashes.
> To see the latter, OOMA must be avoided.

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)