RPI3 swap experiments, was Re: GPT vs MBR for swap devices

Tue Jun 26 06:25:38 UTC 2018

On Mon, Jun 25, 2018 at 11:24 PM, bob prohaska <fbsd at www.zefox.net> wrote:

> On Sun, Jun 24, 2018 at 09:22:38PM -0700, Mark Millard wrote:
> > On 2018-Jun-24, at 4:10 PM, bob prohaska <fbsd at www.zefox.net> wrote:
> >
> >
> > > I've tried to replicate the RPi3 "run out of swap" experiment after
> > > updating source, kernel and world to r335576. Roughly the same things
> > > happen: errors flood the console, and when swap usage goes a bit over
> > > 80% the machine becomes unresponsive. No sign of the OOM assassin.
> > >
> > > However, -j4 buildworld got all the way to building libraries. With
> > > r334939 it always stopped in cross tools. That seems like a significant
> > > improvement in swap usage efficiency. Is this to be expected?
> > >
> >
> > From the log file:
> >
> > http://www.zefox.net/~fbsd/rpi3/swaptests/r335576/1gbsdflash/buildworld.log
> >
> > is the text:
> >
> > --- buildworld ---
> > make[1]: "/usr/src/Makefile.inc1" line 299: SYSTEM_COMPILER: Determined that CC=cc matches the source tree.  Not bootstrapping a cross-compiler.
> > make[1]: "/usr/src/Makefile.inc1" line 304: SYSTEM_LINKER: Determined that LD=ld matches the source tree.  Not bootstrapping a cross-linker.
> >
> > So the cross compiler and cross linker were not built: the existing
> > llvm files were used.
> >
> Ahh, so it wasn't a massive performance increase... too bad!
> >
> > > What details were captured can be seen at
> > > http://www.zefox.net/~fbsd/rpi3/swaptests/r335576/1gbsdflash/
> > > in case they're of interest.
> >
> >
> > You are still using the drive that gets the errors ( /dev/da0 ),
> > even if it is not being used for swapping.
> >
> > http://www.zefox.net/~fbsd/rpi3/swaptests/r335576/1gbsdflash/console
> >
> > shows:
> >
> > g_vfs_done():da0d[WRITE(offset=51819347968, length=131072)]error = 5
> > g_vfs_done():da0d[WRITE(offset=51819479040, length=28672)]error = 5
> > g_vfs_done():da0d[READ(offset=59586936832, length=32768)]error = 5
> > g_vfs_done():vm_fault: pager read error, pid 823 (tcsh)
>

The device is broken if you get this. Period. I don't know if it is
hardware, or software, but it is not a reliable storage device. Until
that's fixed, you'll continue to have a terrible experience with it.
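A minimal sketch for tallying errors like the ones quoted above. It assumes the console log uses exactly the g_vfs_done() line format shown there. Here, error = 5 is the errno value EIO (input/output error). The sketch groups errors by whole disk (da0) rather than by partition (da0d). The regex and the helper names parse_g_vfs_errors() and disk_of() are hypothetical and written for illustration only. They are not part of any FreeBSD tool.

```python
import errno
import re
from collections import Counter

# Hypothetical pattern for GEOM console lines such as:
#   g_vfs_done():da0d[WRITE(offset=51819347968, length=131072)]error = 5
G_VFS_RE = re.compile(
    r"g_vfs_done\(\):(?P<provider>\w+)"
    r"\[(?P<op>READ|WRITE)\(offset=(?P<offset>\d+), length=(?P<length>\d+)\)\]"
    r"error = (?P<err>\d+)"
)


def parse_g_vfs_errors(lines):
    """Yield (provider, op, offset, length, errno_name) for each matching line."""
    for line in lines:
        m = G_VFS_RE.search(line)
        if m is None:
            continue  # skip non-matching lines, e.g. interleaved vm_fault messages
        err = int(m["err"])
        yield (
            m["provider"],
            m["op"],
            int(m["offset"]),
            int(m["length"]),
            errno.errorcode.get(err, str(err)),  # 5 -> "EIO"
        )


def disk_of(provider):
    """Strip the partition/slice suffix: 'da0d' -> 'da0', 'da0s1a' -> 'da0'."""
    m = re.match(r"[a-z]+\d+", provider)
    return m.group(0) if m else provider


# Usage with sample lines in the format of the console excerpt:
console = """\
g_vfs_done():da0d[WRITE(offset=51819347968, length=131072)]error = 5
g_vfs_done():da0d[WRITE(offset=51819479040, length=28672)]error = 5
g_vfs_done():da0d[READ(offset=59586936832, length=32768)]error = 5
g_vfs_done():vm_fault: pager read error, pid 823 (tcsh)
""".splitlines()

errors = list(parse_g_vfs_errors(console))
per_disk = Counter((disk_of(p), op, name) for p, op, _, _, name in errors)
for (disk, op, name), n in sorted(per_disk.items()):
    print(f"{disk} {op} {name}: {n}")
# prints:
#   da0 READ EIO: 1
#   da0 WRITE EIO: 2
```

The sketch skips lines that do not match the pattern, such as the interleaved vm_fault message, rather than counting them as errors.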

> Yes, I'm still using that same device. The errors attributed to /dev/da0
> were reported nearly two hours after the system first reported distress.
> That makes it hard to believe the errors caused the problem.
>

da0 is broken is what these errors mean. Broken. Not a little under the
weather, or pining for the fjords, but an ex-parrot. Errr, a broken thumb
drive, a broken driver, or a drive that's missing a quirk. Trying to assign
which partition is broken misses the bigger picture: you shouldn't see
error rates like this. That means something is wrong. I presume the drive
isn't defective (though that should be ruled out by swapping in a similar
thumb drive). That leaves three possibilities. The first is a missing quirk:
the umass driver is doing something that makes the drive go catatonic, and
we may have quirks for that, since you can't probe it. The second is some
kind of bug in umass. The third is the USB bridge on the RPi going out to lunch.

Sorry to sound so harsh, but the data has been consistent on this for
everything you've reported: it works for a while, then we get a bunch of
errors, then a reboot. We need to start narrowing down which of these three
broad classes of root causes it is. I'd rank an actual bad thumb drive last
on the list. It's a tossup for me between a missing quirk and a bug in the
RPi USB driver that manifests itself only under heavy load. IIRC, you said
one of the RPi2/RPi3 works and the other doesn't, which would suggest a USB
bridge driver problem...

Warner