Re: Python2.7 seemingly stuck on RPI3

From: Mark Millard via freebsd-arm <freebsd-arm_at_freebsd.org>
Date: Sun, 12 Sep 2021 16:10:02 UTC
[This is a resend now that I'm again subscribed to the
freebsd-arm list.]

From: bob prohaska <fbsd_at_www.zefox.net> 
Date: Sat, 11 Sep 2021 12:34:40 -0700 :

> A present attempt to compile www/chromium on a Pi3 using
> a single make job has gotten stuck in a curious way: 
> 
> Python2.7 appears to be stuck, or nearly stuck, reading
> swap. The machine isn't out of swap, in fact swap isn't
> even very busy, around 85% for the hard disk partition 
> and 15% for the microSD partition. Queue lengths are 
> small and kBps close to 1000 for both. 
> 
> 
> Here's a sample of disk activity:
> 
> procs     memory       page                      disks     faults       cpu
> r b w     avm     fre  flt  re  pi  po    fr   sr mm0 da0   in   sy   cs us sy id
> 0  0 13 2560404   57968  2546 103  83  35  2505 6020   0   0 20952   862  6525 19  4 78
> dT: 10.009s  w: 10.000s
> L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w    d/s   kBps   ms/d   %busy Name
>    0    147    147    944    0.9      0      0    0.0      0      0    0.0   13.6  mmcsd0
>    0    147    147    944    0.9      0      0    0.0      0      0    0.0   13.8  mmcsd0s2
>    1    144    141    878    6.6      2     76   31.8      0      0    0.0   85.9  da0
>    0    147    147    944    0.9      0      0    0.0      0      0    0.0   13.8  mmcsd0s2b
>    1    143    141    878    6.6      2     76   31.8      0      0    0.0   86.0  da0s2
>    0      2      0      0    0.0      2     76   31.8      0      0    0.0    0.6  da0s2a
>    1    141    141    878    6.6      0      0    0.0      0      0    0.0   86.0  da0s2b

From what is presented, I'd guess that da0s2b is on spinning rust.
My expectation is that da0 is spending much of its time
seeking and that the system does not have much to do during the
seeks. So the latencies add up to a significant amount of
time for any given amount of progress.

Be careful when interpreting %busy, as I understand it. But figures
of around 86.0 suggest not expecting to get much more out of da0. This
all fits with the mmcsd0s2b activity, for about the same load, looking
like it could go faster. Note the 0.9 ms/r for mmcsd0 vs. the
6.6 ms/r for da0, as well as the < 15 figures for %busy for mmcsd0.
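
A rough arithmetic check of the quoted gstat figures, as a minimal
Python sketch. The numbers are copied from the sample above; the
assumption that reads on one device do not overlap is mine (it seems
reasonable given the queue lengths of 0 and 1), and the result is only
an approximation of how gstat itself derives %busy.

```python
# Read-side figures copied from the quoted gstat sample.
samples = {
    "mmcsd0s2b": {"reads_per_s": 147, "kBps": 944, "ms_per_read": 0.9},
    "da0s2b": {"reads_per_s": 141, "kBps": 878, "ms_per_read": 6.6},
}


def summarize(s):
    """Return (kB per read, fraction of each second spent in reads)."""
    kb_per_read = s["kBps"] / s["reads_per_s"]
    # Assumption: reads on a device are serviced one at a time.
    service_fraction = s["reads_per_s"] * s["ms_per_read"] / 1000.0
    return kb_per_read, service_fraction


for name, s in samples.items():
    kb, frac = summarize(s)
    print(f"{name}: {kb:.1f} kB/read, ~{frac:.0%} of each second in reads")
```

Both partitions are doing small reads of roughly 6 kB each at nearly
the same rate, but da0s2b needs most of each second to service them
while mmcsd0s2b needs only a small fraction.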

From what I can tell, splitting the swapping/paging load across
wildly mismatched media bottlenecks on the slower media for the
interlaced accesses. Such a mismatch undoes any advantage from
having dual channels.
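
The effect described in the paragraph above can be illustrated with a
toy model. This is only a sketch under stated assumptions, not a
description of how the FreeBSD pager actually schedules I/O: it assumes
a single process that waits for each page-in before issuing the next
(python2.7 is shown in swread with one thread), strict alternation
between the swap devices, and per-read latencies fixed at the quoted
ms/r values. The function name is hypothetical.

```python
def reads_per_second(latencies_ms):
    """Reads/s for one process faulting serially, round-robin over devices.

    Assumption: exactly one read is outstanding at a time, so one pass
    over all devices takes the sum of their latencies.
    """
    cycle_ms = sum(latencies_ms)
    return len(latencies_ms) * 1000.0 / cycle_ms


both = reads_per_second([0.9, 6.6])   # mmcsd0s2b and da0s2b interleaved
fast_only = reads_per_second([0.9])   # swap only on the microSD card
slow_only = reads_per_second([6.6])   # swap only on da0

print(f"interleaved: {both:.0f} reads/s")
print(f"fast only:   {fast_only:.0f} reads/s")
print(f"slow only:   {slow_only:.0f} reads/s")
```

With the quoted latencies the interleaved case comes out near 267
reads/s, in the same range as the roughly 288 reads/s (147 + 141) in
the sample, while the faster device alone would allow on the order of
1100 reads/s under the same assumptions. Real latencies would shift
under a different load, so the numbers are indicative only.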

Seek time is one of the reasons that I avoid spinning rust for
machines that I do builds on (and more generally than that, but
not universally). Fragmentation is less of an issue on the kinds
of media that I use.

For small arm boards I tend to use media that supports both
USB2 and USB3 use: for example, staying within power limits
in each context but able to be fairly fast for USB3. I have
access to Samsung Portable SSD T7 Touch 1TB's for such use
in modern times. (I've not switched all the contexts over
yet. The older type of media that I'd had access to is not always
purchasable these days, and is slower for making duplications
as backups or as the start of small variations.)

I've also been using T7's as alternate external boot media
for bigger machines, such as being able to boot and operate any
of: HoneyComb, MACCHIATObin Double Shot, RPI4B (8 GiByte) via
the same external media.

I make no claim that the T7's properties are unique for such
issues. The T7's just happen to be what I've used. Handling
the power issue as I want eliminates many products for me.
(I avoid USB hubs when I can for the small arm boards.)


> Sat Sep 11 12:01:49 PDT 2021
> Device          1K-blocks     Used    Avail Capacity
> /dev/da0s2b       1843200   882800   960400    48%
> /dev/mmcsd0s2b    1843200   880600   962600    48%
> Total             3686400  1763400  1923000    48%
> 
> Here's a sample of top output
> 
> 
> last pid: 41683;  load averages:  0.04,  0.13,  0.15                                                      up 15+17:47:00  12:02:29
> 48 processes:  1 running, 47 sleeping
> CPU:  0.1% user,  0.0% nice,  0.7% system,  0.7% interrupt, 98.5% idle
> Mem: 429M Active, 23M Inact, 182M Laundry, 222M Wired, 87M Buf, 44M Free
> Swap: 3600M Total, 1711M Used, 1889M Free, 47% Inuse, 3784K In
> 
>  PID USERNAME    THR PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
> 24726 root          1  20    0  1566M   627M swread   1   7:36   1.83% python2.7
> 1074 bob           1  20    0    14M  1100K CPU0     0  39:03   0.14% top
>  881 root          1  20    0    12M   268K select   0   6:49   0.02% powerd
> 1069 bob           1  20    0    20M   684K select   0   3:28   0.02% sshd
>  942 root          1  20    0    20M   644K select   3   3:18   0.00% sshd
> 24504 root          1  41    0   370M   356K select   2   1:17   0.00% ninja
>  945 root          1  20    0    17M   972K select   2   1:04   0.00% sendmail
>  827 root          1  20    0    13M   616K select   2   1:03   0.00% syslogd
> 
> Admittedly, 1700M of swap is a lot in use, but in the past the machine
> was able to work through much higher swap usage. Now it seems well and
> truly stuck, though still reasonably responsive to keyboard input.
> It's unclear to me if this is my error or something else.


===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)