Re: measuring swap partition speed

From: Mark Millard <marklmi_at_yahoo.com>
Date: Thu, 21 Dec 2023 18:36:10 UTC
void <void_at_f-m.fm> wrote on
Date: Thu, 21 Dec 2023 15:50:52 UTC :

> On Wed, Dec 20, 2023 at 07:48:14PM -0800, Mark Millard wrote:
> 
> ># swapoff /dev/label/growfs_swap
> ># dd if=/dev/urandom of=/dev/da0s2b bs=8k count=250000 conv=sync status=progress
> >^C478830592 bytes (479 MB, 457 MiB) transferred 22.001s, 22 MB/s
> >60557+0 records in
> >60556+0 records out
> >496074752 bytes transferred in 22.790754 secs (21766491 bytes/sec)
> 
> 22MB/s is usable, I think. In my context, I'd be satisfied with that.
> My context differs from yours slightly in that yours is SSD and mine
> is spinning rust.

I do not have access to spinning rust to test for comparison.
Others likely do.

> This is unusable:
> # dd if=/dev/urandom of=/dev/da0p4 bs=8k count=250000 conv=sync status=progress
> ^C11862016 bytes (12 MB, 11 MiB) transferred 40.063s, 296 kB/s

My point is that the performance seems to be strongly
tied to the media type. There is no general problem with
partition-based swap performance. (But the type of test
was set up to match yours, not to be realistic for paging
activity.)

The paging access pattern likely ends up doing lots
of seek activity, making for lots of accumulated
latency. It is also likely a mix of read and write
activity. Mixes of small reads/writes to fairly random
places tend to perform worse than sequential access.

Paging is not a good match to large sequential writes
as the only activity. Perhaps someone with an fio background
can fully specify how to run a noticeably more realistic
benchmark for making swap performance judgments, perhaps
monitored via gstat during its operation.
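As one hedged starting point (I have no fio background, so every
parameter below is an assumption to be reviewed by someone who does),
a job file along these lines would at least mix small random reads and
writes against the raw partition instead of doing only large sequential
writes. The device name is just the one from the earlier dd example:

```ini
; Sketch of a more paging-like fio job; every value here is an
; assumption to be tuned. WARNING: this writes to the device and
; destroys its contents -- swapoff the partition first.
[swap-sim]
; partition name taken from the dd example above
filename=/dev/da0s2b
; plain pread/pwrite syscalls
ioengine=psync
; random mix of page-sized reads and writes, guessed 60/40 split
rw=randrw
rwmixread=60
bs=4k
direct=1
runtime=60
time_based=1
```

Run as, say, "fio swap-sim.fio" while watching "gstat -spod" in another
terminal; the reported read and write latencies would be more telling
than the dd throughput number.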

> because it's way too slow. Swap never gets fully reclaimed,
> thrashing happens, loads of other followon effects happen. 
> 
> The same partition formatted as ufs reports 113 MB/s. Multiple swap partitions
> have been tested, then converted to ufs. Results are the same.

This test goes through the file system caching and uses
larger-than-8K writes --and only writes, not a realistic mix
of reads and writes or the more random distribution of where
those reads and writes would go. As I understand it, conv=sync
does not prevent the caching effect for sequential activity.

I suggest using:

# gstat -spod

to get an idea of what the actual I/Os are like in each of
whatever relevant contexts are of interest (actual
operation with the swap performance issue and benchmarking).

So far as I can tell only you can provide such information,
as the issue is not readily repeatable by others.

From a broader view, actual-operation examples from "gstat
-spod" output might be of more general interest for your
type of context.

> There are no reported errors in smartctl. Long smartctl tests run monthly.
> 
> 5 Reallocated_Sector_Ct PO--CK 100 100 050 - 0
> 9 Power_On_Hours -O--CK 001 001 000 - 48992
> 196 Reallocated_Event_Count -O--CK 100 100 000 - 0
> 197 Current_Pending_Sector -O--CK 100 100 000 - 0
> 198 Offline_Uncorrectable ----CK 100 100 000 - 0
> 
> I can't find any hardware problem here. Possible workarounds, bearing in mind 
> I'm not versant in C so it's not like I can fix this myself in code:
> 
> 1. swap as swapfile and not partition [a]

(1) is subject to "trivial and unavoidable deadlocks". After
    suffering such, I always avoid this form.

> 2. swap as nfs [b]

I've never used nfs for this but it likely has the same issue
as (1).

> 3. swapoff & swapon script running every minute [c]

If this works for bringing everything back into RAM, it seems
to be an approximation of not having swap in the first place
and would be subject to the drawbacks of (4).

> 4. just turn all swap off and reboot after crashing (undesirable)

(I tend to have active swap partition(s) totaling about
3.8*RAM because of doing a form of high load
average "poudriere bulk" runs.)

[I have multiple SWAP partitions because of using the
same media in various machines that have widely different
amounts of RAM. I form a total active swap space that is
appropriate to the RAM present for the boot. Other than
that I'd use just one partition.]
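As an illustration of that setup (device names hypothetical), the
fstab on such shared media can list all the swap partitions, with a
machine that has less RAM simply commenting some of them out:

```
# hypothetical /etc/fstab fragment: two swap partitions on the shared
# media; a lower-RAM machine would comment out one of these lines
/dev/da0p3	none	swap	sw	0	0
/dev/da0p4	none	swap	sw	0	0
```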

> 5. use another OS that doesn't have this problem

You omit the alternative of using media for the swap/paging
space that avoids the problem. There are such around. Is
there a blocking issue for going the direction of also
having a separate swap media that has helpful characteristics?

I will note that the RPi4B shares its USB3 bandwidth across
its 2 USB3 ports: they are not independent channels. Having
sustained I/O that competes for the bandwidth can be a
bottleneck issue of itself. A similar point can happen at
the media level when the swap space I/O and other I/O are
to the same media. (For spinning rust, that includes more
time spent seeking: additional latency.)

> [a] not tried yet, and i hope it works. Legacy info suggests swap as partition is usually
> faster than filesystem-based swap. But the reverse might be the case here.
> 
> [b] also not tried. This, I imagine, would be filesystem only (I'm unsure a zfs volume can
> be exported to look like a mountable partition to the client)
> 
> [c] https://github.com/Freaky/swapflush.git - usually works but maybe i need to run it every 
> minute instead of every five mins. For testing, this script was disabled.
> 
> Any additional suggestions on how to overcome this problem gratefully received.


===
Mark Millard
marklmi at yahoo.com