CFT: TRIM Consolidation on UFS/FFS filesystems

Mark Millard marklmi at yahoo.com
Wed Sep 5 16:53:37 UTC 2018


[I got some results but also provide notes about
context limitations for what can be tested.]

On 2018-Sep-5, at 12:06 AM, Mark Millard <marklmi at yahoo.com> wrote:

> On 2018-Sep-4, at 10:20 PM, Kirk McKusick <mckusick at mckusick.com> wrote:
> 
>> Thanks for the report. Do you have a sense/measurement of whether the
>> build times are better/same/worse than without the consolidation?
> 
> 
> buildworld buildkernel is far from I/O bound overall (in elapsed time)
> when built via clang: while the llvm materials are being built, this
> context is CPU/RAM bound. (I watched gstat -pd and top -CawSores for
> evidence of this.) Also, swap/paging I/O would not involve
> consolidation. It would be odd for the new consolidation to make
> much of a difference for this activity. (gcc 4.2.1 based builds were
> more I/O bound in my experience.)
> 
> But installworld is far more I/O bound and does take something
> like 11 or 12 minutes in this context. (installkernel is also
> more I/O bound but takes far less time.)
> 
> I have a record of an updating installworld for
> clang-cortexA53-installworld-poud already with consolidation
> turned on . . .
> 
> Start time: 2018-09-04:12:48:14
> End   time: 2018-09-04:12:59:54 (so 11 min 40 sec or so elapsed)
> typescript log size: 8698790
> 
> So after the poudriere/pkg activity I could repeat the
> "-j4 installworld distrib-dirs distribution DB_FROM_SRC=1
> DESTDIR=/usr/obj/DESTDIRs/clang-cortexA53-installworld-poud"
> with the new consolidation turned off. (That would leave
> active the mmc/sd code's consolidation that Ian L.
> referenced.)
> 
> I could also try yet again but with trim disabled for the file
> system, giving a 3rd contrasting case.
> 
> Would these comparisons help?
> 
> 
> As for my poudriere-devel use, I previously used PARALLEL_JOBS=1 but
> did this activity with PARALLEL_JOBS=2 (both with ALLOW_MAKE_JOBS=yes).
> PARALLEL_JOBS=2 makes comparing individual port-build times a problem
> because of competing use of the CPUs over the same time period. There
> also is the issue of how I/O bound each port's build is or is not
> for the context.
> 
> Using "gstat -pd" and "top -CawSores" to watch devel/gcc8 build
> indicates I/O %busy is usually < 2%, even < 1%, and much of the time
> is 0.0% or 0.1%. (This is during a "prev-gcc" bootstrap stage, no
> longer using clang.)  It would be odd for the new consolidation to
> make much of an overall difference here. A more I/O bound port build?
> 
> 
> Note: I have -o noatime in use for my mounts.


The Pine64+ 2GB is limited to at most 50 MHz High Speed
mode because it is limited to 3.3V signalling: an sdcard
would need 1.8V for SDR50, DDR50, or SDR104. This may
make the measurements less interesting.
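To put rough numbers on that limitation, each SD bus-speed mode caps raw throughput at about clock x bus width (x2 for DDR). The sketch below is illustrative only: the nominal clocks and signalling voltages are a summary of the SD Physical Layer specification, assumed here rather than taken from this thread, and the ceiling ignores command/CRC overhead.

```python
# Nominal SD bus-speed modes on a 4-bit bus. Clock and signalling-voltage
# values are an assumed summary of the SD spec, used only for illustration.
MODES = {
    # name: (clock_hz, transfers_per_clock, signal_voltage)
    "Default Speed": (25_000_000, 1, 3.3),
    "High Speed": (50_000_000, 1, 3.3),
    "SDR50": (100_000_000, 1, 1.8),
    "DDR50": (50_000_000, 2, 1.8),
    "SDR104": (208_000_000, 1, 1.8),
}

BUS_WIDTH_BITS = 4


def ceiling_mb_per_s(clock_hz: int, transfers_per_clock: int) -> float:
    """Raw bus ceiling in decimal MB/s, ignoring command/CRC overhead."""
    bits_per_s = clock_hz * transfers_per_clock * BUS_WIDTH_BITS
    return bits_per_s / 8 / 1_000_000


def modes_available_at(signal_voltage: float) -> list[str]:
    """Modes whose signalling voltage matches the one the board supplies."""
    return [name for name, (_, _, v) in MODES.items() if v == signal_voltage]


if __name__ == "__main__":
    for name, (clock, xfers, volts) in MODES.items():
        print(f"{name:14s} {volts} V  ~{ceiling_mb_per_s(clock, xfers):.1f} MB/s")
    # A board limited to 3.3 V signalling tops out at High Speed (~25 MB/s).
    print("3.3 V only:", modes_available_at(3.3))
```

On these assumptions a 3.3V-only board is capped near 25 MB/s, roughly half of what SDR50/DDR50 would allow.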

If FreeBSD gains e.MMC DDR52 support (via an adapter
to sdcard) at 3.3V for the Pine64+ 2GB (operating at
50 MHz, say), I could then test that. (Modern Linux
has such support for the Pine64+ 2GB.)

The close figures below may be specific to the SanDisk
Ultra 128 GB cards with the A1 application performance
class. I do not have other sdcard alternatives around to
test. This may interact with the "only HS mode" issue above.


For:

# sysctl vfs.ffs.dotrimcons=0
vfs.ffs.dotrimcons: 1 -> 0

reinstalling from the same buildworld I got:

Start time: 2018-09-05:08:32:34
End   time: 2018-09-05:08:44:40 (so 12 min 6 sec or so elapsed)
typescript log size: 8682920

instead of the prior 11 min 40 sec or so elapsed (but that
was for an upgrade instead of the same buildworld content).

Retrying with:

# sysctl vfs.ffs.dotrimcons=1
vfs.ffs.dotrimcons: 0 -> 1

reinstalling from the same buildworld again I got:

Start time: 2018-09-05:08:54:10
End   time: 2018-09-05:09:06:10 (so 12 min 0 sec or so elapsed)
typescript log size: 8694405

At this scale of difference, re-running multiple times
under the same setting may be needed to observe the
variability. Let me know if you want that.
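The elapsed times can be recomputed from the Start/End stamps. A small Python sketch (the run labels are illustrative, not from the logs) showing that the like-for-like reinstall pair differs by about 6 seconds, under 1%, a gap small enough that repeated runs per setting would be needed to separate it from run-to-run noise:

```python
from datetime import datetime

# Start/End stamps copied from the three installworld runs reported above.
FMT = "%Y-%m-%d:%H:%M:%S"
RUNS = {
    "upgrade, dotrimcons=1": ("2018-09-04:12:48:14", "2018-09-04:12:59:54"),
    "reinstall, dotrimcons=0": ("2018-09-05:08:32:34", "2018-09-05:08:44:40"),
    "reinstall, dotrimcons=1": ("2018-09-05:08:54:10", "2018-09-05:09:06:10"),
}


def elapsed_seconds(start: str, end: str) -> int:
    """Wall-clock seconds between a 'Start time' and an 'End time' stamp."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return int(delta.total_seconds())


if __name__ == "__main__":
    secs = {label: elapsed_seconds(*span) for label, span in RUNS.items()}
    for label, s in secs.items():
        print(f"{label:24s} {s // 60} min {s % 60} sec")
    off = secs["reinstall, dotrimcons=0"]
    on = secs["reinstall, dotrimcons=1"]
    # Relative change for the pair built from the same buildworld content.
    print(f"difference: {off - on} s ({100 * (off - on) / off:.1f}%)")
```

The upgrade run is listed for completeness only; it installed different content, so it is not directly comparable.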

I do not know if you want trim-disabled figures for this
context or not.



For reference, a fairly modern Linux on the Pine64+ 2GB
. . .

The linux is from:

https://dl.armbian.com/pine64/Ubuntu_bionic_dev_nightly.7z

which is:

nightly mainline kernel master branch 4.17.y or 4.18.y

It shows as:

# uname -ap
Linux pine64 4.18.0-rc4-sunxi64 #220 SMP Sun Jul 15 14:16:31 UTC 2018 aarch64 aarch64 aarch64 GNU/Linux


For the sdcard from the SDSQXBG-128G-GN6MA:
# cat /sys/kernel/debug/mmc0/ios
clock:          50000000 Hz
actual clock:   50000000 Hz
vdd:            21 (3.3 ~ 3.4 V)
bus mode:       2 (push-pull)
chip select:    0 (don't care)
power mode:     2 (on)
bus width:      2 (4 bits)
timing spec:    2 (sd high-speed)
signal voltage: 0 (3.30 V)
driver type:    0 (driver type B)

# hdparm -t /dev/mmcblk0

/dev/mmcblk0:
Timing buffered disk reads:  70 MB in  3.07 seconds =  22.80 MB/sec

(as proof that the performance basically matches HS).

So this basically matches what FreeBSD does.


For reference, the e.MMC on an adapter used in the
Pine64+ 2GB under this Linux:

# cat /sys/kernel/debug/mmc0/ios
clock:          52000000 Hz
actual clock:   50000000 Hz
vdd:            21 (3.3 ~ 3.4 V)
bus mode:       2 (push-pull)
chip select:    0 (don't care)
power mode:     2 (on)
bus width:      2 (4 bits)
timing spec:    8 (mmc DDR52)
signal voltage: 0 (3.30 V)
driver type:    0 (driver type B)

# hdparm -t /dev/mmcblk0

/dev/mmcblk0:
Timing buffered disk reads: 134 MB in  3.04 seconds =  44.03 MB/sec

(as proof that the performance basically matches mmc DDR52 being in use).
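As a rough cross-check on both "basically matching" claims, the hdparm rates can be compared against the raw bus ceiling implied by the ios output: 50 MHz actual clock x 4 bits, doubled for DDR52. A minimal Python sketch; the ceiling formula is an assumption that ignores command and CRC overhead, and since it is not certain whether hdparm's "MB" means 10^6 or 2^20 bytes, both readings are shown:

```python
# hdparm -t figures reported above, plus the nominal ceiling for each timing
# mode shown in /sys/kernel/debug/mmc0/ios (4-bit bus, 50 MHz actual clock).
ACTUAL_CLOCK_HZ = 50_000_000
BUS_WIDTH_BITS = 4

CASES = {
    # name: (transfers_per_clock, observed MB/s from hdparm)
    "sdcard, sd high-speed": (1, 22.80),
    "e.MMC, mmc DDR52": (2, 44.03),
}


def ceiling_bytes_per_s(transfers_per_clock: int) -> float:
    """Raw bus ceiling: clock x transfers per clock x bus width, in bytes/s."""
    return ACTUAL_CLOCK_HZ * transfers_per_clock * BUS_WIDTH_BITS / 8


def fraction_of_ceiling(observed_mb: float, transfers_per_clock: int,
                        mb: int = 1_000_000) -> float:
    """Observed rate as a fraction of the raw ceiling.

    Pass mb=2**20 if hdparm's "MB" is read as 2**20 bytes.
    """
    return observed_mb * mb / ceiling_bytes_per_s(transfers_per_clock)


if __name__ == "__main__":
    for name, (xfers, observed) in CASES.items():
        lo = fraction_of_ceiling(observed, xfers)
        hi = fraction_of_ceiling(observed, xfers, mb=2**20)
        print(f"{name:22s} {observed:6.2f} MB/s = {lo:.0%}-{hi:.0%} of "
              f"{ceiling_bytes_per_s(xfers) / 1e6:.0f} MB/s ceiling")
```

Under either reading both cards land at roughly 88-96% of their mode's ceiling, and the DDR52 rate is close to double the HS rate, consistent with the modes reported in ios.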

FreeBSD's head does not support this. In fact, it fails to boot
instead of falling back to some slower 3.3V mmc mode.

===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)
