Re: port binary dumping core on recent head in poudriere [tmpfs corruptions involving blocks of zeros that should not be all zeros]

From: Mark Millard <marklmi_at_yahoo.com>
Date: Tue, 26 Nov 2024 13:21:31 UTC

On Nov 26, 2024, at 04:58, Dimitry Andric <dim@FreeBSD.org> wrote:

> On 26 Nov 2024, at 13:32, Dimitry Andric <dim@FreeBSD.org> wrote:
>> 
>> On 26 Nov 2024, at 11:19, Dag-Erling Smørgrav <des@FreeBSD.org> wrote:
>>> 
>>> Mark Millard <marklmi@yahoo.com> writes:
>>>> From inside a bulk -i where I did a manual make command
>>>> after it built and installed libsass.so.1.0.0 . The
>>>> manual make produced a /wrkdirs/ :
>>>> [...]
>>>> So the original creation looks okay. But . . .
>>>> [...]
>>>> So: The later, staged copy is a bad copy. Both are in the
>>>> tmpfs. So copying to the staging area makes a corrupted
>>>> copy inside the same tmpfs. After that, further copies of
>>>> staging's bad copy can be expected to be messed up.
>>> 
>>> This and the fact that it happens on 14 and 15 but not on 13 strongly
>>> suggests an issue with `copy_file_range(2)`, since `install(1)` in 14 and
>>> 15 (but not in 13) now uses `copy_file_range(2)` if at all possible.
>>> 
>>> My educated guess is that hole detection doesn't work reliably for files
>>> that have had holes filled while memory-mapped, so `copy_file_range(2)`
>>> thinks there is a hole where there isn't one and skips some of the data
>>> when `install(1)` uses it to copy the library from `${WRKSRC}` to
>>> `${STAGEDIR}`.  This may or may not be specific to tmpfs.
>>> 
>>> You may want to try applying the attached patch to your FreeBSD 14 and
>>> 15 jails.  It prevents `cp(1)` and `install(1)` from trying to use
>>> `copy_file_range(2)`.
>> 
>> Yes, tmpfs is indeed the culprit (or at least involved). I have had USE_TMPFS=localbase in my poudriere.conf for a long time, since otherwise my build machine would run out of memory very quickly,

Using TMPFS_BLACKLIST and TMPFS_BLACKLIST_TMPDIR can make
USE_TMPFS=all workable in many contexts. My list, shown later, tries
to exclude nearly everything that uses more than roughly 7 GiBytes of
tmpfs for the builder.

If nothing else, it can provide a context for testing for the failure
at hand in fairly general builds, including "bulk -a".
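
A minimal poudriere.conf sketch of that combination follows. The
patterns and the directory are placeholders, not a recommendation. My
reading of the comments in poudriere.conf.sample is that ports matching
TMPFS_BLACKLIST get a WRKDIR null-mounted from a directory created under
TMPFS_BLACKLIST_TMPDIR instead of a tmpfs one; check that against the
comments in the installed sample before relying on it.

```sh
# poudriere.conf fragment (sketch; values are placeholders).
# Put everything that poudriere can place in tmpfs into tmpfs ...
USE_TMPFS=all
# ... except for ports matching these space-separated shell globs.
TMPFS_BLACKLIST="rust llvm1[98764321] chromium"
# Host directory under which WRKDIRs for blacklisted ports are created.
TMPFS_BLACKLIST_TMPDIR=${BASEFS}/data/cache/tmp
```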

>> so I didn't encounter any issues.
>> 
>> Now I changed it to USE_TMPFS=yes, rebuilt only textproc/libsass and textproc/sassc, and then after reinstalling those packages:
>> 
>> $ /usr/local/bin/sassc
>> Segmentation fault
> 
> And after applying Dag-Erling's patch to disable copy_file_range for cp and install, it works correctly again.
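
To check a suspect staged file for this kind of corruption, one can
compare it against the ${WRKSRC} original and count the bytes that are
NUL in the copy but something else in the original. This is only a
sketch: count_zeroed_bytes is a hypothetical helper, not part of
poudriere or the ports framework, and it relies only on the POSIX
`cmp -l` output format (decimal offset, then the two differing bytes
in octal).

```sh
#!/bin/sh
# count_zeroed_bytes ORIG COPY
# Hypothetical helper: print how many bytes are NUL in COPY while ORIG
# has some other value at the same offset (the "blocks of zeros that
# should not be all zeros" symptom). If the files differ in length,
# cmp also reports that on stderr; only the common prefix is counted.
count_zeroed_bytes() {
	# cmp -l prints one line per differing byte: the 1-based offset in
	# decimal, then the byte from ORIG and the byte from COPY in octal.
	# The bytes on such a line always differ, so a COPY byte of 0
	# means ORIG had a non-NUL byte there.
	cmp -l "$1" "$2" | awk '$3 == 0 { n++ } END { print n + 0 }'
}
```

For a file just installed from ORIG, any output other than 0 points at
the copy step rather than at the original build.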

For reference (a very long line in its original form, noted in
case something splits the line):

TMPFS_BLACKLIST="*-emacs_devel *-emacs_devel_nox *-emacs_nox *-gcc14 *-rust-bootstrap 0ad RStudio aarch64-none-elf-gcc afni alliance anki apache-openoffice apache-openoffice-devel arm-none-eabi-gcc binutils biostar-tools blender boost-libs chezmoi chromium chrono-physics-simulation-engine clickhouse cmake-core code_saturne deno diaspora digikam dotnet dune-common dune-localfunctions dynare eclipse eksctl electron[1-9][0-9] ess ess-emacs_canna firefox firefox-esr foundationdb fr-aster freebsd-gcc14 gcc-arm-embedded gcc-msp430-ti-toolchain gcc14 gcc1[45]-devel gdb geant4 ghc ghc810 ghc92 ghc94 ghemical giacxcas grafana grafana-loki grafana9 gretl gstreamer1-plugins-rust heyoka hs-cardano-db-sync intel-graphics-compiler-llvm1[4321] iridium-browser julia kde5 kicad kicad-devel kicad-doc kicad-library-packages3d* kosmorro kstars libghemical libint2-psi4 libreoffice librewolf librsvg2-rust libva-intel-media-driver llvm-devel llvm1[98764321] mesa-dri mongodb[4-9][0-9] mpqc nerd-fonts nextcloudclient nextpnr octave octave-forge octave-forge-bim octave-forge-msh octave-forge-sec*d octave-forge-sole onlyoffice-documentserver paraview piglit py39-orange3-single py39-pytorch pydio-cells qemu qemu-devel qemu-nox11 qemu7 qemu7-nox11 qgis qgis-ltr qt*-webengine qt[56]*-webengine qt[56]-tools quantum-espresso-pseudopotentials ringrtc rust rust-nightly signal-desktop simpleitk telegraf tex-dvipdfmx tex-luatex tex-xetex texlive-docs texlive-full thunderbird tor-browser trilinos trivy ttk ungoogled-chromium vault vaultnextcloudclient virtualbox-ose virtualbox-ose-legacy virtualbox-ose-nox11 vscode vuls wasi-compiler-rt-* wasi-libcxx* webkit2-gtk3 wx30-gtk3 yazi ztop"
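
The entries are shell glob patterns, so one entry such as
llvm1[98764321] or electron[1-9][0-9] covers a whole family of ports.
The sketch below shows how such patterns match in plain sh.
is_tmpfs_blacklisted is a hypothetical helper, and matching against the
package name with its version already stripped is an assumption about
what poudriere compares, not something taken from poudriere itself.

```sh
#!/bin/sh
# is_tmpfs_blacklisted NAME
# Hypothetical helper: succeed if NAME (a package name without its
# version, by assumption) matches any glob in $TMPFS_BLACKLIST.
is_tmpfs_blacklisted() {
	name=$1
	# Disable pathname expansion while the list is split into words, so
	# patterns such as "*-gcc14" are not expanded against local files.
	# Note: this sketch turns expansion back on unconditionally.
	set -f
	for pat in $TMPFS_BLACKLIST; do
		# An unquoted variable in a case pattern is matched as a glob.
		case "$name" in
		$pat) set +f; return 0 ;;
		esac
	done
	set +f
	return 1
}
```

With the list above in TMPFS_BLACKLIST, `is_tmpfs_blacklisted llvm19`
succeeds while `is_tmpfs_blacklisted llvm15` fails, because 5 is not in
the bracket expression.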


===
Mark Millard
marklmi at yahoo.com