kern_execve using vm_page_zero_invalid but not vm_page_set_validclean to load /sbin/init ?
Mark Millard
marklmi at yahoo.com
Wed Jun 12 19:13:11 UTC 2019
[Looks to me like the ->valid mask is used only for the
last page of the /sbin/init file, not based on the size
and alignment of the data requested for the PT_LOAD.]
On 2019-Jun-11, at 21:53, Mark Millard <marklmi at yahoo.com> wrote:
> [The garbage after .got up to the page boundary is
> .comment section strings. The context here is
> targeting 32-bit powerpc via system-clang-8 and
> devel/powerpc64-binutils for buildworld and
> buildkernel . ]
>
> On 2019-Jun-11, at 19:55, Mark Millard <marklmi at yahoo.com> wrote:
>
>> [I have confirmed .sbss not being zero'd out and environ
>> thereby starting out non-zero (garbage): a
>> debug.minidump=0 style dump.]
>>
>>> On 2019-Jun-10, at 16:19, Mark Millard <marklmi at yahoo.com> wrote:
>>>
>>> . . . (omitted) . . .
>>
>> I used debug.minidump=0 in /boot/loader.conf for
>> causing a dump for the crash and a libkvm modified
>> enough for my working boot environment to allow me
>> to examine the memory-image bytes of such a dump,
>> with libkvm used via /usr/local/bin/kgdb . (No support
>> of automatically translating user-space addresses
>> or other such.)
>>
>> For the clang based debug buildworld and debug buildkernel
>> context with /sbin/init having:
>>
>> [16] .got PROGBITS 01956ccc 146ccc 000010 04 WAX 0 0 4
>> [17] .sbss NOBITS 01956cdc 146cdc 0000b0 00 WA 0 0 4
>> [18] .bss NOBITS 01956dc0 146cdc 02ee28 00 WA 0 0 64
>>
>> I confirmed that .sbss in /sbin/init's address space
>> is not zeroed (so environ is not assigned by handle_argv ).
>> I also confirmed that _start was given a good env value
>> (in %r5) based on where the value was stored on the
>> stack. It is just that the value was not used.
>>
>> The detailed obvious-failure point (crash) can change based
>> on the garbage in the .sbss and, for the build that I used
>> this time, that happened in __je_arena_malloc_hard instead
>> of before _init_tls called _libc_allocate_tls . (I traced
>> the call chain in the dump.)
>>
>>
>> From what I've seen in the dump there seem to be special
>> uses of some values (that also have normal uses, of
>> course):
>>
>> 0xfa5005af: as yet invalid page content.
>> 0x1c000020: as yet unassigned user-space-stack memory for /sbin/init.
>>
>> These are the same locations that I previously reported as
>> showing up in the DSI read trap reports for /sbin/init failing.
>> The specific build here failed with a different value.
>>
>> For reference relative to libkvm:
>>
>> # svnlite diff /usr/src/lib/libkvm/
>> Index: /usr/src/lib/libkvm/kvm_powerpc.c
>> ===================================================================
>> --- /usr/src/lib/libkvm/kvm_powerpc.c (revision 347549)
>> +++ /usr/src/lib/libkvm/kvm_powerpc.c (working copy)
>> @@ -211,6 +211,53 @@
>> if (be32toh(vm->ph->p_paddr) == 0xffffffff)
>> return ((int)powerpc_va2off(kd, va, ofs));
>>
>> + // HACK in something for what I observe in
>> + // a debug.minidump=0 vmcore.* for 32-bit powerpc
>> + //
>> + if ( be32toh(vm->ph->p_vaddr) == 0xffffffff
>> + && be32toh(vm->ph->p_paddr) == 0
>> + && be16toh(vm->eh->e_phnum) == 1
>> + ) {
>> + // Presumes p_memsz is either unsigned
>> + // 32-bit or is 64-bit, same for va .
>> +
>> + if (be32toh(vm->ph->p_memsz) <= va)
>> + return 0; // Like powerpc_va2off
>> +
>> + // If ofs was (signed) 32-bit there
>> + // would be a problem for sufficiently
>> + // large postive memsz's and va's
>> + // near the end --because of p_offset
>> + // and dmphdrsz causing overflow/wrapping
>> + // for some large va values.
>> + // Presumes 64-bit ofs for such cases.
>> + // Also presumes dmphdrsz+p_offset
>> + // is non-negative so that small
>> + // non-negative va values have no
>> + // problems with ofs going negative.
>> +
>> + *ofs = vm->dmphdrsz
>> + + be32toh(vm->ph->p_offset)
>> + + va;
>> +
>> + // The normal return value overflows/wraps
>> + // for p_memsz == 0x80000000u when va == 0 .
>> + // Avoid this by depending on calling code's
>> + // loop for sufficiently large cases.
>> + // This code presumes p_memsz/2 <= MAX_INT .
>> + // 32-bit powerpc FreeBSD does not allow
>> + // using more than 2 GiBytes of RAM but
>> + // does allow using 2 GiBytes on 64-bit
>> + // hardware.
>> + //
>> + if ( (int)be32toh(vm->ph->p_memsz) < 0
>> + && va < be32toh(vm->ph->p_memsz)/2
>> + )
>> + return be32toh(vm->ph->p_memsz)/2;
>> +
>> + return be32toh(vm->ph->p_memsz) - va;
>> + }
>> +
>> _kvm_err(kd, kd->program, "Raw corefile not supported");
>> return (0);
>> }
>> Index: /usr/src/lib/libkvm/kvm_private.c
>> ===================================================================
>> --- /usr/src/lib/libkvm/kvm_private.c (revision 347549)
>> +++ /usr/src/lib/libkvm/kvm_private.c (working copy)
>> @@ -131,7 +131,9 @@
>> {
>>
>> return (kd->nlehdr.e_ident[EI_CLASS] == class &&
>> - kd->nlehdr.e_type == ET_EXEC &&
>> + ( kd->nlehdr.e_type == ET_EXEC ||
>> + kd->nlehdr.e_type == ET_DYN
>> + ) &&
>> kd->nlehdr.e_machine == machine);
>> }
>>
>>
>>
>
> The following is what is in the .sbss/.bss up to the page
> boundary (after the .got bytes):
>
> (kgdb) x/s 0x2a66cdc
> 0x2a66cdc: "$FreeBSD: head/lib/csu/powerpc/crt1.c 326219 2017-11-26 02:00:33Z pfg $"
>
> (kgdb) x/s 0x2a66d24
> 0x2a66d24: "$FreeBSD: head/lib/csu/common/crtbrand.c 340701 2018-11-20 20:59:49Z emaste $"
>
> (kgdb) x/s 0x2a66d72
> 0x2a66d72: "$FreeBSD: head/lib/csu/common/ignore_init.c 340702 2018-11-20 21:04:20Z emaste $"
>
> (kgdb) x/s 0x2a66dc3
> 0x2a66dc3: "FreeBSD clang version 8.0.0 (tags/RELEASE_800/final 356365) (based on LLVM 8.0.0)"
>
> (kgdb) x/s 0x2a66e15
> 0x2a66e15: "$FreeBSD: head/lib/csu/powerpc/crti.S 217399 2011-01-14 11:34:58Z kib $"
>
> (kgdb) x/s 0x2a66e5d
> 0x2a66e5d: "$FreeBSD: head/sbin/mount/getmntopts.c 326025 2017-11-20 19:49:47Z pfg $"
>
> (kgdb) x/s 0x2a66ea6
> 0x2a66ea6: "$FreeBSD: head/lib/libutil/login_tty.c 334106 2018-05-23 17:02:12Z jhb $"
>
> (kgdb) x/s 0x2a66eef
> 0x2a66eef: "$FreeBSD: head/lib/libutil/login_class.c 296723 2016-03-12 14:54:34Z kib $"
>
> (kgdb) x/s 0x2a66f83
> 0x2a66f83: "$FreeBSD: head/lib/libutil/_secure_path.c 139012 2004-12-18 12:31:12Z ru $"
>
> (kgdb) x/s 0x2a66fce
> 0x2a66fce: "$FreeBSD: head/lib/libcrypt/crypt.c 326219 2017-11
>
> (I truncated that last to avoid the 0xfa5005af's on the next page
> in RAM.)
>
> Compare ( from readelf /sbin/init ):
>
> String dump of section '.comment':
> [ 0] $FreeBSD: head/lib/csu/powerpc/crt1.c 326219 2017-11-26 02:00:33Z pfg $
> [ 48] $FreeBSD: head/lib/csu/common/crtbrand.c 340701 2018-11-20 20:59:49Z emaste $
> [ 96] $FreeBSD: head/lib/csu/common/ignore_init.c 340702 2018-11-20 21:04:20Z emaste $
> [ e7] FreeBSD clang version 8.0.0 (tags/RELEASE_800/final 356365) (based on LLVM 8.0.0)
> [ 139] $FreeBSD: head/lib/csu/powerpc/crti.S 217399 2011-01-14 11:34:58Z kib $
> [ 181] $FreeBSD: head/sbin/mount/getmntopts.c 326025 2017-11-20 19:49:47Z pfg $
> [ 1ca] $FreeBSD: head/lib/libutil/login_tty.c 334106 2018-05-23 17:02:12Z jhb $
> [ 213] $FreeBSD: head/lib/libutil/login_class.c 296723 2016-03-12 14:54:34Z kib $
> [ 25e] $FreeBSD: head/lib/libutil/login_cap.c 317265 2017-04-21 19:27:33Z pfg $
> [ 2a7] $FreeBSD: head/lib/libutil/_secure_path.c 139012 2004-12-18 12:31:12Z ru $
> [ 2f2] $FreeBSD: head/lib/libcrypt/crypt.c 326219 2017-11-26 02:00:33Z pfg $
> . . .
>
> Note:
>
> Program Headers:
> Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
> LOAD 0x000000 0x01800000 0x01800000 0x140ad4 0x140ad4 R E 0x10000
> LOAD 0x140ae0 0x01950ae0 0x01950ae0 0x061fc 0x35108 RWE 0x10000
> NOTE 0x0000d4 0x018000d4 0x018000d4 0x00048 0x00048 R 0x4
> TLS 0x140ae0 0x01950ae0 0x01950ae0 0x00b10 0x00b1d R 0x10
> GNU_STACK 0x000000 0x00000000 0x00000000 0x00000 0x00000 RW 0x10
>
> Section to Segment mapping:
> Segment Sections...
> 00 .note.tag .init .text .fini .rodata .eh_frame
> 01 .tdata .tbss .init_array .fini_array .ctors .dtors .jcr .data.rel.ro .data .got .sbss .bss
> 02 .note.tag
> 03 .tdata .tbss
> 04
> There are 24 section headers, starting at offset 0x16cec8:
>
> Section Headers:
> [Nr] Name Type Addr Off Size ES Flg Lk Inf Al
> . . .
> [16] .got PROGBITS 01956ccc 146ccc 000010 04 WAX 0 0 4
> [17] .sbss NOBITS 01956cdc 146cdc 0000b0 00 WA 0 0 4
> [18] .bss NOBITS 01956dc0 146cdc 02ee28 00 WA 0 0 64
> [19] .comment PROGBITS 00000000 146cdc 0073d4 01 MS 0 0 1
>
> It looks like material after the .got is being copied,
> spanning the in-file-empty .sbss and .bss sections and
> implicitly initializing (the first part of) those
> sections.
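As a cross-check on that reading, the program header and section header values quoted above can be run through a little arithmetic. This is only a sketch: the 4 KiB page size and every macro and function name are assumptions made for illustration, while the numeric constants are copied from the readelf output.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB pages. */
#define SKETCH_PAGE_SIZE 0x1000u

/* Copied from the readelf output for the second PT_LOAD (RWE). */
#define SEG_OFFSET 0x140ae0u
#define SEG_VADDR  0x01950ae0u
#define SEG_FILESZ 0x061fcu
#define SEG_MEMSZ  0x35108u

/* Copied from the section header for .comment. */
#define COMMENT_OFFSET 0x146cdcu
#define COMMENT_SIZE   0x0073d4u

/* File offset just past the bytes the PT_LOAD asks to be loaded. */
static uint32_t load_end_offset(void) { return SEG_OFFSET + SEG_FILESZ; }

/* Virtual address where file-backed data ends and .sbss begins. */
static uint32_t load_end_vaddr(void) { return SEG_VADDR + SEG_FILESZ; }

/* Virtual address just past .bss, the end of the whole segment. */
static uint32_t segment_end_vaddr(void) { return SEG_VADDR + SEG_MEMSZ; }

/* Bytes from addr up to the next page boundary (0 if already aligned). */
static uint32_t bytes_to_page_end(uint32_t addr)
{
    /* The masking below only works for a power-of-two page size. */
    assert((SKETCH_PAGE_SIZE & (SKETCH_PAGE_SIZE - 1)) == 0);
    return (SKETCH_PAGE_SIZE - (addr & (SKETCH_PAGE_SIZE - 1))) &
        (SKETCH_PAGE_SIZE - 1);
}

/* Nonzero when the rest of that file page lies inside .comment. */
static int page_tail_is_comment(void)
{
    uint32_t start = load_end_offset();
    uint32_t end = start + bytes_to_page_end(start);

    return start >= COMMENT_OFFSET && end <= COMMENT_OFFSET + COMMENT_SIZE;
}
```

With these values the file-backed data ends at file offset 0x146cdc (virtual address 0x01956cdc), 0x324 bytes short of the page boundary, and the remainder of that file page falls inside .comment. That matches the strings seen in the dump.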
The ->valid assignments appear to trace to code like:
    /*
     * The last page has valid blocks. Invalid part can only
     * exist at the end of file, and the page is made fully valid
     * by zeroing in vm_pager_get_pages().
     */
    if (m[count - 1]->valid != 0 && --count == 0) {
        if (iodone != NULL)
            iodone(arg, m, 1, 0);
        return (VM_PAGER_OK);
    }
That holds independent of whether the requested data
stops short of the file's last page while also ending
partway through a page.
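To make the effect of a valid mask that depends only on the file size concrete, here is a small userland sketch. The helper name, the 4 KiB page size, and the 512-byte DEV_BSIZE are assumptions, and the helper models my reading of the behavior rather than the actual pager code. The file size used in the note after the code is also an assumption: the section header table offset 0x16cec8 plus 24 headers of 40 bytes each, with nothing following.

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions: 4 KiB pages and 512-byte blocks, so 8 valid bits per page. */
#define SKETCH_PAGE_SIZE 4096u
#define SKETCH_DEV_BSIZE 512u

/*
 * Hypothetical helper: the valid mask a page would get if validity were
 * decided only by how much of the page lies before end-of-file.
 */
static unsigned valid_mask_from_file_size(uint32_t file_size,
    uint32_t page_offset)
{
    uint32_t in_page, blocks;

    assert(page_offset % SKETCH_PAGE_SIZE == 0);
    if (file_size <= page_offset)
        return 0;                       /* page lies wholly past EOF */
    in_page = file_size - page_offset;
    if (in_page > SKETCH_PAGE_SIZE)
        in_page = SKETCH_PAGE_SIZE;     /* EOF is in a later page */
    /* A partially filled block still counts as valid. */
    blocks = (in_page + SKETCH_DEV_BSIZE - 1) / SKETCH_DEV_BSIZE;
    return (1u << blocks) - 1;
}
```

Under the assumed file size of 0x16d288, the page at file offset 0x146000, which holds the end of the PT_LOAD data, gets all eight bits set. Only the page at 0x16d000 gets a partial mask (0x03), so the PT_LOAD's file size never enters into it.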
So it appears that the use of:
QUOTE
vm_imgact_map_page uses vm_imgact_hold_page.
vm_imgact_hold_page uses vm_pager_get_pages.
vm_pager_get_pages uses vm_page_zero_invalid
to "Zero out partially filled data"
END QUOTE
simply does not do the right thing for .sbss
or .bss handling. The m->valid related code
for zeroing is basically irrelevant to .sbss
and .bss.
Note that the below code requires a m->valid bit
to be asserted in order to do any
pmap_zero_page_area operations. Thus it does not
zero out pages that are completely invalid either.
This explains why I see 0xfa5005af on the full
pages in the .sbss/.bss area for debug builds:
nothing is zeroing the full pages either.
void
vm_page_zero_invalid(vm_page_t m, boolean_t setvalid)
{
    int b;
    int i;

    VM_OBJECT_ASSERT_WLOCKED(m->object);
    /*
     * Scan the valid bits looking for invalid sections that
     * must be zeroed. Invalid sub-DEV_BSIZE'd areas ( where the
     * valid bit may be set ) have already been zeroed by
     * vm_page_set_validclean().
     */
    for (b = i = 0; i <= PAGE_SIZE / DEV_BSIZE; ++i) {
        if (i == (PAGE_SIZE / DEV_BSIZE) ||
            (m->valid & ((vm_page_bits_t)1 << i))) {
            if (i > b) {
                pmap_zero_page_area(m,
                    b << DEV_BSHIFT, (i - b) << DEV_BSHIFT);
            }
            b = i + 1;
        }
    }

    /*
     * setvalid is TRUE when we can safely set the zero'd areas
     * as being valid. We can do this if there are no cache consistancy
     * issues. e.g. it is ok to do with UFS, but not ok to do with NFS.
     */
    if (setvalid)
        m->valid = VM_PAGE_BITS_ALL;
}
This code simply does not do the right thing for .sbss and
.bss handling.
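The case that matters for .sbss can be tried out in userland. The sketch below keeps the loop shape of the quoted function but is not kernel code: zero_area() is a memset-based stand-in for pmap_zero_page_area(), locking and setvalid are dropped, and the constants (4 KiB pages, 512-byte blocks), the 0xa5 marker byte, and all names are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE_SIZE  4096
#define SKETCH_DEV_BSIZE  512
#define SKETCH_DEV_BSHIFT 9

/* Userland stand-in for pmap_zero_page_area(): zero part of a buffer. */
static void zero_area(uint8_t *page, int off, int size)
{
    assert(off >= 0 && size >= 0 && off + size <= SKETCH_PAGE_SIZE);
    memset(page + off, 0, (size_t)size);
}

/* Same loop shape as the quoted function. Returns the bytes zeroed. */
static int zero_invalid_blocks(uint8_t *page, unsigned valid)
{
    int b, i, zeroed = 0;

    for (b = i = 0; i <= SKETCH_PAGE_SIZE / SKETCH_DEV_BSIZE; ++i) {
        if (i == SKETCH_PAGE_SIZE / SKETCH_DEV_BSIZE ||
            (valid & (1u << i))) {
            if (i > b) {
                zero_area(page, b << SKETCH_DEV_BSHIFT,
                    (i - b) << SKETCH_DEV_BSHIFT);
                zeroed += (i - b) << SKETCH_DEV_BSHIFT;
            }
            b = i + 1;
        }
    }
    return zeroed;
}

/* Fill a page with a marker byte, run the loop, report the bytes zeroed. */
static int bytes_zeroed_for_mask(unsigned valid)
{
    uint8_t page[SKETCH_PAGE_SIZE];

    memset(page, 0xa5, sizeof(page));
    return zero_invalid_blocks(page, valid);
}

/* Fill a page with a marker byte, run the loop, report the byte at off. */
static int byte_after_zeroing(unsigned valid, int off)
{
    uint8_t page[SKETCH_PAGE_SIZE];

    assert(off >= 0 && off < SKETCH_PAGE_SIZE);
    memset(page, 0xa5, sizeof(page));
    (void)zero_invalid_blocks(page, valid);
    return page[off];
}
```

With every valid bit set (mask 0xff), which is what the page holding the end of the PT_LOAD data would have when the file continues past it, the loop zeroes nothing, so the byte at in-page offset 0xcdc keeps whatever the file supplied. With mask 0x03 only the bytes from offset 1024 onward are zeroed.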
__start in /sbin/init (for example) expects .sbss and .bss
to have already been initialized to zero (and possibly
further adjusted after that for something like environ).
So far I find nothing to cover that.
===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)