head -356426 based 32-bit powerpc (2 sock PowerMac G4): kyua sys/vm/mlock_test:mlock__copy_on_write_vnode seems stuck
Mark Millard
marklmi at yahoo.com
Thu Jan 9 03:51:30 UTC 2020
[Even the artifacts debug-style kernel gets the problem.
More evidence of context extracted and supplied.]
On 2020-Jan-8, at 18:09, Mark Millard <marklmi at yahoo.com> wrote:
> [I got kernel backtrace information, included.]
>
> On 2020-Jan-8, at 15:12, Mark Millard <marklmi at yahoo.com> wrote:
>
>> I've attempted a head -356426 based kyua run on an
>> old 2-socket PowerMac G4. The buildworld and
>> buildkernel were non-debug. The context has personal
>> patches, mostly for dealing with PowerMac issues.
>>
>> It has had over 180 CPU minutes running for:
>>
>> sys/vm/mlock_test:mlock__copy_on_write_vnode ->
>>
>> Normal seems to be way under 0.1 sec on the
>> other platforms I've made runs on recently.
>>
>> Hopefully kyua will time out and continue the
>> testing at some point.
>>
>> The 2 socket (2 cores each) G5 powerpc64 context
>> did not have this problem. Nor did the armv7 or
>> aarch64 examples (CortexA7, CortexA53, CortexA57,
>> and CortexA72).
>
>
> I finally gave up on it after 240 CPU minutes but
> could not kill/stop the stuck process. So I
> sync'd the file system and tried a "shutdown -r
> now" and forced the power off after it got stuck
> (no reboot happened).
>
> After power-up I tried:
>
> # kyua test -k /usr/tests/Kyuafile sys/vm/mlock_test
> sys/vm/mlock_test:mlock__copy_on_write_anon -> passed [0.017s]
> sys/vm/mlock_test:mlock__copy_on_write_vnode ->
>
> and it got stuck again.
>
> I'll note that ps -auxd shows the likes of:
>
> root 1120 0.0 0.4 11512 8772 0 I+ 16:38 0:00.62 | | `-- kyua test -k /usr/tests/Kyuafile sys/vm/mlock_test
> root 1124 100.0 0.1 4640 2332 - Rs 16:38 2:57.43 | | `-- /usr/tests/sys/vm/mlock_test -vunprivileged-user=tests -r/tmp/kyua.B2pXx8/3/result.atf mlock__copy_on_write_vnode
> root 1125 0.0 0.0 4640 620 - TXL 16:38 0:00.00 | | `-- /usr/tests/sys/vm/mlock_test -vunprivileged-user=tests -r/tmp/kyua.B2pXx8/3/result.atf mlock__copy_on_write_vnode
>
>
> I got a couple of backtraces from the kernel
> via the ddb> prompt :
>
> pid 1125 was in thread_suspend_switch called via ptracestop.
>
> I've a couple of examples of pid 1124 (the CPU time
> taker): (manually typed from screen images)
>
> 0xdc9e0520: at mi_switch+0x17c
> 0xdc9e0540: at critical_exit_preempt+0x7c
> 0xdc9e0560: at powerpc_interrupt+0x1c4
> 0xdc9e0590: at kernel EXI trap by __syncicache+0x5c:
> srr1= 0x209032
> r1= 0xdc9e0650
> cr= 0x8822fc22
> xer= 0
> ctr= 0
> frame=0xdc9e0598
> 0xdc9e0650: at 0x5ed67ec
> 0xdc9e0660: at moea_sync_icache+0x118
> Note: From here on down is common with the
> other example backtrace:
> 0xdc9e0690: at pmap_sync_icache+0x98
> 0xdc9e06b0: at vm_sync_icache+0x2c
> 0xdc9e06c0: at proc_rwmem+0x13c
> 0xdc9e0710: at kern_ptrace+0x76c
> 0xdc9e0830: at sys_ptrace+0x12c
> 0xdc9e0960: at trap+0xae8
> 0xdc9e0a10: at powerpc_interrupt+0x1ec
> 0xdc9e0a40: at use SC trap by 0x4191ea48:
> srr1= 0x209032
> r1= 0xffffc6d0
> cr= 0x28000200
> xer= 0
> ctr= 0x4191ea40
> frame=0xdc9e0a48
>
> The non-common part of the other backtrace
> is:
>
> 0xdc9e04a0: at intr_event_handle+0xd4
> 0xdc9e04e0: at powerpc_dispatch_intr+0xe0
> 0xdc9e0520: at openpic_dispatch+0x90
> 0xdc9e0540: at powerpc_interrupt+0x128
> 0xdc9e0570: at kernel EXI trap by moea_pvo_find_va:
> srr1= 0xf032
> r1= 0xdc9e0630
> cr= 0x4822fc22
> xer= 0
> ctr= 0
> frame=0xdc9e0578
> 0xdc9e0630: at 0x41b76ffc
> 0xdc9e0660: at moea_sync_icache+0x100
>
> Showing a lock chain showed just one line:
>
> thread 100151 (pid 1124, mlock_test) running on CPU 0
>
> The pcpu output for cpuid 0 metioned: critnest 2 "mlock_test"
> when I tried it.
>
> (After that I did something that locked up the machine,
> probably my fault.)
>
>
> It does not look like I can complete a normal kyua
> run for a 2-socket 32-bit powerpc. May be someone
> else can for some multi-socket 32-bit powerpc to
> see if this repeats.
>
> Single-socket/single-core might prove interesting
> as well. Maybe I can try such.
It fails on the single-processor/single-core G4 PowerMac
that I tested as well.
I then installed the kernel*.txz from:
https://artifact.ci.freebsd.org/snapshot/head/r356426/powerpc/powerpc/
and rebooted with it via boot -v (to avoid the
memory modified-after-free for debug kernels).
(I'd forgotten about debug stopping and reporting
such.)
It still fails:
# kyua test -k /usr/tests/Kyuafile sys/vm/mlock_test
sys/vm/mlock_test:mlock__copy_on_write_anon -> passed [0.029s]
sys/vm/mlock_test:mlock__copy_on_write_vnode ->
The debug kernel does not report anything for
it.
I gather that the test's internal description is:
/*
* Use ptrace(2) to trigger a copy-on-write fault of a read-only text page.
*/
ATF_TC_WITHOUT_HEAD(mlock__copy_on_write_vnode);
ATF_TC_BODY(mlock__copy_on_write_vnode, tc)
{
void *addr;
int len;
len = getpagesize();
addr = (void *)((uintptr_t)test_wired_copy_on_write & ~(len - 1));
test_wired_copy_on_write(addr, len);
}
where test_wired_copy_on_write is:
static void
test_wired_copy_on_write(void *addr, size_t len)
{
int status, val;
pid_t pid;
pid = fork();
if (pid == -1)
atf_tc_fail("fork() failed: %s", strerror(errno));
if (pid == 0) {
if (mlock(addr, len) != 0)
_exit(1);
if (ptrace(PT_TRACE_ME, 0, NULL, 0) != 0)
_exit(2);
if (raise(SIGSTOP) != 0)
_exit(3);
if (munlock(addr, len) != 0)
_exit(4);
_exit(0);
}
ATF_REQUIRE(waitpid(pid, &status, 0) == pid);
ATF_REQUIRE_MSG(!WIFEXITED(status),
"child exited with status %d", WEXITSTATUS(status));
ATF_REQUIRE(WIFSTOPPED(status));
ATF_REQUIRE(WSTOPSIG(status) == SIGSTOP);
errno = 0;
val = ptrace(PT_READ_D, pid, addr, 0);
ATF_REQUIRE(errno == 0);
ATF_REQUIRE(ptrace(PT_WRITE_D, pid, addr, val) == 0);
ATF_REQUIRE(ptrace(PT_CONTINUE, pid, (caddr_t)1, 0) == 0);
ATF_REQUIRE(waitpid(pid, &status, 0) == pid);
ATF_REQUIRE(WIFEXITED(status));
ATF_REQUIRE_MSG(WEXITSTATUS(status) == 0,
"child exited with status %d", WSTOPSIG(status));
}
I give my evidence below, but the ptrace(PT_WRITE_D, . . .)
looks to be the one involved in the backtraces:
It looks like proc_rwmem is using the vm_sync_icache in:
/*
* If we are writing, then we request vm_fault() to create a private
* copy of each page. Since these copies will not be writeable by the
* process, we must explicity request that they be dirtied.
*/
writing = uio->uio_rw == UIO_WRITE;
reqprot = writing ? VM_PROT_COPY | VM_PROT_READ : VM_PROT_READ;
fault_flags = writing ? VM_FAULT_DIRTY : VM_FAULT_NORMAL;
. . .
/* Make the I-cache coherent for breakpoints. */
if (writing && error == 0) {
vm_map_lock_read(map);
if (vm_map_check_protection(map, pageno, pageno +
PAGE_SIZE, VM_PROT_EXECUTE))
vm_sync_icache(map, uva, len);
vm_map_unlock_read(map);
}
This appears to mean that it got there via kern_ptrace's:
case PT_WRITE_I:
case PT_WRITE_D:
td2->td_dbgflags |= TDB_USERWR;
PROC_UNLOCK(p);
error = 0;
if (proc_writemem(td, p, (off_t)(uintptr_t)addr, &data,
sizeof(int)) != sizeof(int))
error = ENOMEM;
else
CTR3(KTR_PTRACE, "PT_WRITE: pid %d: %p <= %#x",
p->p_pid, addr, data);
PROC_LOCK(p);
break;
(via proc_writemem).
So the test's ptrace involved in the backtraces seems to be:
ATF_REQUIRE(ptrace(PT_WRITE_D, pid, addr, val) == 0);
(Not that I've a clue what to do with that information
at this point.)
===
Mark Millard
marklmi at yahoo.com
( dsl-only.net went
away in early 2018-Mar)
More information about the freebsd-ppc
mailing list