Re: Hang ast / pipelk / piperd

From: Paul Floyd <paulf2718_at_gmail.com>
Date: Sat, 28 May 2022 07:02:23 UTC
On 5/27/22 22:13, Paul Floyd wrote:
>
> Hi
>
> I'm debugging two issues with Valgrind on FreeBSD 13.1 and 14, one on 
> amd64 and one on i386.
>
...
> Both hangs seem quite sensitive to timing - in both cases adding or 
> changing nanosleep times seems to make them no longer hang.
> Adding debug statements to Valgrind can also change the behaviour 
> (and is also unsafe when not holding the scheduler lock). Does this 
> look like a kernel bug?


One important detail I left out: why is Valgrind releasing the 
scheduler lock?

To make a client syscall. This needs to be done in "client-like" 
circumstances - specifically, with the client signal mask rather than 
the Valgrind mask (which blocks all signals so that Valgrind has 
full control).

Two things can happen with a client syscall.

1/ it succeeds, and Valgrind will re-acquire the lock and continue.

2/ it gets interrupted, Valgrind re-acquires the lock, does a load of 
stuff to fix up the guest state and takes the appropriate action (restart, 
return EINTR, save carry etc).


I did think that 2/ might be prone to getting into an infinite loop, 
especially with restart. But I don't see anything like that in the 
Valgrind logs.

PJF thread 14 making a client nanosleep syscall
SYSCALL[5379,14](240) sys_nanosleep ( 0x200890, 0x0 ) --> [async] ...

PJF thread 14 releases the scheduler lock
--5379--   SCHED[14]: releasing lock (VG_(client_syscall)[async]) -> VgTs_WaitSys

PJF thread 2 acquires the scheduler lock
--5379--   SCHED[2]:  acquired lock (VG_(client_syscall)[async])

PJF thread 2 returns from nanosleep
SYSCALL[5379,2](240) ... [async] --> Success(0x0)

PJF thread 2 making a client write syscall
SYSCALL[5379,2](  4) sys_write ( 1, 0x4ea9000, 48 ) --> [async] ...

PJF thread 2 releases the scheduler lock
--5379--   SCHED[2]: releasing lock (VG_(client_syscall)[async]) -> VgTs_WaitSys

PJF this is the thread 2 printf from syscall write
tls_ptr: case "race" has mismatch: *ip=8 here=4

PJF thread 2 acquires the scheduler lock
--5379--   SCHED[2]:  acquired lock (VG_(client_syscall)[async])

PJF thread 2 returns from write (0x30 = 48 bytes written)
SYSCALL[5379,2](  4) ... [async] --> Success(0x30)

PJF thread 2 making a client nanosleep syscall
SYSCALL[5379,2](240) sys_nanosleep ( 0x200890, 0x0 ) --> [async] ...

PJF thread 2 releases the scheduler lock
--5379--   SCHED[2]: releasing lock (VG_(client_syscall)[async]) -> VgTs_WaitSys

And that's it: it hangs making the client nanosleep syscall.

A+

Paul