process in STOP state

David Xu davidxu at freebsd.org
Mon Jan 18 01:48:28 UTC 2010


Kostik Belousov wrote:
> On Sun, Jan 17, 2010 at 09:48:25AM +0800, David Xu wrote:
>> Tijl Coosemans wrote:
>>> On Friday 15 January 2010 02:31:22 David Xu wrote:
>>>  
>>>> Tijl Coosemans wrote:
>>>>    
>>>>>> Besides weird formatting of procstat -k output, I do not see
>>>>>> anything wrong in the state of the process. It got SIGSTOP, I am
>>>>>> sure. Attaching gdb helps because debugger gets signal reports
>>>>>> instead of target process getting the signal actions on signal
>>>>>> delivery.
>>>>>>
>>>>>> The only question is why the process gets SIGSTOP at all.
>>>>>>        
>>>>> Wine uses ptrace(2) sometimes. The SIGSTOP could have come from
>>>>> that. I recently submitted
>>>>> http://www.freebsd.org/cgi/query-pr.cgi?pr=142757 describing a
>>>>> problem with ptrace and signals, so you might want to give the
>>>>> kernel patch a try.
>>>>>      
>>>> The problem in your patch is that ksi pointer can not be hold across
>>>> thread sleeping, because once the process is resumed, there is no
>>>> guarantee that the thread will run first, once the signal came from
>>>> process's signal queue, other threads can remove the signal, and here
>>>> your sigqueue_take(ksi) is dangerous code.
>>>>    
>>> If other threads can run before the current thread then there's a
>>> second problem next to the one in the PR (current thread deletes
>>> signal that shouldn't be deleted). Then those other threads can see
>>> that the SIGSTOP bit (or another signal) is still set and stop the
>>> process a second time. This might be what happens in the OP's case.
>>>
>>> So, the signal has to be cleared before suspending the process, but
>>> then other threads can still deliver other signals which might change
>>> delivery order and I don't see any way around that besides introducing
>>> a per process signal lock that is also kept while the process is
>>> stopped. Comments?
>>>
>>>  
>> I don't have an idea now, we ever delivered signal to thread's queue,
>> though it may lose signal if thread exits, but it does not have the problem
>> you have described here, we may need to rethink how to fix the signal-lost
>> problem but still deliver signal to thread's queue, just an idea.
>>
> 
> I do think that signal shall be removed from the queue before
> ptracestop(), and I do not see a problem with other threads taking other
> signals. ptracestop() stops the whole process, and delivery order when
> signals go to different threads is somewhat vague due to scheduling
> events etc. So even if we guarantee that thread goes to userland for
> signal delivery before another thread, that another thread may still run
> handler observable earlier due to scheduler, swapping etc.
> 
> I think your fix might be slightly reworked, making three points:
> 1. do remove signal from queue, and reinsert it into the _head_ of the
> _thread_ queue after. We already got it from the head, and current
> thread did not masked the signal. Since thread cannot change the mask in
> between, this signal is the first to be delivered to this thread, and
> current thread is still eligible for delivery. I added KSI_HEAD flag to
> adjust behaviour of sigqueue_add().
> 
> 2. Instead of adding signal to sq_kill, I just call sigqueue_add
> with NULL ksi.
> 
> 3. To remove ksi from the queue, but not freeing it, I had to copy/paste
> the sigqueue_get() into sigqueue_steal(). It returns pointer to the
> removed ksi. I did not found good way to merge _get and steal.
> 
> diff --git a/sys/kern/kern_sig.c b/sys/kern/kern_sig.c
> index 1c21bc5..80e5a07 100644
> --- a/sys/kern/kern_sig.c
> +++ b/sys/kern/kern_sig.c
> @@ -316,6 +316,42 @@ sigqueue_get(sigqueue_t *sq, int signo, ksiginfo_t *si)
>  	return (signo);
>  }
>  
> +static int
> +sigqueue_steal(sigqueue_t *sq, int signo, ksiginfo_t **si)
> +{
> +	struct proc *p;
> +	struct ksiginfo *ksi, *next;
> +	int count;
> +
> +	KASSERT(sq->sq_flags & SQ_INIT, ("sigqueue not inited"));
> +	p = sq->sq_proc;
> +	count = 0;
> +
> +	if (!SIGISMEMBER(sq->sq_signals, signo))
> +		return (0);
> +
> +	if (SIGISMEMBER(sq->sq_kill, signo)) {
> +		count++;
> +		SIGDELSET(sq->sq_kill, signo);
> +	}
> +
> +	TAILQ_FOREACH_SAFE(ksi, &sq->sq_list, ksi_link, next) {
> +		if (ksi->ksi_signo == signo) {
> +			if (count == 0) {
> +				TAILQ_REMOVE(&sq->sq_list, ksi, ksi_link);
> +				ksi->ksi_sigq = NULL;
> +				*si = ksi;
> +			}
> +			if (++count > 1)
> +				break;
> +		}
> +	}
> +
> +	if (count <= 1)
> +		SIGDELSET(sq->sq_signals, signo);
> +	return (signo);
> +}
> +
>  void
>  sigqueue_take(ksiginfo_t *ksi)
>  {
> @@ -357,7 +393,10 @@ sigqueue_add(sigqueue_t *sq, int signo, ksiginfo_t *si)
>  
>  	/* directly insert the ksi, don't copy it */
>  	if (si->ksi_flags & KSI_INS) {
> -		TAILQ_INSERT_TAIL(&sq->sq_list, si, ksi_link);
> +		if (si->ksi_flags & KSI_HEAD)
> +			TAILQ_INSERT_HEAD(&sq->sq_list, si, ksi_link);
> +		else
> +			TAILQ_INSERT_TAIL(&sq->sq_list, si, ksi_link);
>  		si->ksi_sigq = sq;
>  		goto out_set_bit;
>  	}
> @@ -2492,6 +2531,7 @@ issignal(struct thread *td, int stop_allowed)
>  	struct sigacts *ps;
>  	struct sigqueue *queue;
>  	sigset_t sigpending;
> +	ksiginfo_t *psi;
>  	int sig, prop, newsig;
>  
>  	p = td->td_proc;
> @@ -2529,24 +2569,24 @@ issignal(struct thread *td, int stop_allowed)
>  		if (p->p_flag & P_TRACED && (p->p_flag & P_PPWAIT) == 0) {
>  			/*
>  			 * If traced, always stop.
> +			 * Remove old signal from queue before the stop.
> +			 * XXX shrug off debugger, it causes siginfo to
> +			 * be thrown away.
>  			 */
> +			queue = &td->td_sigqueue;
> +			psi = NULL;
> +			if (sigqueue_steal(queue, sig, &psi) == 0) {
> +				queue = &p->p_sigqueue;
> +				sigqueue_steal(queue, sig, &psi);
> +			}
> +			
>  			mtx_unlock(&ps->ps_mtx);
>  			newsig = ptracestop(td, sig);
>  			mtx_lock(&ps->ps_mtx);
>  
>  			if (sig != newsig) {
> -				ksiginfo_t ksi;
> -
> -				queue = &td->td_sigqueue;
> -				/*
> -				 * clear old signal.
> -				 * XXX shrug off debugger, it causes siginfo to
> -				 * be thrown away.
> -				 */
> -				if (sigqueue_get(queue, sig, &ksi) == 0) {
> -					queue = &p->p_sigqueue;
> -					sigqueue_get(queue, sig, &ksi);
> -				}
> +				if (psi != NULL && ksiginfo_tryfree(psi))
> +					p->p_pendingcnt--;
>  
>  				/*
>  				 * If parent wants us to take the signal,
> @@ -2561,10 +2601,14 @@ issignal(struct thread *td, int stop_allowed)
>  				 * Put the new signal into td_sigqueue. If the
>  				 * signal is being masked, look for other signals.
>  				 */
> -				SIGADDSET(queue->sq_signals, sig);
> +				sigqueue_add(queue, sig, NULL);
>  				if (SIGISMEMBER(td->td_sigmask, sig))
>  					continue;
>  				signotify(td);
> +			} else {
> +				if (psi != NULL)
> +					psi->ksi_flags |= KSI_INS | KSI_HEAD;
> +				sigqueue_add(&td->td_sigqueue, sig, psi);
>  			}
>  
>  			/*
> diff --git a/sys/sys/signalvar.h b/sys/sys/signalvar.h
> index c9455c2..c35fe69 100644
> --- a/sys/sys/signalvar.h
> +++ b/sys/sys/signalvar.h
> @@ -234,6 +234,7 @@ typedef struct ksiginfo {
>  #define	KSI_EXT		0x02	/* Externally managed ksi. */
>  #define KSI_INS		0x04	/* Directly insert ksi, not the copy */
>  #define	KSI_SIGQ	0x08	/* Generated by sigqueue, might ret EGAIN. */
> +#define	KSI_HEAD	0x10	/* Insert into head, not tail. */
>  #define	KSI_COPYMASK	(KSI_TRAP|KSI_SIGQ)
>  
>  #define	KSI_ONQ(ksi)	((ksi)->ksi_sigq != NULL)

In the patch, even if you removed the KSI from queue, current thread
still holds the pointer and sleeps, this might be not a good idea
since other code try to implement reliable signal like SIGCHLD and AIO 
signal and mqueue, those code may remove the signal from queue and free
it, if they run before the current thread, they will do.
otherwise, we have to use unreliable signals, that means, when a
realtime signal is delivered, sig_info is lost and only a bit set in
sq_kill like sigqueue() syscall does, this is not perfect.




More information about the freebsd-stable mailing list