thread suspension when dumping core
Konstantin Belousov
kostikbel at gmail.com
Thu Jun 9 04:35:05 UTC 2016
On Wed, Jun 08, 2016 at 11:17:44PM +0200, Jilles Tjoelker wrote:
> On Wed, Jun 08, 2016 at 04:56:35PM +0300, Konstantin Belousov wrote:
> > On Wed, Jun 08, 2016 at 06:35:08AM -0700, Mark Johnston wrote:
> > > On Wed, Jun 08, 2016 at 07:30:55AM +0300, Konstantin Belousov wrote:
> > > > On Tue, Jun 07, 2016 at 11:19:19PM +0200, Jilles Tjoelker wrote:
> > > > > I also wonder whether we may be overengineering things here. Perhaps
> > > > > the advlock sleep can simply turn off TDF_SBDRY.
> > > > Well, this was the very first patch suggested. I would be fine with
> > > > that, but again, out-of-tree code seems not to be quite fine with
> > > > that local solution.
>
> > > In our particular case, we could possibly use a similar approach. In
> > > general, it seems incorrect to clear TDF_SBDRY if the thread calling
> > > sx_sleep() has any locks held. It is easy to verify that all callers of
> > > lf_advlock() are safe in this respect, but this kind of auditing is
> > > generally hard. In fact, I believe the sx_sleep that led to the problem
> > > described in D2612 is the same as the one in my case. That is, the
> > > sleeping thread may or may not hold a vnode lock depending on context.
>
> > I do not think that in-tree code sleeps with a vnode lock held in
> > lf_advlock(). Otherwise, the system would hang in a lock cascade on
> > an attempt to obtain an advisory lock. I think we can even assert
> > this with witness.
>
> > There is another sleep, which Jilles mentioned, in lf_purgelocks(),
> > called from vgone(). This sleep indeed occurs under the vnode lock,
> > and as such must be non-suspendable. The sleep waits until other
> > threads leave lf_advlock() for the reclaimed vnode, and they should
> > leave in deterministic time due to the issued wakeups. So this sleep
> > is exempt from these considerations, and TDF_SBDRY there is correct.
>
> > I am fine with either the braces around the sx_sleep() in lf_advlock()
> > to clear TDF_SBDRY (sigdeferstop()), or with the latest patch I sent,
> > which adds a temporary override of TDF_SBDRY with TDF_SRESTART. My
> > understanding is that you prefer the latter. If I do not misrepresent
> > your position, I understand why you prefer it.
>
> The TDF_SRESTART change does fix some more problems such as umount -f
> getting stuck in lf_purgelocks().
>
> However, it introduces some subtle issues, which may not necessarily
> amount to a sufficient objection.
I did not see an objection in the notes below; rather, I read them as an
argument to return EINTR from the stop attempts in lf_advlock().
>
> Firstly, adding this closes the door on fixing signal handling for
> fcntl(F_SETLKW). Per POSIX, any caught signal interrupts
> fcntl(F_SETLKW), even if SA_RESTART is set for the signal, and the Linux
> man page documents the same. Our man page has documented that SA_RESTART
> behaves normally with fcntl(F_SETLKW) since at least FreeBSD 2.0. This
> could normally be fixed via if (error == ERESTART) error = EINTR; but
> that is no longer possible if there are [ERESTART] errors that should
> still restart.
I added the translation to the fcntl handler.
>
> Secondly, fcntl(F_SETLKW) restarting after a stop may actually be
> observable, contrary to what I wrote before. This is due to the fair
> queuing. Suppose thread A locked byte 1 a while ago and thread B is
> trying to lock bytes 1 and 2 right now. Then thread C will be able to
> lock byte 2 iff thread B has not blocked yet. If thread C is not
> allowed to lock byte 2 and blocks on it, the TDF_SRESTART change
> will cause it to be awakened if thread B is stopped. When thread B
> resumes, the region to be locked will be recomputed. This scenario
> unambiguously violates the POSIX requirement, but I do not know how
> bad it is.
Indeed.
>
> Note that all these threads must be in separate processes because of
> fcntl locks' strange semantics.
So below is the next level of over-engineering the stuff. I unified it
on sigdeferstop() to avoid blowing up the KPI. I am not sure which modes
are needed by onefs, so I added SIGDEFERSTOP_ERESTART even though it is
not used by the in-tree kernel.
lf_advlock() is now not stoppable at all and returns EINTR instead. In
the previous patches, the function only modified the behaviour if the
caller of lf_advlock() had already set TDF_SBDRY.
I considered adding a td_sbdry member to struct thread for managing the
sbdry stops, but did not do it for now. If one more flag appears to be
needed, I will change that.
diff --git a/sys/fs/fifofs/fifo_vnops.c b/sys/fs/fifofs/fifo_vnops.c
index 716faa3..fb3eb90 100644
--- a/sys/fs/fifofs/fifo_vnops.c
+++ b/sys/fs/fifofs/fifo_vnops.c
@@ -194,11 +194,10 @@ fifo_open(ap)
if ((ap->a_mode & FREAD) && fip->fi_writers == 0) {
gen = fip->fi_wgen;
VOP_UNLOCK(vp, 0);
- stops_deferred = sigallowstop();
+ stops_deferred = sigdeferstop(SIGDEFERSTOP_OFF);
error = msleep(&fip->fi_readers, PIPE_MTX(fpipe),
PDROP | PCATCH | PSOCK, "fifoor", 0);
- if (stops_deferred)
- sigdeferstop();
+ sigallowstop(stops_deferred);
vn_lock(vp, LK_EXCLUSIVE | LK_RETRY);
if (error != 0 && gen == fip->fi_wgen) {
fip->fi_readers--;
@@ -222,11 +221,10 @@ fifo_open(ap)
if ((ap->a_mode & FWRITE) && fip->fi_readers == 0) {
gen = fip->fi_rgen;
VOP_UNLOCK(vp, 0);
- stops_deferred = sigallowstop();
+ stops_deferred = sigdeferstop(SIGDEFERSTOP_OFF);
error = msleep(&fip->fi_writers, PIPE_MTX(fpipe),
PDROP | PCATCH | PSOCK, "fifoow", 0);
- if (stops_deferred)
- sigdeferstop();
+ sigallowstop(stops_deferred);
vn_lock(vp, LK_EXCLUSIVE | LK_RETRY);
if (error != 0 && gen == fip->fi_rgen) {
fip->fi_writers--;
diff --git a/sys/kern/kern_descrip.c b/sys/kern/kern_descrip.c
index 7f0ef8d..58a37f9 100644
--- a/sys/kern/kern_descrip.c
+++ b/sys/kern/kern_descrip.c
@@ -648,6 +648,16 @@ kern_fcntl(struct thread *td, int fd, int cmd, intptr_t arg)
PROC_UNLOCK(p->p_leader);
error = VOP_ADVLOCK(vp, (caddr_t)p->p_leader, F_SETLK,
flp, flg);
+ /*
+ * Do not automatically restart the lock
+ * acquire, for two reasons. First, POSIX
+ * requires signal delivery to return EINTR.
+ * Second, fairness of the lock acquisition,
+ * ensured by queueing in lf_advlock(), would
+ * be defeated by the retry.
+ */
+ if (cmd == F_SETLKW && error == ERESTART)
+ error = EINTR;
break;
case F_WRLCK:
if ((fp->f_flag & FWRITE) == 0) {
@@ -659,6 +669,8 @@ kern_fcntl(struct thread *td, int fd, int cmd, intptr_t arg)
PROC_UNLOCK(p->p_leader);
error = VOP_ADVLOCK(vp, (caddr_t)p->p_leader, F_SETLK,
flp, flg);
+ if (cmd == F_SETLKW && error == ERESTART)
+ error = EINTR;
break;
case F_UNLCK:
error = VOP_ADVLOCK(vp, (caddr_t)p->p_leader, F_UNLCK,
diff --git a/sys/kern/kern_lockf.c b/sys/kern/kern_lockf.c
index a0a3789..970a38d 100644
--- a/sys/kern/kern_lockf.c
+++ b/sys/kern/kern_lockf.c
@@ -1378,7 +1378,7 @@ lf_setlock(struct lockf *state, struct lockf_entry *lock, struct vnode *vp,
void **cookiep)
{
static char lockstr[] = "lockf";
- int priority, error;
+ int error, priority, stops_deferred;
#ifdef LOCKF_DEBUG
if (lockf_debug & 1)
@@ -1466,7 +1466,9 @@ lf_setlock(struct lockf *state, struct lockf_entry *lock, struct vnode *vp,
}
lock->lf_refs++;
+ stops_deferred = sigdeferstop(SIGDEFERSTOP_EINTR);
error = sx_sleep(lock, &state->ls_lock, priority, lockstr, 0);
+ sigallowstop(stops_deferred);
if (lf_free_lock(lock)) {
error = EDOOFUS;
goto out;
diff --git a/sys/kern/kern_sig.c b/sys/kern/kern_sig.c
index 75a1259..ee5d253 100644
--- a/sys/kern/kern_sig.c
+++ b/sys/kern/kern_sig.c
@@ -2596,41 +2596,81 @@ tdsigcleanup(struct thread *td)
}
+static int
+sigdeferstop_curr_flags(int cflags)
+{
+
+ MPASS((cflags & (TDF_SEINTR | TDF_SERESTART)) == 0 ||
+ (cflags & TDF_SBDRY) != 0);
+ return (cflags & (TDF_SBDRY | TDF_SEINTR | TDF_SERESTART));
+}
+
/*
- * Defer the delivery of SIGSTOP for the current thread. Returns true
- * if stops were deferred and false if they were already deferred.
+ * Defer the delivery of SIGSTOP for the current thread, according to
+ * the requested mode. Returns previous flags, which must be restored
+ * by sigallowstop().
+ *
+ * TDF_SBDRY, TDF_SEINTR, and TDF_SERESTART flags are only set and
+ * cleared by the current thread, which allows the lock-less read-only
+ * accesses below.
*/
int
-sigdeferstop(void)
+sigdeferstop(int mode)
{
struct thread *td;
+ int cflags, nflags;
td = curthread;
- if (td->td_flags & TDF_SBDRY)
- return (0);
- thread_lock(td);
- td->td_flags |= TDF_SBDRY;
- thread_unlock(td);
- return (1);
+ cflags = sigdeferstop_curr_flags(td->td_flags);
+ switch (mode) {
+ case SIGDEFERSTOP_NOP:
+ nflags = cflags;
+ break;
+ case SIGDEFERSTOP_OFF:
+ nflags = 0;
+ break;
+ case SIGDEFERSTOP_SILENT:
+ nflags = (cflags | TDF_SBDRY) & ~(TDF_SEINTR | TDF_SERESTART);
+ break;
+ case SIGDEFERSTOP_EINTR:
+ nflags = (cflags | TDF_SBDRY | TDF_SEINTR) & ~TDF_SERESTART;
+ break;
+ case SIGDEFERSTOP_ERESTART:
+ nflags = (cflags | TDF_SBDRY | TDF_SERESTART) & ~TDF_SEINTR;
+ break;
+ default:
+ panic("sigdeferstop: invalid mode %x", mode);
+ break;
+ }
+ if (cflags != nflags) {
+ thread_lock(td);
+ td->td_flags = (td->td_flags & ~cflags) | nflags;
+ thread_unlock(td);
+ }
+ return (cflags);
}
/*
- * Permit the delivery of SIGSTOP for the current thread. This does
- * not immediately suspend if a stop was posted. Instead, the thread
- * will suspend either via ast() or a subsequent interruptible sleep.
+ * Restores the STOP handling mode, typically permitting the delivery
+ * of SIGSTOP for the current thread. This does not immediately
+ * suspend if a stop was posted. Instead, the thread will suspend
+ * either via ast() or a subsequent interruptible sleep.
*/
-int
-sigallowstop(void)
+void
+sigallowstop(int prev)
{
struct thread *td;
- int prev;
+ int cflags;
+ KASSERT((prev & ~(TDF_SBDRY | TDF_SEINTR | TDF_SERESTART)) == 0,
+ ("sigallowstop: incorrect previous mode %x", prev));
td = curthread;
- thread_lock(td);
- prev = (td->td_flags & TDF_SBDRY) != 0;
- td->td_flags &= ~TDF_SBDRY;
- thread_unlock(td);
- return (prev);
+ cflags = sigdeferstop_curr_flags(td->td_flags);
+ if (cflags != prev) {
+ thread_lock(td);
+ td->td_flags = (td->td_flags & ~cflags) | prev;
+ thread_unlock(td);
+ }
}
/*
diff --git a/sys/kern/kern_thread.c b/sys/kern/kern_thread.c
index 9af377e..c85b153 100644
--- a/sys/kern/kern_thread.c
+++ b/sys/kern/kern_thread.c
@@ -899,7 +899,7 @@ thread_suspend_check(int return_instead)
{
struct thread *td;
struct proc *p;
- int wakeup_swapper;
+ int wakeup_swapper, r;
td = curthread;
p = td->td_proc;
@@ -932,7 +932,21 @@ thread_suspend_check(int return_instead)
if ((td->td_flags & TDF_SBDRY) != 0) {
KASSERT(return_instead,
("TDF_SBDRY set for unsafe thread_suspend_check"));
- return (0);
+ switch (td->td_flags & (TDF_SEINTR | TDF_SERESTART)) {
+ case 0:
+ r = 0;
+ break;
+ case TDF_SEINTR:
+ r = EINTR;
+ break;
+ case TDF_SERESTART:
+ r = ERESTART;
+ break;
+ default:
+ panic("both TDF_SEINTR and TDF_SERESTART");
+ break;
+ }
+ return (r);
}
/*
diff --git a/sys/kern/subr_trap.c b/sys/kern/subr_trap.c
index 6d1ac70..eb44087 100644
--- a/sys/kern/subr_trap.c
+++ b/sys/kern/subr_trap.c
@@ -160,7 +160,7 @@ userret(struct thread *td, struct trapframe *frame)
("userret: Returning with with pinned thread"));
KASSERT(td->td_vp_reserv == 0,
("userret: Returning while holding vnode reservation"));
- KASSERT((td->td_flags & TDF_SBDRY) == 0,
+ KASSERT((td->td_flags & (TDF_SBDRY | TDF_SEINTR | TDF_SERESTART)) == 0,
("userret: Returning with stop signals deferred"));
KASSERT(td->td_su == NULL,
("userret: Returning with SU cleanup request not handled"));
diff --git a/sys/sys/mount.h b/sys/sys/mount.h
index f11f8f5..5438140 100644
--- a/sys/sys/mount.h
+++ b/sys/sys/mount.h
@@ -653,15 +653,15 @@ vfs_statfs_t __vfs_statfs;
#define VFS_PROLOGUE(MP) do { \
struct mount *mp__; \
- int _enable_stops; \
+ int _prev_stops; \
\
mp__ = (MP); \
- _enable_stops = (mp__ != NULL && \
- (mp__->mnt_vfc->vfc_flags & VFCF_SBDRY) && sigdeferstop())
+ _prev_stops = sigdeferstop((mp__ != NULL && \
+ (mp__->mnt_vfc->vfc_flags & VFCF_SBDRY) != 0) ? \
+ SIGDEFERSTOP_SILENT : SIGDEFERSTOP_NOP);
#define VFS_EPILOGUE(MP) \
- if (_enable_stops) \
- sigallowstop(); \
+ sigallowstop(_prev_stops); \
} while (0)
#define VFS_MOUNT(MP) ({ \
diff --git a/sys/sys/proc.h b/sys/sys/proc.h
index 629f7e8..1619d34 100644
--- a/sys/sys/proc.h
+++ b/sys/sys/proc.h
@@ -395,9 +395,9 @@ do { \
#define TDF_NEEDRESCHED 0x00010000 /* Thread needs to yield. */
#define TDF_NEEDSIGCHK 0x00020000 /* Thread may need signal delivery. */
#define TDF_NOLOAD 0x00040000 /* Ignore during load avg calculations. */
-#define TDF_UNUSED19 0x00080000 /* --available-- */
+#define TDF_SERESTART 0x00080000 /* ERESTART on stop attempts. */
#define TDF_THRWAKEUP 0x00100000 /* Libthr thread must not suspend itself. */
-#define TDF_UNUSED21 0x00200000 /* --available-- */
+#define TDF_SEINTR 0x00200000 /* EINTR on stop attempts. */
#define TDF_SWAPINREQ 0x00400000 /* Swapin request due to wakeup. */
#define TDF_UNUSED23 0x00800000 /* --available-- */
#define TDF_SCHED0 0x01000000 /* Reserved for scheduler private use */
diff --git a/sys/sys/signalvar.h b/sys/sys/signalvar.h
index e574ec3..176bb2a 100644
--- a/sys/sys/signalvar.h
+++ b/sys/sys/signalvar.h
@@ -325,9 +325,21 @@ extern struct mtx sigio_lock;
#define SIGPROCMASK_PROC_LOCKED 0x0002
#define SIGPROCMASK_PS_LOCKED 0x0004
+/*
+ * Modes for sigdeferstop(). They manage the behaviour of
+ * thread_suspend_check() in the region delimited by
+ * sigdeferstop()/sigallowstop(). Must be restored to
+ * SIGDEFERSTOP_OFF before returning to userspace.
+ */
+#define SIGDEFERSTOP_NOP 0 /* continue doing whatever is done now */
+#define SIGDEFERSTOP_OFF 1 /* stop ignoring STOPs */
+#define SIGDEFERSTOP_SILENT 2 /* silently ignore STOPs */
+#define SIGDEFERSTOP_EINTR 3 /* ignore STOPs, return EINTR */
+#define SIGDEFERSTOP_ERESTART 4 /* ignore STOPs, return ERESTART */
+
int cursig(struct thread *td);
-int sigdeferstop(void);
-int sigallowstop(void);
+int sigdeferstop(int mode);
+void sigallowstop(int prev);
void execsigs(struct proc *p);
void gsignal(int pgid, int sig, ksiginfo_t *ksi);
void killproc(struct proc *p, char *why);
More information about the freebsd-current
mailing list