svn commit: r322272 - head/sys/compat/linuxkpi/common/src

Ryan Stone rysto32 at gmail.com
Wed Jan 6 20:45:05 UTC 2021


On Tue, Aug 8, 2017 at 3:36 PM Alexander Motin <mav at freebsd.org> wrote:
>
> Author: mav
> Date: Tue Aug  8 19:36:34 2017
> New Revision: 322272
> URL: https://svnweb.freebsd.org/changeset/base/322272
>
> Log:
>   Fix few issues of LinuxKPI workqueue.
>
>   LinuxKPI workqueue wrappers reported "successful" cancellation for works
>   already completed in normal way.  This change brings reported status and
>   real cancellation fact into sync.  This required for drm-next operation.
>
>   Reviewed by:  hselasky (earlier version)
>   Sponsored by: iXsystems, Inc.
>   Differential Revision:        https://reviews.freebsd.org/D11904
>

> @@ -266,13 +267,14 @@ linux_delayed_work_timer_fn(void *arg)
>                 [WORK_ST_IDLE] = WORK_ST_IDLE,          /* NOP */
>                 [WORK_ST_TIMER] = WORK_ST_TASK,         /* start queueing task */
>                 [WORK_ST_TASK] = WORK_ST_TASK,          /* NOP */
> -               [WORK_ST_EXEC] = WORK_ST_TASK,          /* queue task another time */
> -               [WORK_ST_CANCEL] = WORK_ST_IDLE,        /* complete cancel */
> +               [WORK_ST_EXEC] = WORK_ST_EXEC,          /* NOP */
> +               [WORK_ST_CANCEL] = WORK_ST_TASK,        /* failed to cancel */
>         };
>         struct delayed_work *dwork = arg;
>
>         switch (linux_update_state(&dwork->work.state, states)) {
>         case WORK_ST_TIMER:
> +       case WORK_ST_CANCEL:
>                 linux_delayed_work_enqueue(dwork);
>                 break;
>         default:

I believe that this hunk introduced a regression into workqueue.
Consider the following scenario:

1. A delayed_work struct in the WORK_ST_TIMER state.
2. Thread A calls mod_delayed_work()
3. Thread B (a callout thread) simultaneously calls
linux_delayed_work_timer_fn()

The following sequence of events is possible:

A: Call linux_cancel_delayed_work()
A: Change state from TIMER TO CANCEL
B: Change state from CANCEL to TASK
B: taskqueue_enqueue() the task
A: taskqueue_cancel() the task
A: Call linux_queue_delayed_work_on().  This is a no-op because the
state is WORK_ST_TASK.

As a result, the delayed_work struct will never be invoked.  This is
causing address resolution in ib_addr.c to stop permanently, as it
never tries to reschedule a task that it thinks is already scheduled.

Do you have a recommendation?  Should we unconditionally
taskqueue_enqueue() when in the WORK_ST_TASK state and
linux_queue_delayed_work_on() is called?  That is harmless for a
pending task but will break the deadlock if the race is lost.


More information about the svn-src-head mailing list