5.3-RELEASE: WARNING - WRITE_DMA interrupt timout - what
does it mean?
Robert Watson
rwatson at freebsd.org
Wed Nov 10 02:26:26 PST 2004
On Wed, 10 Nov 2004, Zoltan Frombach wrote:
> I'm on FreeBSD 5.3-RELEASE. I get this when I try to patch the file:
<snip>
> Is your patch were produce for HEAD? Or what is wrong?
Indeed, where the contents of struct task have changed slightly to allow
for safe draining while shutting down a task. Try this variation; the
timeout is also modified to intentionally be 5/100 of a second per Andrea
Campi's keen bug searchign eyes :-).
Index: kern/subr_taskqueue.c
===================================================================
RCS file: /home/ncvs/src/sys/kern/subr_taskqueue.c,v
retrieving revision 1.24
diff -u -r1.24 subr_taskqueue.c
--- kern/subr_taskqueue.c 8 Aug 2004 02:37:21 -0000 1.24
+++ kern/subr_taskqueue.c 10 Nov 2004 10:22:41 -0000
@@ -36,9 +36,16 @@
#include <sys/lock.h>
#include <sys/malloc.h>
#include <sys/mutex.h>
+#include <sys/sysctl.h>
#include <sys/taskqueue.h>
+#include <sys/time.h>
#include <sys/unistd.h>
+int tq_in;
+SYSCTL_INT(_kern, OID_AUTO, tq_in, CTLFLAG_RD, &tq_in, 0, "");
+int tq_out;
+SYSCTL_INT(_kern, OID_AUTO, tq_out, CTLFLAG_RD, &tq_out, 0, "");
+
static MALLOC_DEFINE(M_TASKQUEUE, "taskqueue", "Task Queues");
static void *taskqueue_giant_ih;
static void *taskqueue_ih;
@@ -140,6 +147,9 @@
return 0;
}
+ getnanotime(&task->ta_queuetime);
+ tq_in++;
+
/*
* Optimise the case when all tasks have the same priority.
*/
@@ -172,6 +182,7 @@
taskqueue_run(struct taskqueue *queue)
{
struct task *task;
+ struct timespec tv;
int owned, pending;
owned = mtx_owned(&queue->tq_mutex);
@@ -186,8 +197,16 @@
STAILQ_REMOVE_HEAD(&queue->tq_queue, ta_link);
pending = task->ta_pending;
task->ta_pending = 0;
+ tq_out++;
mtx_unlock(&queue->tq_mutex);
+ getnanotime(&tv);
+ timespecsub(&tv, &task->ta_queuetime);
+ if (tv.tv_nsec >= 50000000) {
+ printf("taskqueue_run: warning, queue time of %d.%09ld "
+ "for context %p\n", tv.tv_sec, tv.tv_nsec,
+ task->ta_func);
+ }
task->ta_func(task->ta_context, pending);
mtx_lock(&queue->tq_mutex);
Index: sys/_task.h
===================================================================
RCS file: /home/ncvs/src/sys/sys/_task.h,v
retrieving revision 1.2
diff -u -r1.2 _task.h
--- sys/_task.h 28 Jul 2004 09:12:54 -0000 1.2
+++ sys/_task.h 10 Nov 2004 10:22:19 -0000
@@ -45,6 +45,7 @@
int ta_priority; /* priority of task in queue */
task_fn_t *ta_func; /* task handler */
void *ta_context; /* argument for handler */
+ struct timespec ta_queuetime; /* time enqueued */
};
#endif /* !_SYS__TASK_H_ */
>
> Zoltan
>
> On Wed, 10 Nov 2004, Søren Schmidt wrote:
>
> > Zoltan Frombach wrote:
> > > I just upgraded to 5.3-RELEASE a few days ago. This morning this line
> > > got into my system log file:
> > > Nov 9 06:14:03 www kernel: ad0: WARNING - WRITE_DMA interrupt was seen
> > > but timeout fired LBA=2491143
> > >
> > > I've never seen this message before. Can someone please explain what it
> > > means? With Thanks,
> >
> > It means that the disk has processed the write request (interrupt seen),
> > but that the system (the bio_taskqueue) hasn't been able to get the
> > result returned to the kernel.
> >
> > Your disk is not involved in this problem since it has done its part,
> > but the rest of the system is either busy with something else, or there
> > are bugs lurking that prohibits the bio_taskqueue from running.
> >
> > Either way its a WARNING not a FAILURE :)
>
> I'm still a bit skeptical that the task queue is at fault -- I run my
> notebook with continuous measurement of the latency to schedule tasks,
> generating a warning for any latency > .5 seconds, and the only time I
> ever see that sort of latency is during the boot process when ACPI has
> scheduled a task to run, but the task queue thread has not yet been
> allowed to run:
>
> ipfw2 initialized, divert loadable, rule-based forwarding disabled,
> default to deny, logging disabled
> taskqueue_run: warning, queue time of 0.900011314 for context 0xc045638c
> ad0: 19077MB <FUJITSU MHS2020AT E/8307> [38760/16/63] at ata0-master
> UDMA33
>
> It would be quite interesting to have someone run with this timing code on
> a system reporting the warning to see what they find. What is the
> threshold to generate the ATA timeout warning? My impression was that it
> was several seconds, which is quite a long time. The attached patch
> checks for .5 second or above latency. If the task queue is really
> getting stalled for several seconds, we should figure out what task it is.
> There's KTR tracing in the taskqueue code that we can use to dump a trace
> stream to see what task is taking so long.
>
> Robert N M Watson FreeBSD Core Team, TrustedBSD Projects
> robert at fledge.watson.org Principal Research Scientist, McAfee Research
>
> Index: kern/subr_taskqueue.c
> ===================================================================
> RCS file: /home/ncvs/src/sys/kern/subr_taskqueue.c,v
> retrieving revision 1.25
> diff -u -r1.25 subr_taskqueue.c
> --- kern/subr_taskqueue.c 5 Oct 2004 04:16:00 -0000 1.25
> +++ kern/subr_taskqueue.c 23 Oct 2004 19:18:26 -0000
> @@ -36,9 +36,16 @@
> #include <sys/lock.h>
> #include <sys/malloc.h>
> #include <sys/mutex.h>
> +#include <sys/sysctl.h>
> #include <sys/taskqueue.h>
> +#include <sys/time.h>
> #include <sys/unistd.h>
>
> +int tq_in;
> +SYSCTL_INT(_kern, OID_AUTO, tq_in, CTLFLAG_RD, &tq_in, 0, "");
> +int tq_out;
> +SYSCTL_INT(_kern, OID_AUTO, tq_out, CTLFLAG_RD, &tq_out, 0, "");
> +
> static MALLOC_DEFINE(M_TASKQUEUE, "taskqueue", "Task Queues");
> static void *taskqueue_giant_ih;
> static void *taskqueue_ih;
> @@ -140,6 +147,9 @@
> return 0;
> }
>
> + getnanotime(&task->ta_queuetime);
> + tq_in++;
> +
> /*
> * Optimise the case when all tasks have the same priority.
> */
> @@ -172,6 +182,7 @@
> taskqueue_run(struct taskqueue *queue)
> {
> struct task *task;
> + struct timespec tv;
> int owned, pending;
>
> owned = mtx_owned(&queue->tq_mutex);
> @@ -187,8 +198,16 @@
> pending = task->ta_pending;
> task->ta_pending = 0;
> task->ta_flags |= TAF_PENDING;
> + tq_out++;
> mtx_unlock(&queue->tq_mutex);
>
> + getnanotime(&tv);
> + timespecsub(&tv, &task->ta_queuetime);
> + if (tv.tv_nsec >= 0500000000) {
> + printf("taskqueue_run: warning, queue time of %d.%09ld "
> + "for context %p\n", tv.tv_sec, tv.tv_nsec,
> + task->ta_func);
> + }
> task->ta_func(task->ta_context, pending);
>
> mtx_lock(&queue->tq_mutex);
> Index: sys/_task.h
> ===================================================================
> RCS file: /home/ncvs/src/sys/sys/_task.h,v
> retrieving revision 1.3
> diff -u -r1.3 _task.h
> --- sys/_task.h 5 Oct 2004 04:16:01 -0000 1.3
> +++ sys/_task.h 23 Oct 2004 19:38:16 -0000
> @@ -46,6 +46,7 @@
> task_fn_t *ta_func; /* task handler */
> void *ta_context; /* argument for handler */
> int ta_flags; /* Flags */
> + struct timespec ta_queuetime; /* time enqueued */
> };
>
> #define TAF_PENDING 0x1 /* Task is being run now */
>
More information about the freebsd-current
mailing list