CAM locking question
chuck at tuffli.net
Fri Dec 9 01:34:31 UTC 2011
I've been debugging a hang and am wondering if this might be a CAM
problem. The setup is 8-stable, a FC initiator I'm developing, and fio
using the POSIX aio engine with a queue depth > 1 (i.e. a bunch of
concurrent IO). Note that this setup with a queue depth of 1 runs fine.
The symptom is that fio gets stuck in aio_suspend() waiting for submitted
IOs to complete. But I've verified the driver has already completed
the IOs in question.
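
For anyone not familiar with the wait pattern involved, here is a rough user-space sketch of submitting several POSIX aio requests and blocking in aio_suspend() until they all finish. This is not fio's code; run_aio_batch(), QDEPTH, BLKSZ, and the temp file path are made up for illustration, and it writes to a scratch file rather than the FC device.

```c
#include <aio.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define QDEPTH 4      /* hypothetical queue depth (> 1) */
#define BLKSZ  4096   /* hypothetical block size */

/*
 * Submit QDEPTH writes to a scratch file, then sleep in aio_suspend()
 * until every submitted request has completed. Returns the number of
 * requests that completed with a full write, or -1 on setup failure.
 */
static int run_aio_batch(void)
{
    static char bufs[QDEPTH][BLKSZ];
    static struct aiocb cbs[QDEPTH];
    const struct aiocb *pending[QDEPTH];
    char path[] = "/tmp/aio_sketch_XXXXXX";
    int fd, remaining = 0, done = 0;

    fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);

    for (int i = 0; i < QDEPTH; i++) {
        memset(bufs[i], 'a' + i, BLKSZ);
        memset(&cbs[i], 0, sizeof(cbs[i]));
        cbs[i].aio_fildes = fd;
        cbs[i].aio_buf = bufs[i];
        cbs[i].aio_nbytes = BLKSZ;
        cbs[i].aio_offset = (off_t)i * BLKSZ;
        if (aio_write(&cbs[i]) == 0) {
            pending[i] = &cbs[i];
            remaining++;
        } else {
            pending[i] = NULL;   /* NULL entries are ignored by aio_suspend() */
        }
    }

    while (remaining > 0) {
        /* Blocks until at least one pending request is no longer in progress. */
        if (aio_suspend(pending, QDEPTH, NULL) != 0 && errno != EINTR)
            break;
        for (int i = 0; i < QDEPTH; i++) {
            int err;
            ssize_t n;

            if (pending[i] == NULL)
                continue;
            err = aio_error(pending[i]);
            if (err == EINPROGRESS)
                continue;
            n = aio_return(&cbs[i]);   /* reap the request exactly once */
            if (err == 0 && n == BLKSZ)
                done++;
            pending[i] = NULL;
            remaining--;
        }
    }

    close(fd);
    return done;
}
```

If a completion never gets delivered to the waiter, the aio_suspend() call in a loop like this has nothing to wake it up, which matches the symptom described above.
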
Playing around with DTrace, it appears that camisr_runqueue() is
running at the same time the driver is completing CCBs with xpt_done(),
albeit on different processors. From reading the code, camisr_runqueue()
seems to run inside CAM_SIM_LOCK() while manipulating the sim_doneq
list, but it looks like xpt_done() could be touching the same
sim_doneq without holding the lock.
As an experiment, I added CAM_SIM_LOCK/CAM_SIM_UNLOCK around the
TAILQ_INSERT_TAIL() in xpt_done(), and what was a reliable hang after
a minute or two hasn't shown up in an hour.
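
To make the experiment concrete, here is a user-space analogue of the pattern using pthreads and the sys/queue.h TAILQ macros. It is only a sketch, not the kernel code: fake_ccb, fake_sim, fake_done(), fake_runqueue(), and run_demo() are invented stand-ins for the CCB, the SIM, xpt_done(), and camisr_runqueue(), and a pthread mutex stands in for the SIM lock.

```c
#include <pthread.h>
#include <sched.h>
#include <stdlib.h>
#include <sys/queue.h>

#define MAX_PRODUCERS 8

/* Hypothetical stand-in for a CCB; not the real CAM structure. */
struct fake_ccb {
    int id;
    TAILQ_ENTRY(fake_ccb) links;
};
TAILQ_HEAD(doneq_head, fake_ccb);

/* Hypothetical stand-in for a SIM: one lock protecting one done queue. */
struct fake_sim {
    pthread_mutex_t lock;
    struct doneq_head doneq;
};

struct producer_arg {
    struct fake_sim *sim;
    int count;
};

/* Completion side: the insert is wrapped in the lock, as in the experiment. */
static void fake_done(struct fake_sim *sim, struct fake_ccb *ccb)
{
    pthread_mutex_lock(&sim->lock);
    TAILQ_INSERT_TAIL(&sim->doneq, ccb, links);
    pthread_mutex_unlock(&sim->lock);
}

/* Consumer side: drain the queue under the same lock; returns entries drained. */
static int fake_runqueue(struct fake_sim *sim)
{
    struct fake_ccb *ccb;
    int n = 0;

    pthread_mutex_lock(&sim->lock);
    while ((ccb = TAILQ_FIRST(&sim->doneq)) != NULL) {
        TAILQ_REMOVE(&sim->doneq, ccb, links);
        free(ccb);
        n++;
    }
    pthread_mutex_unlock(&sim->lock);
    return n;
}

static void *producer(void *p)
{
    struct producer_arg *a = p;

    for (int i = 0; i < a->count; i++) {
        struct fake_ccb *ccb = malloc(sizeof(*ccb));
        if (ccb == NULL)
            abort();
        ccb->id = i;
        fake_done(a->sim, ccb);
    }
    return NULL;
}

/* Run producers concurrently with a draining consumer; returns total drained. */
static int run_demo(int nthreads, int per_thread)
{
    struct fake_sim sim;
    struct producer_arg arg;
    pthread_t tids[MAX_PRODUCERS];
    int expected, total = 0;

    if (nthreads < 1 || nthreads > MAX_PRODUCERS)
        return -1;
    expected = nthreads * per_thread;
    pthread_mutex_init(&sim.lock, NULL);
    TAILQ_INIT(&sim.doneq);
    arg.sim = &sim;
    arg.count = per_thread;

    for (int i = 0; i < nthreads; i++)
        if (pthread_create(&tids[i], NULL, producer, &arg) != 0)
            abort();
    while (total < expected) {
        total += fake_runqueue(&sim);
        sched_yield();
    }
    for (int i = 0; i < nthreads; i++)
        pthread_join(tids[i], NULL);
    pthread_mutex_destroy(&sim.lock);
    return total;
}
```

With the lock in place, run_demo(4, 10000) drains all 40000 entries. If the lock/unlock pair is removed from fake_done(), concurrent inserts can race with each other and with the drain, entries can be lost from the list, and the consumer loop may then wait forever for completions that never show up on the queue.
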
I'm not sure this is the right fix, but I wanted to run the scenario
by the experts to get some feedback. Thoughts?