MPSAFE CAM, MPSAFE drivers

Scott Long scottl at samsco.org
Fri Apr 20 09:04:08 UTC 2007


All,

I'm happy to announce that CAM is now MPSAFE, thanks to the help of many
people and sponsorship by Yahoo!  The work is in FreeBSD CVS now and can
be obtained by checking out the HEAD/7-CURRENT branch.  It will be part
of the upcoming FreeBSD 7.0 release this year.  Only the AHC and AHD
drivers are MPSAFE at the moment, but hopefully more will follow in the
coming months.  Below is a document describing the locking approach, and
instructions for locking CAM/SIM drivers that are not yet MPSAFE.

Locking theory
--------------

The following describes the basics of the locking strategy in CAM itself
and how that applies to the SIM drivers (SCSI hardware drivers)
underneath it.  While CAM is MPSAFE, only a few SIMs have been made
MPSAFE so far.  The rest are mostly unchanged and are allowed to continue
to operate just as they did before.  I hope that other developers and
interested users will step in and help make these drivers MPSAFE, as
it's too much work for me alone.

Being MPSAFE doesn't necessarily make the CAM subsystem itself faster.
The locking is still fairly monolithic on a per SIM instance level, and
there isn't much parallelism for operations within each instance.
Multiple SIM instances, i.e. multiple buses, do operate almost
completely independently of each other now, so there is full parallelism
there.  However, being MPSAFE does eliminate contention with the other
parts of the OS that are still under Giant, and this is still a huge
win.  Testing moderate to heavy loads on multi-core systems has shown
a significant decrease in contention on the Giant lock, while showing
only minimal new contention on the CAM locks.  This lowered contention
translates into less system time wasted by the CPUs, and thus more
cycles for useful work as well as less latency.

There are now 4 basic locks in CAM, 3 of which are:

xpt_lock - Protects the XPT softc, periph, and SIM instances
xpt_topo_lock - Protects the global peripheral and bus lists
cam_simq_lock - Protects the list of SIMs to be processed in the camisr

These 3 locks are internal to the CAM core and have little bearing on
the operation of SIMs.  None of these locks will be held when calling
into a SIM, and the SIM has no need to access them either.

The 4th lock is the SIM lock.  This is a non-recursive sleep mutex
(MTX_DEF) that the SIM instance uses to protect its internal data
structures and operations.  It is also exported up to CAM when calling
cam_sim_alloc(), and is used by CAM to protect target, device, and
peripheral objects, as well as SIM and device queues.  Every entry from
CAM into the SIM will be done with this lock held.  The SIM is welcome
to unlock it when it needs, but it must be held when calling back into
most CAM functions.  It is the primary lock for normal I/O flow
throughout CAM starting at the top of the stack in the periph driver.
The flow looks like this:

periph_strategy         sim->mtx
        |                   |
  xpt_schedule              |
        |                   |
  periph_start              |
        |                   |
   xpt_action               |
        |                   |
   sim_action               +



On completion:

     sim_isr             sim->mtx
        |                   |
     xpt_done               |cam_simq_lock
        |                   |
    swi_sched               +



      camisr           cam_simq_lock
        |
  camisr_runqueue        sim->mtx
        |                   |
   periph_done              +
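
The two flows above can be sketched as a small single-threaded userland
model.  This is only an illustration, not CAM code: every model_* name is
hypothetical, the "mutex" is just a held flag with assertions, and
dropping the queue lock before taking the SIM lock in model_camisr() is
an assumption about ordering that the diagrams do not spell out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userland stand-in for a non-recursive mutex; NOT the kernel's struct mtx. */
struct model_mtx { bool held; };

void model_lock(struct model_mtx *m)   { assert(!m->held); m->held = true; }
void model_unlock(struct model_mtx *m) { assert(m->held);  m->held = false; }

struct model_sim {
    struct model_mtx *mtx;  /* lock the SIM exported to CAM */
    int inflight;           /* requests handed to the SIM */
    int completed;          /* completions delivered to the periph */
};

/* Completion queue, protected by the model of cam_simq_lock. */
#define MODEL_QLEN 8
struct model_mtx model_simq_lock;
struct model_sim *model_simq[MODEL_QLEN];
int model_simq_len;

/* SIM entry point: CAM always enters with the SIM lock held. */
void model_sim_action(struct model_sim *sim)
{
    assert(sim->mtx->held);
    sim->inflight++;
}

/* Submission: the SIM lock is taken at the top and held down to the SIM. */
void model_periph_strategy(struct model_sim *sim)
{
    model_lock(sim->mtx);
    model_sim_action(sim);  /* xpt_schedule -> periph_start -> xpt_action */
    model_unlock(sim->mtx);
}

/* Called by the SIM with its lock held; queues the SIM for camisr. */
void model_xpt_done(struct model_sim *sim)
{
    assert(sim->mtx->held);
    model_lock(&model_simq_lock);
    assert(model_simq_len < MODEL_QLEN);
    model_simq[model_simq_len++] = sim;
    model_unlock(&model_simq_lock);
}

/* SIM interrupt handler: takes its own lock, then reports completion. */
void model_sim_isr(struct model_sim *sim)
{
    model_lock(sim->mtx);
    sim->inflight--;
    model_xpt_done(sim);
    model_unlock(sim->mtx);
}

/* Software interrupt: drain the queue, finishing each entry under its SIM lock. */
void model_camisr(void)
{
    for (;;) {
        model_lock(&model_simq_lock);
        if (model_simq_len == 0) {
            model_unlock(&model_simq_lock);
            break;
        }
        struct model_sim *sim = model_simq[--model_simq_len];
        model_unlock(&model_simq_lock);

        model_lock(sim->mtx);
        sim->completed++;   /* stands in for periph_done */
        model_unlock(sim->mtx);
    }
}
```

Calling model_periph_strategy(), model_sim_isr(), and model_camisr() in
that order on one model_sim walks a single request through both
diagrams; the assertions fire if any step runs without the lock the
diagrams say it needs.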


A SIM that is not MPSAFE exports the Giant mutex (&Giant) in
cam_sim_alloc().  Giant is then treated as a normal mutex by CAM and
is locked and unlocked in the same place as for MPSAFE SIMs.  This does
not put all of CAM back under Giant; multiple SIM instances can be
registered, some MPSAFE and some not, and CAM will treat the locking of
each instance separately.
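
To illustrate the per-instance treatment, here is a minimal userland
sketch.  It does not use the real cam_sim_alloc(); model_sim_alloc(),
model_xpt_action(), model_giant, and the other model_* names are
hypothetical stand-ins, and the lock is modelled as a plain flag.  It
shows only that CAM locks whatever mutex each SIM handed it, so SIMs
that exported Giant serialize against each other while a SIM with its
own mutex does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userland stand-in for a mutex; NOT the kernel's struct mtx. */
struct model_mtx { const char *name; bool held; };

/* Stand-in for the global Giant lock. */
struct model_mtx model_giant = { "Giant", false };

struct model_sim {
    const char *name;
    struct model_mtx *mtx;  /* whichever lock the driver exported */
    int actions;            /* how many times CAM called into the SIM */
};

/* Hypothetical analogue of cam_sim_alloc(): just records the exported lock. */
struct model_sim model_sim_alloc(const char *name, struct model_mtx *mtx)
{
    struct model_sim sim = { name, mtx, 0 };
    return sim;
}

/* CAM side: lock the SIM's registered mutex, call in, unlock. */
void model_xpt_action(struct model_sim *sim)
{
    assert(!sim->mtx->held);    /* would block here in a real system */
    sim->mtx->held = true;
    sim->actions++;             /* stands in for the SIM's action routine */
    sim->mtx->held = false;
}

/* Two SIMs contend with each other only if they registered the same lock. */
bool model_sims_serialized(const struct model_sim *a, const struct model_sim *b)
{
    return a->mtx == b->mtx;
}
```

In this model, two SIMs allocated with &model_giant report as
serialized, while a SIM allocated with its own model_mtx can still be
entered even when model_giant is marked held.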



Driver changes
--------------

For non-MPSAFE drivers, a single change was made to the API in the
cam_sim_alloc() function.  The function now looks like this:

struct cam_sim *  cam_sim_alloc(sim_action_func sim_action,
                                 sim_poll_func sim_poll,
                                 const char *sim_name,
                                 void *softc,
                                 u_int32_t unit,
                                 struct mtx *mtx,
                                 int max_dev_transactions,
                                 int max_tagged_dev_transactions,
                                 struct cam_devq *queue);

For the "mtx" argument, "&Giant" is used.  Everything else in the
SIM stays the same.  Some structures have also changed sizes, most
notably "cam_sim", but that is not an issue since source level
compatibility is already affected.

MPSAFE drivers must do the following things:

1.  Provide a pointer to a MTX_DEF mutex in cam_sim_alloc().  The mutex
must be allocated and initialized before calling cam_sim_alloc(), and
must not be destroyed until after calling cam_sim_free().  It should not
be held while calling cam_sim_alloc().

2.  The timeout_ch field in the ccb_hdr structure is no longer available
for use by the SIM.  SIMs must now allocate, initialize, and manage
their own callout structures.  All uses of the timeout() API must be
switched to the callout(9) API.  See the callout(9) manpage for details.

3.  Add the INTR_MPSAFE flag to bus_setup_intr().  This will prevent
Giant from being automatically acquired before the driver interrupt
handler is called.

4.  Any busdma tags that allow load deferrals (i.e. return EINPROGRESS)
must register a non-Giant mutex in bus_dma_tag_create().  This field is
not inherited from parent tags.

5.  If the driver registers a character device with make_dev(), the
D_NEEDSGIANT flag should be dropped, and appropriate locking added to
the device entry vectors.

6.  If the driver registers any sysctls, all locks must be dropped and
Giant must be held explicitly when registering and deregistering the
sysctl nodes.  Sysctl handlers will be called with Giant held, and
appropriate locking should be added under that.  No calls into CAM
should be made from these contexts.

7.  Provide appropriate locking in the interrupt handler as well as any
taskqueue handlers, callout handlers, kthreads, or other detached
contexts, as appropriate.

8.  Ensure that the registered SIM mutex is held when calling all CAM
entry points.  Until recently, the xpt_done() entry point provided its
own locking and did not require Giant to be held.  It still does not
require Giant, but it does require the SIM lock to be held when calling
it.

9.  Do not hold the SIM mutex or any other mutex when calling
malloc(M_WAITOK), bus_dmamem_alloc(), or bus_dmamap_create().

10. Any uses of tsleep must be changed to msleep.
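
A few of the rules above (1, 8, and 9) can be sketched as a userland
model of a driver's attach, interrupt, and detach paths.  None of this
is kernel code: the model_* functions are hypothetical stand-ins for
mtx_init(), cam_sim_alloc(), cam_sim_free(), malloc(M_WAITOK), and
xpt_done(), and the assertions only encode the ordering and lock-state
requirements stated in the list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Userland stand-in for an MTX_DEF mutex; NOT the kernel's struct mtx. */
struct model_mtx { bool initialized; bool held; };

struct model_softc {
    struct model_mtx mtx;   /* the SIM lock exported to CAM */
    bool sim_allocated;
    void *buf;              /* memory obtained with a sleeping allocation */
    int done;               /* completions reported to CAM */
};

void model_mtx_init(struct model_mtx *m)    { m->initialized = true; m->held = false; }
void model_mtx_destroy(struct model_mtx *m) { assert(!m->held); m->initialized = false; }
void model_lock(struct model_mtx *m)   { assert(m->initialized && !m->held); m->held = true; }
void model_unlock(struct model_mtx *m) { assert(m->held); m->held = false; }

/* Rule 1: the mutex must already be initialized, and must not be held. */
void model_cam_sim_alloc(struct model_softc *sc)
{
    assert(sc->mtx.initialized);
    assert(!sc->mtx.held);
    sc->sim_allocated = true;
}

void model_cam_sim_free(struct model_softc *sc) { sc->sim_allocated = false; }

/* Rule 9: no mutex may be held across an allocation that can sleep. */
void *model_malloc_waitok(struct model_softc *sc, size_t len)
{
    assert(!sc->mtx.held);
    return malloc(len);
}

/* Rule 8: CAM entry points require the registered SIM lock. */
void model_xpt_done(struct model_softc *sc)
{
    assert(sc->mtx.held);
    sc->done++;
}

void model_attach(struct model_softc *sc)
{
    model_mtx_init(&sc->mtx);                   /* before cam_sim_alloc */
    sc->buf = model_malloc_waitok(sc, 64);      /* lock not held */
    model_cam_sim_alloc(sc);                    /* lock not held */
}

void model_intr(struct model_softc *sc)
{
    model_lock(&sc->mtx);
    model_xpt_done(sc);
    model_unlock(&sc->mtx);
}

void model_detach(struct model_softc *sc)
{
    model_cam_sim_free(sc);
    free(sc->buf);
    sc->buf = NULL;
    assert(!sc->sim_allocated);
    model_mtx_destroy(&sc->mtx);                /* only after cam_sim_free */
}
```

Reordering model_attach() so that model_cam_sim_alloc() runs before
model_mtx_init(), or calling model_xpt_done() outside the
model_lock()/model_unlock() pair, trips an assertion, which is the
point of the exercise.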

For multi-function PCI devices where each function represents a bus, a
separate SIM and SIM mutex should be allocated and managed for each
function.  Functions that register multiple SIMs should coordinate
locking between those SIMs as needed; the same lock can be registered
for these separate SIMs, at the cost of reduced parallelism between
SIMs.  Functions that register a single SIM for multiple buses will have
all of those buses under a single mutex as far as CAM is concerned.

The simplest strategy is to use a single lock per SIM instance.  More
complex multi-level or pipelined locking is allowed; the registered SIM
lock can be dropped by the SIM at any point without disrupting the rest
of CAM, so long as no CAM entry points are called with it unlocked.
This will be an area for further research.
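
As a rough sketch of that last rule, the userland model below has a SIM
action routine drop the registered lock for some driver-private work
and retake it before calling back into CAM.  All model_* names are
hypothetical, the lock is a simple held flag, and returning to CAM with
the lock held again is an assumption based on CAM having taken it
before the call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userland stand-in for a non-recursive mutex; NOT the kernel's struct mtx. */
struct model_mtx { bool held; };

void model_lock(struct model_mtx *m)   { assert(!m->held); m->held = true; }
void model_unlock(struct model_mtx *m) { assert(m->held);  m->held = false; }

struct model_sim {
    struct model_mtx *mtx;  /* the registered SIM lock */
    int prepared;           /* work done without the SIM lock */
    int done;               /* calls back into CAM */
};

/* CAM entry point: must be called with the SIM lock held. */
void model_xpt_done(struct model_sim *sim)
{
    assert(sim->mtx->held);
    sim->done++;
}

/* Hypothetical driver-private work that runs with the SIM lock dropped. */
void model_slow_prepare(struct model_sim *sim)
{
    assert(!sim->mtx->held);
    sim->prepared++;
}

/* SIM action routine: entered by CAM with the SIM lock held. */
void model_sim_action(struct model_sim *sim)
{
    assert(sim->mtx->held);

    model_unlock(sim->mtx);     /* allowed: no CAM calls while unlocked */
    model_slow_prepare(sim);
    model_lock(sim->mtx);       /* retake before touching CAM again */

    model_xpt_done(sim);
    /* Return with the lock held, as CAM found it (assumption). */
}
```

A caller that stands in for CAM would wrap model_sim_action() in
model_lock()/model_unlock() on the same mutex; the lock is still held
when the action routine returns.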


Userland changes
----------------

Efforts were made to keep the userland API and ABI unchanged.  Thus,
there are no source level changes needed for any tools, libraries, or
apps, nor any need to recompile any of these either.


Future work
-----------

The CAM API will likely undergo some more small changes to support
future work with newbus integration and SAS/SATA/FC transport
modularization.  These changes will hopefully be done before FreeBSD 7.0
is released.


