cvs commit: src/sys/compat/ndis kern_ndis.c ndis_var.h subr_ndis.c subr_ntoskrnl.c src/sys/dev/if_ndis if_ndis.c

Bill Paul wpaul at FreeBSD.org
Sun Mar 27 02:14:37 PST 2005


wpaul       2005-03-27 10:14:36 UTC

  FreeBSD src repository

  Modified files:
    sys/compat/ndis      kern_ndis.c ndis_var.h subr_ndis.c 
                         subr_ntoskrnl.c 
    sys/dev/if_ndis      if_ndis.c 
  Log:
  Finally bring an end to the great "make the Atheros NDIS driver
  work on SMP" saga. After several weeks and much gnashing of teeth,
  I have finally tracked down all the problems, despite their best
  efforts to confound and annoy me.
  
  Problem nunmber one: the Atheros windows driver is _NOT_ a de-serialized
  miniport! It used to be that NDIS drivers relied on the NDIS library
  itself for all their locking and serialization needs. Transmit packet
  queues were all handled internally by NDIS, and all calls to
  MiniportXXX() routines were guaranteed to be appropriately serialized.
  This proved to be a performance problem however, and Microsoft
  introduced de-serialized miniports with the NDIS 5.x spec. Microsoft
  still supports serialized miniports, but recommends that all new drivers
  written for Windows XP and later be deserialized. Apparently Atheros
  wasn't listening when they said this.
  
  This means (among other things) that we have to serialize calls to
  MiniportSendPackets(). We also have to serialize calls to MiniportTimer()
  that are triggered via the NdisMInitializeTimer() routine. It finally
  dawned on me why NdisMInitializeTimer() takes a special
  NDIS_MINIPORT_TIMER structure and a pointer to the miniport block:
  the timer callback must be serialized, and it's only by saving the
  miniport block handle that we can get access to the serialization
  lock during the timer callback.
  
  Problem number two: haunted hardware. The thing that was _really_
  driving me absolutely bonkers for the longest time is that, for some
  reason I couldn't understand, my test machine would occasionally freeze
  or more frustratingly, reset completely. That's reset and in *pow!*
  back to the BIOS startup. No panic, no crashdump, just a reset. This
  appeared to happen most often when MiniportReset() was called. (As
  to why MiniportReset() was being called, see problem three below.)
  I thought maybe I had created some sort of horrible deadlock
  condition in the process of adding the serialization, but after three
  weeks, at least 6 different locking implementations and heroic efforts
  to debug the spinlock code, the machine still kept resetting. Finally,
  I started single stepping through the MiniportReset() routine in
  the driver using the kernel debugger, and this ultimately led me to
  the source of the problem.
  
  One of the last things the Atheros MiniportReset() routine does is
  call NdisReadPciSlotInformation() several times to inspect a portion
  of the device's PCI config space. It reads the same chunk of config
  space repeatedly, in rapid succession. Presumeably, it's polling
  the hardware for some sort of event. The reset occurs partway through
  this process. I discovered that when I single-stepped through this
  portion of the routine, the reset didn't occur. So I inserted a 1
  microsecond delay into the read loop in NdisReadPciSlotInformation().
  Suddenly, the reset was gone!!
  
  I'm still very puzzled by the whole thing. What I suspect is happening
  is that reading the PCI config space so quickly is causing a severe
  PCI bus error. My test system is a Sun w2100z dual Opteron system,
  and the NIC is a miniPCI card mounted in a miniPCI-to-PCI carrier card,
  plugged into a 100Mhz PCI slot. It's possible that this combination of
  hardware causes a bus protocol violation in this scenario which leads
  to a fatal machine check. This is pure speculation though. Really all I
  know for sure is that inserting the delay makes the problem go away.
  (To quote Homer Simpson: "I don't know how it works, but fire makes
  it good!")
  
  Problem number three: NdisAllocatePacket() needs to make sure to
  initialize the npp_validcounts field in the 'private' section of
  the NDIS_PACKET structure. The reason if_ndis was calling the
  MiniportReset() routine in the first place is that packet transmits
  were sometimes hanging. When sending a packet, an NDIS driver will
  call NdisQueryPacket() to learn how many physical buffers the packet
  resides in. NdisQueryPacket() is actually a macro, which traverses
  the NDIS_BUFFER list attached to the NDIS_PACKET and stashes some
  of the results in the 'private' section of the NDIS_PACKET. It also
  sets the npp_validcounts field to TRUE To indicate that the results are
  now valid. The problem is, now that if_ndis creates a pool of transmit
  packets via NdisAllocatePacketPool(), it's important that each time
  a new packet is allocated via NdisAllocatePacket() that validcounts
  be initialized to FALSE. If it isn't, and a previously transmitted
  NDIS_PACKET is pulled out of the pool, it may contain stale data
  from a previous transmission which won't get updated by NdisQueryPacket().
  This would cause the driver to miscompute the number of fragments
  for a given packet, and botch the transmission.
  
  Fixing these three problems seems to make the Atheros driver happy
  on SMP, which hopefully means other serialized miniports will be
  happy too.
  
  And there was much rejoicing.
  
  Other stuff fixed along the way:
  
  - Modified ndis_thsuspend() to take a mutex as an argument. This
    allows KeWaitForSingleObject() and KeWaitForMultipleObjects() to
    avoid any possible race conditions with other routines that
    use the dispatcher lock.
  
  - Fixed KeCancelTimer() so that it returns the correct value for
    'pending' according to the Microsoft documentation
  
  - Modfied NdisGetSystemUpTime() to use ticks and hz rather than
    calling nanouptime(). Also added comment that this routine wraps
    after 49.7 days.
  
  - Added macros for KeAcquireSpinLock()/KeReleaseSpinLock() to hide
    all the MSCALL() goop.
  
  - For x86, KeAcquireSpinLockRaiseToDpc() needs to be a separate
    function. This is because it's supposed to be _stdcall on the x86
    arch, whereas KeAcquireSpinLock() is supposed to be _fastcall.
    On amd64, all routines use the same calling convention so we can
    just map KeAcquireSpinLockRaiseToDpc() directly to KfAcquireSpinLock()
    and it will work. (The _fastcall attribute is a no-op on amd64.)
  
  - Implement and use IoInitializeDpcRequest() and IoRequestDpc() (they're
    just macros) and use them for interrupt handling. This allows us to
    move the ndis_intrtask() routine from if_ndis.c to kern_ndis.c.
  
  - Fix the MmInitializeMdl() macro so that is uses sizeof(vm_offset_t)
    when computing mdl_size instead of uint32_t, so that it matches the
    MmSizeOfMdl() routine.
  
  - Change a could of M_WAITOKs to M_NOWAITs in the unicode routines in
    subr_ndis.c.
  
  - Use the dispatcher lock a little more consistently in subr_ntoskrnl.c.
  
  - Get rid of the "wait for link event" hack in ndis_init(). Now that
    I fixed NdisReadPciSlotInformation(), it seems I don't need it anymore.
    This should fix the witness panic a couple of people have reported.
  
  - Use MSCALL1() when calling the MiniportHangCheck() function in
    ndis_ticktask(). I accidentally missed this one when adding the
    wrapping for amd64.
  
  Revision  Changes    Path
  1.72      +97 -60    src/sys/compat/ndis/kern_ndis.c
  1.36      +52 -3     src/sys/compat/ndis/ndis_var.h
  1.77      +81 -23    src/sys/compat/ndis/subr_ndis.c
  1.58      +76 -30    src/sys/compat/ndis/subr_ntoskrnl.c
  1.84      +59 -67    src/sys/dev/if_ndis/if_ndis.c


More information about the cvs-all mailing list