memory barriers in bus_dmamap_sync() ?
scottl at samsco.org
Wed Jan 11 18:38:52 UTC 2012
On Jan 11, 2012, at 10:10 AM, Ian Lepore wrote:
> On Wed, 2012-01-11 at 09:59 -0700, Scott Long wrote:
>> Where barriers _are_ needed is in interrupt handlers, and I can
>> discuss that if you're interested.
> I'd be interested in hearing about that (and in general I'm loving the
> details coming out in your explanations -- thanks!).
> -- Ian
Well, I unfortunately wasn't as clear as I should have been. Interrupt handlers need bus barriers, not CPU cache/instruction barriers. This is because the interrupt signal can arrive at the CPU before data and control words are finished being DMA'd up from the controller. Also, many controllers require an acknowledgement write to be performed before leaving the interrupt handler, so the driver needs to do a bus barrier to ensure that the write flushes. These are two different topics, so let me start with the interrupt handler.
Legacy interrupts in PCI are carried on discrete pins and are level triggered. When the device wants to signal an interrupt, it asserts the pin. That assertion is seen at the IOAPIC on the host bridge and converted to an interrupt message, which is then sent immediately to the CPU's LAPIC. This all happens very, very quickly. Meanwhile, the interrupt condition could have been predicated on the device DMA'ing bytes up to host memory, and those DMA writes could have gotten stalled and buffered on the way up the PCI topology. The end result is often that the driver interrupt handler runs before those writes have hit host memory. To fix this, drivers do a read of a card register as the first step in the interrupt handler, even if the read is just a dummy and the result is thrown away. Thanks to PCI ordering, the read will ensure that any pending writes from the card have flushed all the way up, and everything will be coherent by the time the read completes.
MSI and MSIX interrupts on modern PCI and PCIe fix this. These interrupts are sent as byte messages that are DMA'd to the host bridge. Since they are in-band data, they are subject to the same ordering rules as all other data on the bus, and thus ordering for them is implicit. When the MSI message reaches the host bridge, it's converted into a LAPIC message just like before. However, the driver doesn't need to do a flushing read because it knows that the MSI message was the last write on the bus, therefore everything prior to it has arrived and everything is coherent. Since reads are expensive in PCI, this saves a considerable amount of time in the driver. Unfortunately, it adds non-deterministic latency to the interrupt since the MSI message is in-band and has no way to force priority flushing on a busy bus. So while MSI/MSIX save some time in the interrupt handler, they actually make the overall latency situation potentially worse (thanks Intel!).
The acknowledgement write issue is a little more straightforward. If the card requires an acknowledgement write from the driver to know that the interrupt has been serviced (so that it'll then know to de-assert the interrupt line), that write has to be flushed to the hardware before the interrupt handler completes. Otherwise, the write could get stalled, the interrupt remain asserted, and the interrupt erroneously re-trigger on the host CPU. I've seen cases where this devolves into the card getting out of sync with the driver to the point that interrupts get missed. Also, this gets a little weird sometimes with buggy MSI hacks in both device and PCI bridge hardware.