PCI range checking under qemu-system-sparc64

Mark Cave-Ayland mark.cave-ayland at ilande.co.uk
Sat Sep 19 23:05:45 UTC 2015

On 19/09/15 22:14, Marius Strobl wrote:

> On Fri, Sep 18, 2015 at 07:59:46AM +0100, Mark Cave-Ayland wrote:
>> On 17/09/15 09:28, Alexey Dokuchaev wrote:
>>> On Wed, Sep 16, 2015 at 11:19:15PM +0200, Marius Strobl wrote:
>>>> [...]
>>>> Which suggests that the next thing to investigate is the CMD646
>>>> emulation. Is there a particular reason why QEMU emulates a
>>>> CMD646U rather than a plain CMD646 as found in the real sun4u
>>>> machines of the USIIe/i era?
>>>> Alexey, does building the port with CDROM_DMA disabled make
>>>> a difference?
>>> Ironically I had it already disabled prior to your question; but I've
>>> rebuilt the port enabling it for completeness' sake.  It did not make
>>> a difference.
>>> Then I've disabled all CAM/ATA stuff (scbus, ata, umass, etc.) in the
>>> kernel config and that's what I see now (this is with CDROM_DMA=on):
>> What does the CAM/ATA stuff do here? Does this mean it may not
>> necessarily be an interrupt issue if you can get to mounting the root fs
>> with CDROM_DMA=on?
> I think we are looking at multiple issues here. First off, based
> on the interrupt-map provided by OpenBIOS, we route the intpin of
> the ATA controller to INO 20 (which uses the interrupt mapping
> register at offset 0xc28). If I additionally enable the code for
> debugging interrupt routing problems, which just clears all
> interrupts handled by the Sabre and then in all its interrupt
> mapping registers enables all INOs by writing the valid bit,
> Sabre IGN and CPU module ID there, I see 5 interrupts for INO
> 20 at the time the kernel hangs under QEMU. I don't get any
> stray interrupts for INOs that we're not actively routing.
> Based on what Alexey wrote, he doesn't see interrupts for INO 20,
> i.e. the ATA controller, with a kernel not including the debug
> code. That doesn't make sense to me. Without the debug code, we
> still enable the INOs for which drivers request them when they
> attach to devices the same way as described above, just not
> unconditionally for all interrupts handled by the Sabre. So
> there should be no difference for correctly routed interrupts.
> Second, even when I see interrupts for the ATA controller,
> enumeration of storage devices hangs somehow. On a real
> machine when booting verbose, this looks like (mainly output
> from ata_generic_reset()):
> <...>
> IPsec: Initialized Security Association Processing.
> lo0: bpf attached
> ata2: reset tp1 mask=00 ostat0=ff ostat1=ff
> ata3: reset tp1 mask=03 ostat0=50 ostat1=00
> ata3: stat0=0x80 err=0x80 lsb=0x80 msb=0x80
> ata3: stat0=0x50 err=0x01 lsb=0x00 msb=0x00
> ata3: stat1=0x00 err=0x01 lsb=0x14 msb=0xeb
> ata3: reset tp2 stat0=50 stat1=00 devices=0x20001
> GEOM: new disk cd0
> cd0 at ata3 bus 0 scbus1 target 1 lun 0
> cd0: <TEAC CD-224E 1.7A> Removable CD-ROM SCSI device
> cd0: 33.300MB/s transfers (UDMA2, ATAPI 12bytes, PIO 65534bytes)
> cd0: Attempt to query device size failed: NOT READY, Medium not present
> <...>
> With QEMU:
> <...>
> IPsec: Initialized Security Association Processing.
> lo0: bpf attached
> ata2: reset tp1 mask=03 ostat0=00 ostat1=00
> ata2: stat0=0x00 err=0x00 lsb=0x00 msb=0x00
> ata2: stat1=0x00 err=0x00 lsb=0x00 msb=0x00
> ata2: reset tp2 stat0=00 stat1=00 devices=0x0
> ata3: reset tp1 mask=03 ostat0=50 ostat1=00
> ata3: stat0=0x00 err=0x01 lsb=0x14 msb=0xeb
> ata3: stat1=0x00 err=0x00 lsb=0xff msb=0xff
> ata3: reset tp2 stat0=00 stat1=00 devices=0x10000
> <hang>
> <...>
> db> show intrcnt
> pil3: ithrd             5
> vec2004: atapci0        5
> QEMU: Terminated
> <...>
> So after a reset, real iron additionally sets the READY and
> SERVICE bits in the ATA status register but that doesn't
> explain the hang. Code that is waiting for READY to get set
> also uses a timeout.
> I patched QEMU to identify the ATA controller as a plain
> CMD646 one so that UDMA isn't tried in the first place. I
> also limited the ATA mode used by the kernel to PIO4 at
> maximum. Neither made a difference, i.e. the hang still
> occurs.
> Another striking thing is that the interrupt statistics
> show no CPU tick interrupts. I'm unsure at which point
> the kernel actually enables them but they really should
> have been engaged at the time the hang occurs, otherwise
> scheduling won't work properly, which also might be an
> explanation for the hang, i.e. the kernel might never
> switch to the CAM thread(s).
> All in all, interrupts in QEMU seem buggy in one way
> or another.

Thanks for looking into this in detail, Marius. There is plenty of
information here to start debugging this further.

While I don't have any insight on the CPU tick interrupt yet, my initial
feeling is that the ATA hang could be related to the PCI interrupt
clearing issue that I started looking into a while back. Although it
isn't a complete fix, does the attached patch against QEMU help at all?
Otherwise it will require a deeper dive into the QEMU interrupt emulation.

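As a cross-check on the numbers quoted above: the "vec2004" entry in the
intrcnt output is consistent with INO 20 if the vector is formed as
(IGN << 6) | INO and the Sabre IGN is 0x1f. Both the field layout and the
IGN value are assumptions here rather than something stated in this thread,
and the helper names are made up for illustration. A minimal sketch:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumed layout (not stated in this thread): the INO occupies
 * bits 5:0 of the vector and the IGN occupies bits 10:6.
 */
#define INO_BITS 6u
#define INO_MASK 0x3fu
#define IGN_MASK 0x1fu

/* Combine an interrupt group number and an interrupt number into a vector. */
static uint32_t make_vector(uint32_t ign, uint32_t ino)
{
    assert(ign <= IGN_MASK && ino <= INO_MASK);
    return (ign << INO_BITS) | ino;
}

/* Extract the interrupt group number from a vector. */
static uint32_t vector_ign(uint32_t vec)
{
    return (vec >> INO_BITS) & IGN_MASK;
}

/* Extract the interrupt number from a vector. */
static uint32_t vector_ino(uint32_t vec)
{
    return vec & INO_MASK;
}
```

Under those assumptions, vector_ino(2004) gives 20 and vector_ign(2004)
gives 0x1f, which matches the INO 20 routing Marius describes.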

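To make the status values in the two logs easier to compare, here is a small
decoder for the ATA status byte and the ATAPI signature. The bit values follow
the usual ATA status register layout as I understand it (0x10 doubles as DSC
and SERVICE); the macro and function names are hypothetical and are not taken
from the FreeBSD or QEMU sources. Treat it as a sketch only:

```c
#include <stdint.h>
#include <stdio.h>

/* Assumed ATA status register bits; names are illustrative only. */
#define ST_ERROR   0x01
#define ST_DRQ     0x08
#define ST_SERVICE 0x10 /* also used as DSC */
#define ST_READY   0x40
#define ST_BUSY    0x80

/* Write the names of the set status bits into buf, separated by '|'. */
static const char *status_str(uint8_t st, char *buf, size_t len)
{
    static const struct { uint8_t bit; const char *name; } bits[] = {
        { ST_BUSY, "BUSY" }, { ST_READY, "READY" },
        { ST_SERVICE, "SERVICE" }, { ST_DRQ, "DRQ" },
        { ST_ERROR, "ERROR" },
    };
    size_t used = 0;

    if (len == 0)
        return buf;
    buf[0] = '\0';
    for (size_t i = 0; i < sizeof(bits) / sizeof(bits[0]); i++) {
        if ((st & bits[i].bit) == 0)
            continue;
        int n = snprintf(buf + used, len - used, "%s%s",
                         used ? "|" : "", bits[i].name);
        if (n < 0 || (size_t)n >= len - used)
            break; /* output truncated; stop appending */
        used += (size_t)n;
    }
    return buf;
}

/* The lsb/msb pair 0x14/0xeb after reset is the ATAPI device signature. */
static int is_atapi_signature(uint8_t lsb, uint8_t msb)
{
    return lsb == 0x14 && msb == 0xeb;
}
```

With this, the stat0=0x50 seen on real hardware decodes to "READY|SERVICE",
while the stat0=0x00 seen under QEMU decodes to an empty string, even though
the 0x14/0xeb signature on ata3 is present in both cases.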

-------------- next part --------------
A non-text attachment was scrubbed...
Name: apb-no-clear.patch
Type: text/x-diff
Size: 1128 bytes
Desc: not available
URL: <http://lists.freebsd.org/pipermail/freebsd-sparc64/attachments/20150920/d131a1d8/attachment.bin>
