kernel panic with firewire PCI card

Sat Mar 28 14:41:42 PDT 2009

On Fri, Mar 27, 2009 at 09:40:23PM +0100, Andreas Tobler wrote:
> Hello,
> 
> I get the below panic while having plugged in a firewire PCI card. The 
> card itself is not 'sun' compliant. Means, it is a custom fw card made 
> for PC's I guess. But as far as I understand, fbsd can handle non sun 
> PCI cards, can't it?

If the driver is well-written, i.e. correctly uses bus_dma(9)
and bus_space(9), is LP64- and endian-clean and deals with
strict alignment requirements, and the chip plays nice
with Sun's interpretation of the PCI specifications, such
cards should work. Drivers that rely on firmware on the card
to do some of the initialization still won't work though if
it's "PC firmware". Typical examples of the latter are ATA and
graphics controllers. In theory there are ways to make them
work anyway, but that would be quite some work, if it is
feasible at all.
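
To illustrate what "endian-clean" and "deals with strict alignment"
mean in practice, here is a minimal userland sketch (not taken from
any driver). PCI devices present little-endian data, while sparc64 is
big-endian and traps on misaligned loads, so casting a byte pointer
to uint32_t * only happens to work on x86. The function name
le32_decode is made up for this example.

```c
#include <stdint.h>

/*
 * Hypothetical helper: assemble a little-endian 32-bit value one byte
 * at a time. This gives the same result on little- and big-endian
 * hosts and never performs a misaligned 32-bit load.
 */
uint32_t
le32_decode(const unsigned char *p)
{
	return ((uint32_t)p[0] |
	    ((uint32_t)p[1] << 8) |
	    ((uint32_t)p[2] << 16) |
	    ((uint32_t)p[3] << 24));
}
```

In the kernel one would normally not hand-roll this; bus_space(9)
covers register accesses, and le32dec() from <sys/endian.h> does much
the same for in-memory data.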

> 
> The os itself is current as of yesterday, the kernel rev you see below.
> 
> Is there anything more I can provide to debug this issue?
> 
> Btw, the machine is a ultra60.
> 

<...>

> fwohci0: <Texas Instruments TSB12LV23> mem 0x4008000-0x40087ff,0x400c000-0x400ffff at device 4.0 on pci0
> fwohci0: [ITHREAD]
> fwohci0: OHCI version 1.0 (ROM=1)
> fwohci0: No. of Isochronous channels is 4.
> fwohci0: EUI64 00:10:74:60:00:00:ee:a9
> panic: pcib: PCI bus B error AFAR 0x1ff840080ec AFSR 0x4000f00000000000
> cpuid = 0
> KDB: enter: panic
> [thread pid 0 tid 100000 ]
> Stopped at      kdb_enter+0x80: ta              %xcc, 1
> db> bt
> Tracing pid 0 tid 100000 td 0xc08ad870
> panic() at panic+0x20c
> psycho_pci_bus() at psycho_pci_bus+0x88
> intr_event_handle() at intr_event_handle+0x5c
> intr_execute_handlers() at intr_execute_handlers+0x14
> intr_fast() at intr_fast+0x68
> -- interrupt level=0xd pil=0 %o7=0xc0659be4 --
> -- data access error %o7=0x32a --
> fwphy_rddata() at fwphy_rddata+0xe8
> fwohci_reset() at fwohci_reset+0x298
> fwohci_init() at fwohci_init+0x9f8
> fwohci_pci_attach() at fwohci_pci_attach+0x278
> device_attach() at device_attach+0x4a4

Bit 0x4000000000000000 set in the PCI AFSR indicates that the
primary error was a target abort. Given that no DMA is involved
at this stage, this means it actually was the OHCI chip which
complained about the PIO access. To see what code is actually
involved, check with "l *(0xc0659be4)", "l *(fwphy_rddata+0xe8)"
and "l *(fwohci_reset+0x298)" in gdb on the corresponding
kernel.debug. If this is the first access after the reset, I'd
suspect the problem to be a combination of a sloppy driver and
a chip that takes somewhat longer than others to get ready
again after a reset: fwohci_reset() only tries 100 times,
waiting one millisecond between tries, for OHCI_HCC_RESET to
clear after the reset (the latter part is in line with the
OHCI specification). Increasing that to e.g. 1000 tries should
solve the panic then, if this is actually the cause. Generally,
though, fwohci(4) should be changed to fail if the chip doesn't
become ready again after a reset instead of just ignoring that
problem. At least fwohci_reset() (there are probably more such
functions in fwohci(4)) also seems to be missing some bus space
barriers, which could also be the cause of this panic.
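
A rough userland sketch of the "fail instead of ignoring" idea,
under stated assumptions: this is not the fwohci(4) code, the
register is simulated by a made-up struct fake_chip, and the names
HCC_RESET_BIT, fake_read_hcc and wait_reset_done exist only for
this illustration. The point is merely that the poll loop reports
a timeout to its caller, so that attach can bail out before
touching the chip again.

```c
#include <stdint.h>

/* Illustrative value only; stands in for the driver's OHCI_HCC_RESET. */
#define	HCC_RESET_BIT	(1u << 16)

/* Simulated chip: the reset bit self-clears after a number of reads. */
struct fake_chip {
	uint32_t hccontrol;		/* simulated control register */
	int polls_until_ready;		/* reads that still see the bit set */
};

/* Simulated register read; a real driver would use bus_space(9). */
uint32_t
fake_read_hcc(struct fake_chip *c)
{
	if (c->polls_until_ready > 0)
		c->polls_until_ready--;
	else
		c->hccontrol &= ~HCC_RESET_BIT;
	return (c->hccontrol);
}

/*
 * Poll until the reset bit clears. Returns 0 once the chip is ready
 * and -1 if it is still in reset after max_tries reads, so the caller
 * can fail the attach instead of carrying on.
 */
int
wait_reset_done(struct fake_chip *c, int max_tries)
{
	int i;

	for (i = 0; i < max_tries; i++) {
		if ((fake_read_hcc(c) & HCC_RESET_BIT) == 0)
			return (0);
		/* A real driver would wait here, e.g. one millisecond. */
	}
	return (-1);
}
```

With a simulated chip that needs about 500 polls, a limit of 100
yields -1 and a limit of 1000 yields 0, which mirrors the suggested
change of the retry count.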

Marius