sparc64/141918: [ehci] ehci_interrupt: unrecoverable error,
controller halted (sparc64)
Manuel Tobias Schiller
mala at hinterbergen.de
Tue Apr 24 12:10:15 UTC 2012
The following reply was made to PR sparc64/141918; it has been noted by GNATS.
From: Manuel Tobias Schiller <mala at hinterbergen.de>
To: Marius Strobl <marius at alchemy.franken.de>
Cc: bug-followup at FreeBSD.org
Subject: Re: sparc64/141918: [ehci] ehci_interrupt: unrecoverable error,
controller halted (sparc64)
Date: Tue, 24 Apr 2012 14:05:47 +0200
Hi Marius,
I'm rather busy with work at the moment, so I can't spend as much time
troubleshooting this issue right now. (See below for answers to your
questions.)
On Sun, 15 Apr 2012 14:51:05 +0200
Marius Strobl <marius at alchemy.franken.de> wrote:
> [...]
> > > >
> > > > Hi,
> > > >
> > > > the "VIA quirk fix" on its own gives the familiar message in dmesg
> > > > (unrecoverable error, controller halted), so I'm compiling a
> > > > kernel which
> > >
> > > Oof, this likely means there's a more basic problem with this
> > > device. Have you already tried to re-seat the card in case there's
> > > an electrical problem?
> > > Please also provide the output of `pciconf -rb ehci0@pci0:2:5:2
> > > 0:255' from a booting kernel.
> > > FYI, after some digging I've found the following card
> > > ehci0@pci0:2:5:2: class=0x0c0320 card=0x31041106 chip=0x31041106
> > > rev=0x6h0 which is a newer revision of your device and works just
> > > fine in a T1-200 including with the usb(4) fixes. The publicly
> > > available datasheets for the VIA USB controllers are minimal and
> > > exclude errata and Linux also doesn't seem to use any additional
> > > workarounds, so I'm starting to run out of ideas what could be
> > > wrong with your revision. The only remaining thing to give a try I
> > > currently can think of is to test whether it chokes on the generic
> > > initialization done by the sparc64 PCI code using the attached
> > > patch.
> > >
> > > > combines this fix with your latest busdma fix to try them both
> > > > together;
> > >
> > > This combination is unlikely to make a difference.
> > >
> > > Marius
> > >
> >
> > Hi Marius,
> >
> > I've tried your new patch, both on its own and in conjunction with
> > the latest busdma and VIA quirk fixes, and I still get the same error
> > message...
> >
> > Here's the output of pciconf you requested:
> >
> > mala@router:~> sudo pciconf -rb ehci0@pci0:2:5:2 0:255
> > Password:
> > 06 11 04 31 06 00 10 22 65 20 03 0c 00 16 80 00
> > 00 a0 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > 00 00 00 00 00 00 00 00 00 00 00 00 06 11 04 31
> > 00 00 00 00 80 00 00 00 00 00 00 00 14 03 00 00
> > 00 00 0b 00 00 00 00 00 a0 20 00 29 00 00 ff ff
>
> This is rather confusing; the 0x29 in the above line means that the
> VIA workaround is applied. Didn't you say that with the fix to
> actually apply it, the kernel panics as soon as it attaches the
> device?
> Apart from this, the configuration space differs in 3 undocumented
> bytes from mine. I'm not sure whether it's worth trying whether
> these make a difference ...
Yes, this was from a kernel with your patch and the VIA workaround
applied; the kernel usually stops when I start using these devices
heavily (i.e. the automatic checks done during a ZFS mount operation).
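For reference, the bytes discussed above can be checked mechanically. The
following is a minimal Python sketch, not part of the original exchange,
that decodes the first five rows of the quoted dump using the standard PCI
configuration header layout (little-endian fields; vendor/device ID at
0x00, revision at 0x08, class code at 0x09, subsystem IDs at 0x2c). The
register offset 0x4b and the bit mask 0x20 used for the workaround check
are assumptions: the thread only states that the 0x29 in that row means
the workaround is applied, and 0x4b is simply where that byte sits in the
dump. All function and constant names are illustrative.

```python
# First five rows of the quoted `pciconf -rb` dump (config offsets 0x00-0x4f).
DUMP = """
06 11 04 31 06 00 10 22 65 20 03 0c 00 16 80 00
00 a0 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 06 11 04 31
00 00 00 00 80 00 00 00 00 00 00 00 14 03 00 00
00 00 0b 00 00 00 00 00 a0 20 00 29 00 00 ff ff
"""

# Assumptions (not stated in the thread): the VIA workaround is taken to
# set bit 0x20 in the vendor-specific register at config offset 0x4b.
VIA_QUIRK_OFFSET = 0x4B
VIA_QUIRK_BIT = 0x20


def parse_dump(text):
    """Turn whitespace-separated hex byte pairs into a bytes object."""
    return bytes(int(tok, 16) for tok in text.split())


def le(cfg, offset, size):
    """Read a little-endian field of `size` bytes starting at `offset`."""
    return int.from_bytes(cfg[offset:offset + size], "little")


def summarize(cfg):
    """Extract the header fields that pciconf -l prints, plus the quirk byte."""
    return {
        "chip": le(cfg, 0x00, 4),    # device ID << 16 | vendor ID
        "rev": le(cfg, 0x08, 1),     # revision ID
        "class": le(cfg, 0x09, 3),   # class, subclass, programming interface
        "card": le(cfg, 0x2C, 4),    # subsystem ID << 16 | subsystem vendor ID
        "quirk_byte": cfg[VIA_QUIRK_OFFSET],
        "quirk_applied": bool(cfg[VIA_QUIRK_OFFSET] & VIA_QUIRK_BIT),
    }


cfg = parse_dump(DUMP)
info = summarize(cfg)
print(f"chip=0x{info['chip']:08x} card=0x{info['card']:08x} "
      f"class=0x{info['class']:06x} rev=0x{info['rev']:02x}")
print(f"byte at 0x{VIA_QUIRK_OFFSET:02x} = 0x{info['quirk_byte']:02x}, "
      f"workaround bit set: {info['quirk_applied']}")
```

With the dump above, the decoded chip, card and class values agree with
the ones Marius quotes for his own card (0x31041106, 0x31041106 and
0x0c0320), and the revision byte of this card reads 0x65.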
> > 00 5a 04 80 00 00 00 00 04 0b 88 88 33 00 00 00
> > 20 20 01 00 00 00 00 00 01 00 00 00 00 00 00 c0
> > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > 01 00 0a 7e 00 00 00 00 00 00 00 00 00 00 00 00
> > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > 00 00 00 00 00 00 00 03 00 00 00 00 00 00 00 00
> >
> > This was taken after the controller stopped, on a kernel with your
> > latest patch, but I'd guess that doesn't matter - the EHCI driver
> > should not be playing with the PCI settings after initialisation...
> >
> > I've also opened the machine, and the PCI card is seated properly. I
> > even removed it and tried an even older VIA EHCI controller and one
> > of the first USB 2.0 controllers by NEC - no luck, the VIA one had
> > trouble recognizing devices, the NEC one did not recognize a single
> > one I plugged in.
> >
>
> This also is rather strange. Have you ever used any other type of
> card in the slot, e.g. a NIC, so you can rule out that it's broken
> somehow?
Some four or five years ago, the slot held a quad fast ethernet NIC, and
that seemed to work fine... But a lot can happen in that much time, so I
ordered a new USB controller to test with, just in case...
> How does using the on-board USB controller work out?
As far as I know, the on-board controller is USB 1.1, so I have not really
tried it because it's going to be a no-go option for disks: I'd get
similar speed getting data from some server here at CERN over my DSL
connection, and I probably wouldn't even have to administer the server
myself, if I could get them to host my data ;) I can give the on-board
USB 1.1 controller a try, though...
I noticed something else when reconnecting everything to the server: The
USB ground seems to be at quite a high potential with respect to the
chassis of the server (and the protective ground of the wall outlet),
about 80 Volts. I've tried to find a single faulty power supply among
those of the hard disks (since the server chassis is at ground level),
but when tested individually, none of them shows this behaviour. It only
happens when I connect all eight USB disks to the USB hub which in turn
connects to the server. Apparently, this is some collective effect.
Obviously, when the USB cable from the hub is plugged into the server,
this potential difference is no longer there, and the disks are recognised.
I'm not sure what this observation means (except that I'd really prefer
linear over switching-mode power supplies because of the galvanic
separation between primary and secondary sides), but I thought I'd
mention it anyway.
Manuel
> Marius
>
>
-- 
Homepage: http://www.hinterbergen.de/mala
OpenPGP: 0xA330353E (DSA) or 0xD87D188C (RSA)