Patch for Alladin dallas (ALi) AGP kernel panic [long]

Andrea Cocito andrea.cocito at ieo-research.it
Tue Aug 5 03:22:31 PDT 2003


Hello,

first of all: I am completely new to both FreeBSD kernel internals and to
PC-Intel hardware, and the last time I put my hands on a *BSD kernel was a
few years ago, so I might be wrong... but still: it works, and it certainly
fixes an existing bug.

My machine would not run anything beyond 4.7: the kernel panics after AGP
probing, trying to contigmalloc1() with size = 0. The problem seems to be
shared by several machines based on the ALi chipset. Compiling a custom
kernel without the "agp" device works.

Looking into the code and several panic outputs I found these issues:
- I am almost sure that the box here does not have an AGP bus at all
   (it has onboard video; maybe there is an internal AGP bus, but there
   is certainly no connector). 4.7 did not show any agp* device at boot;
   4.8 and 5.* without agp* just see the vga* on isa* and work.
   So *maybe* someone could investigate whether the device is really an
   AGP bus... (or give me a clue on how to check it).
- If it is an AGP bus, then for some reason it reports an aperture size
   of zero. Does this make any sense? Again: my knowledge of AGP stuff
   is NULL. Could we leave the device attached without allocating
   anything there?
- In /src/sys/pci/agp_ali.c and others there is a loop that supposedly
   tries to allocate progressively smaller apertures until it either
   succeeds or reaches zero size. Whatever the answers to the two points
   above turn out to be, the loop is simply broken: as written it will
   either allocate the requested size at the first shot, crash if that
   size was zero, or fail if it ever tries to reduce the aperture. This
   is what I patched.
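
A rough sketch of why a zero aperture would end up as a zero-size
allocation. It assumes the GATT holds one 32-bit entry per 4 KB page of
aperture; that is an assumption about what agp_alloc_gatt() does, not
something taken from the code quoted in this message. The helper name
gatt_alloc_bytes() and the constant value are illustrative only.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed for illustration: AGP pages are 4 KB, so the shift is 12. */
#define SKETCH_AGP_PAGE_SHIFT 12

/*
 * Hypothetical helper: number of bytes a GATT allocation would request
 * for a given aperture size, with one 32-bit entry per aperture page.
 */
static size_t gatt_alloc_bytes(uint32_t aperture_bytes)
{
    uint32_t entries = aperture_bytes >> SKETCH_AGP_PAGE_SHIFT;

    /* A zero aperture gives zero entries, hence a zero-byte request. */
    return (size_t)entries * sizeof(uint32_t);
}
```

Under these assumptions a reported aperture of 0 turns directly into an
allocation request of 0 bytes, which would match the contigmalloc1()
panic described above.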

The attached patch fixes that loop so it does something meaningful: it
tries to allocate a progressively smaller aperture until it either succeeds
or reaches zero; if it fails, it detaches the device and returns ENOMEM.
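
The control flow of the patched loop can be modelled outside the kernel
roughly as follows. This is only a sketch: struct fake_agp,
fake_alloc_gatt() and fake_attach() are hypothetical stand-ins for the
device, agp_alloc_gatt() and the attach routine, and the "allocation"
simply succeeds once the aperture is small enough.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for the device state. */
struct fake_agp {
    uint32_t aperture;  /* current aperture size in bytes */
    uint32_t max_ok;    /* largest aperture whose GATT "fits" */
    int attached;       /* 1 while the device stays attached */
};

/* Simulated GATT allocation: succeeds only for a non-zero aperture
 * that is no larger than max_ok. */
static int fake_alloc_gatt(const struct fake_agp *d)
{
    return d->aperture != 0 && d->aperture <= d->max_ok;
}

/* Same shape as the patched loop: halve until success or zero. */
static int fake_attach(struct fake_agp *d)
{
    int have_gatt = 0;

    while (d->aperture != 0) {
        if (fake_alloc_gatt(d)) {
            have_gatt = 1;
            break;
        }
        d->aperture /= 2;   /* retry with half the aperture */
    }

    if (!have_gatt) {
        d->attached = 0;    /* models agp_generic_detach() */
        return ENOMEM;
    }

    d->attached = 1;
    return 0;
}
```

With an initial aperture of 0 the loop body never runs and the function
returns ENOMEM without attempting any allocation; with a 64 MB aperture
and max_ok of 16 MB it succeeds on the third try.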

As I said: whatever the real origin of the problem, this loop was simply
broken; it now does what the original code probably intended, so that at
least the system boots.

Maybe this also helps others having trouble with some Toshiba laptops that
use this chipset and reported the same panic on several lists.

Pasted below is the patch (-rc3) for agp_ali.c. The same loop should be
fixed in several other agp_*.c sources; let me know which patch style is
preferred and whether the list supports attachments, and I'll be glad to
send the complete diff for all pertinent files.

Also let me know whether returning ENOMEM when the requested aperture size
is 0 is OK, or whether it should return EINVAL instead (this option looks
better to me).
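
One way the EINVAL variant could look, as a sketch only: the function
attach_result() is hypothetical and just separates "the aperture was
already zero" from "allocation failed after shrinking".

```c
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical helper: pick the error code for the attach path.
 * initial_aperture is the size reported before any halving;
 * have_gatt is non-zero if some allocation eventually succeeded.
 */
static int attach_result(uint32_t initial_aperture, int have_gatt)
{
    if (initial_aperture == 0)
        return EINVAL;          /* nothing sensible was requested */

    return have_gatt ? 0 : ENOMEM;  /* real allocation failure */
}
```

This keeps ENOMEM for genuine memory shortage and reports a zero
aperture as an invalid request.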

Then it is up to someone who knows the hardware and the kernel internals a
bit better to work on the real solution (work out whether the probe should
fail because this is not really an AGP bus, or find a way to read the
aperture size correctly).

Ciao,

A.

========= CUT HERE ==========
*** agp_ali.c.unpatched Mon Aug  4 09:25:13 2003
--- agp_ali.c   Mon Aug  4 12:13:44 2003
***************
*** 101,121 ****
                 return error;

         sc->initial_aperture = AGP_GET_APERTURE(dev);

!       for (;;) {
                 gatt = agp_alloc_gatt(dev);
                 if (gatt)
                         break;
!
!               /*
!                * Probably contigmalloc failure. Try reducing the
!                * aperture so that the gatt size reduces.
!                */
!               if (AGP_SET_APERTURE(dev, AGP_GET_APERTURE(dev) / 2)) {
                         agp_generic_detach(dev);
                         return ENOMEM;
-               }
         }
         sc->gatt = gatt;

         /* Install the gatt. */
--- 101,120 ----
                 return error;

         sc->initial_aperture = AGP_GET_APERTURE(dev);
+       gatt = NULL;

!       while (AGP_GET_APERTURE(dev) != 0) {
                 gatt = agp_alloc_gatt(dev);
                 if (gatt)
                         break;
!               AGP_SET_APERTURE(dev, AGP_GET_APERTURE(dev) / 2);
!       }
!
!       if (!gatt) {
                         agp_generic_detach(dev);
                         return ENOMEM;
         }
+
         sc->gatt = gatt;

         /* Install the gatt. */
========= CUT HERE ==========

PS: cc: me on the replies please, I'm not on the list, thanks.

----------
Andrea Cocito < andrea.cocito at ieo-research.it >
IEO -- European Institute of Oncology - Bioinformatics group
tel: +39 02 57489 857   fax: +39 02 57489 851

"Imagination is more important than knowledge"
    -Albert Einstein


