Heap overflow in mps(4) (was: Re: stable/9 mps(4) rev 254938 == BOOM!)

Garrett Wollman wollman at csail.mit.edu
Mon Feb 3 22:00:43 UTC 2014


<<On Thu, 30 Jan 2014 17:33:42 -0700, "Kenneth D. Merry" <ken at freebsd.org> said:

> The attached patch should fix the leaked allocations.  I'm CCing Steve and
> Kashyap at LSI so that they can verify that this is the right place to do
> the mapping shutdown.

It does fix the leak.

> I don't know yet why that particular change is causing problems.  Perhaps
> it just moved things around and exposed an existing problem.

> The fact that the redzone code doesn't expose any problems makes it more
> likely that it is a problem other than a heap overflow.

> Since it is consistent, is there any chance you could hook up remote gdb to
> the box and poke around when it crashes?  Perhaps you'll see something
> interesting that will point to the problem.

Remote GDB is not an option on this box, unfortunately.  However, I
tried a few other things:

- It makes no difference whether mps.ko is preloaded or loaded in
single-user mode.

- If I boot a kernel and modules built without redzone, loading mps.ko
panics immediately, in a very different place (apologies for the poor
transcription; I can either be up in the machine room to plug in USB
sticks or use the serial console, not both):

--- trap 0xc, rip = 0xffff....f807e934a, rsp = 0xff...94da4c48f0, rbp = 0xff...94da4c4950 ---
bzero() at bzero+0xa/frame 0xff...94da4c4af0
mpssas_add_device() at mpssas_add_device+0x78/frame 0xff..94da4c4af0
mpssas_firmware_event_work() at mpssas_firmware_event_work+0x437/frame 0xff....94da4c4b78
taskqueue_run_locked() at taskqueue_run_locked+0x74/frame 0xff..94da4c4bc0
taskqueue_thread_loop() at taskqueue_thread_loop+0x46/frame 0xff..94da4c4be0

Inspection of the code does not reveal any call path from
mpssas_add_device() to bzero().  The return address in the frame is
the location of the first function call (to
mpssas_startup_increment()) in mpssas_add_device().

So I think it's fair to say that something is scribbling over memory
in quite a bad way.
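
For anyone following along, here is a minimal userland sketch of why a
clean redzone run does not rule out memory corruption.  It is not the
kernel's redzone(9) code; the arena layout, the names (slot_init,
slot_guards_intact, and so on), and the guard size and fill byte are
all hypothetical, chosen only for illustration.  The idea is that
guard bytes around an object catch a linear overrun, but a stray write
that lands inside a neighbouring object's payload leaves every guard
intact.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* All sizes and the fill pattern are illustrative assumptions. */
#define GUARD_LEN  16
#define GUARD_BYTE 0x42
#define OBJ_SIZE   32
#define SLOT_SIZE  (GUARD_LEN + OBJ_SIZE + GUARD_LEN)

/* Two objects laid out back to back, each with guards on both sides. */
static unsigned char arena[2 * SLOT_SIZE];

/* (Re)initialize a slot: fill both guards, zero the payload, return it. */
static unsigned char *
slot_init(int slot)
{
	unsigned char *base;

	assert(slot == 0 || slot == 1);
	base = arena + (size_t)slot * SLOT_SIZE;
	memset(base, GUARD_BYTE, GUARD_LEN);
	memset(base + GUARD_LEN, 0, OBJ_SIZE);
	memset(base + GUARD_LEN + OBJ_SIZE, GUARD_BYTE, GUARD_LEN);
	return (base + GUARD_LEN);
}

/* Return 1 if both guard regions of the slot still hold the pattern. */
static int
slot_guards_intact(int slot)
{
	const unsigned char *base = arena + (size_t)slot * SLOT_SIZE;
	size_t i;

	for (i = 0; i < GUARD_LEN; i++) {
		if (base[i] != GUARD_BYTE ||
		    base[GUARD_LEN + OBJ_SIZE + i] != GUARD_BYTE)
			return (0);
	}
	return (1);
}

/* Write one byte past the end of object 0: this hits its trailing guard. */
static int
linear_overrun_detected(void)
{
	unsigned char *a = slot_init(0);

	(void)slot_init(1);
	a[OBJ_SIZE] = 0;
	return (!slot_guards_intact(0) || !slot_guards_intact(1));
}

/* Write far past object 0: this skips the guards and lands in object 1. */
static int
stray_write_detected(void)
{
	unsigned char *a = slot_init(0);
	unsigned char *b = slot_init(1);

	a[OBJ_SIZE + 2 * GUARD_LEN + 4] = 0xff;
	assert(b[4] == 0xff);	/* object 1's payload was corrupted */
	return (!slot_guards_intact(0) || !slot_guards_intact(1));
}
```

Calling linear_overrun_detected() reports the damage, while
stray_write_detected() reports nothing even though object 1 has been
modified.  A write through a stale or miscomputed pointer would behave
like the second case, which would be consistent with what I am seeing
here, though that is only a guess at this point.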

Two things that may be relevant: on boot, this server's MPT2 BIOS
always complains "adapter configuration may have changed", and I
haven't discovered anything in the configuration utility that changes
this.  Also, on boot, I always get the following messages:

failure at /usr/src-9-stable/sys/dev/mps/mps_sas_lsi.c:667/mpssas_add_device()! Could not get ID for device with handle 0x0010
mpssas_fw_work: failed to add device with handle 0x10

This has been true across mps(4) revisions, on all three copies of
this hardware that I have in service.

-GAWollman

