RELENG_8: panic: wrong offset 4096 for sectorsize 2352

Tue May 24 21:27:27 UTC 2011

On 24.05.2011 10:53, Jeremy Chadwick wrote:
> On Tue, May 24, 2011 at 09:26:18AM +0200, Joerg Wunsch wrote:
>> As Andriy Gapon wrote:
>>
>>>> panic: wrong offset 4096 for sectorsize 2352
>>>>
>>>> Any ideas why this happens, and how to avoid it?
>>
>>> Backtrace would be a first thing.
>>
>> OK, here we go (the core has been dumped from within a serial console
>> BREAK DDB entry, I'm omitting the frames related to that):
>>
>> #16 0xc0537352 in _cv_wait (cvp=0xc6e6bcd4, lock=0xc6e6bdd4) at /usr/src/sys/kern/kern_condvar.c:96
>> #17 0xc0aa8a13 in usb_process (arg=0xc6e6bccc)
>>     at /usr/src/sys/modules/usb/usb/../../../dev/usb/usb_process.c:183
>> #18 0xc054f948 in fork_exit (callout=0xc0aa88e0 <usb_process>, arg=0xc6e6bccc, frame=0xc6a1ad28)
>>     at /usr/src/sys/kern/kern_fork.c:865
>> #19 0xc077fd34 in fork_trampoline () at /usr/src/sys/i386/i386/exception.s:275
>>
>> After the initial panic, I typed "c" in DDB, on the assumption
>> that it would proceed with a coredump, but it didn't.  That's why I
>> hit BREAK again and forced a dump through the "panic" DDB
>> command.  Now I'm no longer sure whether the frames above
>> really relate to the mentioned panic string.
> 
> Just an informational note about inducing a panic: once at the db>
> prompt, I tend to do "bt", then immediately "call doadump", which
> writes the memory dump to the dump device (usually swap), and then
> "reboot".  I assume (since you have a crash dump at all) that you
> have dumpdev defined in /etc/rc.conf.  savecore(8) will then pick up
> the dump on the next boot, etc... you get the idea.
> 
> The panic in question is intentional from what I can tell in the code.
> I'm not sure how much a kernel crash/dump is going to help with this,
> given the following code in src/sys/geom/geom_io.c:
> 
> 391 void
> 392 g_io_request(struct bio *bp, struct g_consumer *cp)
> 393 {
> ...
> 426         if (bp->bio_cmd & (BIO_READ|BIO_WRITE|BIO_DELETE)) {
> 427                 KASSERT(bp->bio_offset % cp->provider->sectorsize == 0,
> 428                     ("wrong offset %jd for sectorsize %u",
> 429                     bp->bio_offset, cp->provider->sectorsize));
> 
> phk@ added this code 6 years ago to HEAD (at the time); see the
> annotation around line 426:
> 
> http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/geom/geom_io.c#rev1.59
> 
> The assertion failed because bio_offset was not a multiple of
> sectorsize.  Specifically: 4096 % 2352 = 1744, which isn't zero,
> therefore the panic occurs.  (It's important to read assertions
> "backwards"; that is, the assertion/panic happens when the conditional
> proves false.)
> 
> I know little to nothing about CD ripping so I can't tell you why abcde
> was able to somehow trigger this.  Possibly some device read routines
> that abcde uses get translated directly into GEOM requests and therefore
> indirectly trigger the assertion?

A CDDA block holds 1/75th of a second of audio sampled at 44100 Hz:
588 samples * 2 channels * 16 bits = 2352 bytes.  That's the high-level
view of where the block size of 2352 comes from.  IMNSHO the kernel
should return EINVAL or EIO or similar, not panic.
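
To spell out the arithmetic above, here is a minimal sketch in C.  The
function name cdda_sector_bytes() and its parameters are made up for
illustration; this is not code from the kernel or from abcde.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from any FreeBSD source): size in bytes of
 * one audio block, given the sample rate, the number of blocks per
 * second, the channel count and the sample width in bits.
 */
static uint32_t
cdda_sector_bytes(uint32_t sample_rate_hz, uint32_t sectors_per_sec,
    uint32_t channels, uint32_t bits_per_sample)
{
	uint32_t samples_per_sector;

	/* The rate must divide evenly into blocks. */
	assert(sectors_per_sec != 0);
	assert(sample_rate_hz % sectors_per_sec == 0);

	/* 44100 Hz / 75 blocks per second = 588 samples per block. */
	samples_per_sector = sample_rate_hz / sectors_per_sec;

	/* samples * channels * bytes per sample */
	return (samples_per_sector * channels * (bits_per_sample / 8));
}
```

With the CDDA values, cdda_sector_bytes(44100, 75, 2, 16) works out to
588 * 2 * 2 = 2352.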

Where the 4096 offset comes from with a 2352 block size is the other
question.  If something wants the third block, it should seek to 4704
(2 * 2352); OTOH if block sizes get out of sync anywhere, that might
explain it.  Perhaps truss can help if its output doesn't get lost in
the panic.
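
As a rough illustration of the offset arithmetic, and of rejecting a
misaligned request with an error rather than asserting, here is a small
userland sketch in C.  block_to_offset() and check_offset() are
hypothetical helpers, not GEOM functions, and I am assuming zero-based
block numbers, so the third block is block 2.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical helper: byte offset of a zero-based block number. */
static int64_t
block_to_offset(int64_t block, uint32_t sectorsize)
{
	assert(block >= 0);
	return (block * (int64_t)sectorsize);
}

/*
 * Hypothetical helper: same test as the KASSERT quoted above, but it
 * returns EINVAL for a misaligned offset instead of panicking.
 */
static int
check_offset(int64_t offset, uint32_t sectorsize)
{
	if (sectorsize == 0 || offset < 0)
		return (EINVAL);
	return ((offset % (int64_t)sectorsize) == 0 ? 0 : EINVAL);
}
```

With a sector size of 2352, block_to_offset(2, 2352) gives 4704, which
passes check_offset(), whereas an offset of 4096 leaves a remainder of
1744 and is rejected.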

The interesting question is, which devices are affected, does this
happen with acd (ATAPI) or with cd (SCSI/CAM)?

Not that I could help with that, though; just a hint to Jörg for
sidestepping the problem: if it was ATAPI, try loading atapicam via
/etc/loader.conf and retry abcde on cd.  ISTR that it worked for me on
an internal SATA or PATA CD-ROM.  Feel free to ping me off-list if you
want me to test, in case narrowing down the affected 8-STABLE drivers
is of any help to the kernel hackers.

-- 
Matthias Andree