read(2) into some addresses doesn't return data on RPi
Ian Lepore
ian at freebsd.org
Mon Jan 12 02:49:03 UTC 2015
On Sat, 2015-01-10 at 17:04 +1100, Peter Jeremy wrote:
> Trying to access the boot partition using mtools consistently fails on my
> RPi because the kernel is returning NULs for the first sector. The second
> sector is correct. If I use dd(1) then the expected data is returned.
>
> This is running 11-current r276818 (but ISTR seeing it on older kernels).
>
> I did some digging and found that read(2)s of the SD card device return
> successful but do not actually write anything to the buffer for some
> addresses (and they happen to contain all NULs in mtools). This doesn't
> appear to affect reads of normal files.
>
> Running the attached program on /dev/mmcsd0s1 gave me the following results:
> - There are no partial reads. Either all 512 bytes are updated or none are.
> - There are two blocks of addresses 0xbfff0e00 thru 0xbfff0e00 and 0xbfff2e00
> thru 0xbfff2e00 where reads work on a 32-byte alignment but not otherwise.
> - Reads consistently fail between 0xbfff1e08 and 0xbfff1ff8
> - Reads consistently fail between 0xbfff3e08 and 0xbfff3f?? (I got a hang).
> - The program never completes. In 3 runs, I've gotten:
>   - panic: null_fetch_syscall_args
>   - kernel hang
>   - panic: malloc: bad malloc type magic
> I don't have a serial console and so can't debug kernel panics.
>
> Putting that together, it seems to be related to accesses that aren't cache-line
> aligned and cross page boundaries but I'm not sure why it behaves differently
> at different page boundaries. The hangs/panics suggest that it's writing to
> random other kernel addresses instead.
>
> Does this ring a bell for anyone?
>
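The test program Peter attached isn't preserved in this archive. The sketch below is a hypothetical reconstruction of the same idea, not his code: it assumes POSIX pread(2), posix_memalign(3) and sysconf(3), and the names probe_one and scan_page_boundary are invented for illustration. It looks for the reported symptom (a read that returns success but leaves the buffer untouched) by pre-filling the destination with a sentinel and comparing it against a reference copy read into a page-aligned buffer, while sliding the destination across a page boundary.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define SECTOR 512

/* Read one sector at `devoff` into `dst` (which may be misaligned) and
 * compare it with `ref`. `dst` is first filled with a sentinel, so a read
 * that reports success without storing anything is caught even when the
 * sector itself is all NULs. Returns 1 = delivered, 0 = not delivered,
 * -1 = error or short read. */
static int probe_one(int fd, off_t devoff, unsigned char *dst,
                     const unsigned char *ref)
{
    memset(dst, 0xA5, SECTOR);
    if (pread(fd, dst, SECTOR, devoff) != SECTOR)
        return -1;
    return memcmp(dst, ref, SECTOR) == 0;
}

/* Slide a one-sector buffer across the boundary between two pages, `step`
 * bytes at a time, and count the probes whose data did not arrive.
 * Returns -1 if setup fails. */
static int scan_page_boundary(int fd, off_t devoff, size_t step)
{
    long pg = sysconf(_SC_PAGESIZE);
    unsigned char ref[SECTOR];
    void *mem = NULL;
    int failures = 0;

    assert(pg >= 2 * SECTOR && step > 0);
    if (posix_memalign(&mem, (size_t)pg, 2 * (size_t)pg) != 0)
        return -1;
    unsigned char *region = mem;

    /* Reference copy, read into the page-aligned start of the region. */
    if (pread(fd, region, SECTOR, devoff) != SECTOR) {
        free(mem);
        return -1;
    }
    memcpy(ref, region, SECTOR);

    /* Offsets from one sector below the boundary up to the boundary;
     * every probe strictly between those endpoints straddles it. */
    for (size_t off = (size_t)pg - SECTOR; off <= (size_t)pg; off += step) {
        int ok = probe_one(fd, devoff, region + off, ref);
        if (ok != 1) {
            failures++;
            printf("page offset 0x%zx: %s\n", off % (size_t)pg,
                   ok < 0 ? "read error" : "data not delivered");
        }
    }
    free(mem);
    return failures;
}
```

On an affected system one would open the device (for example /dev/mmcsd0s1) read-only and call scan_page_boundary(fd, 0, 8); against an ordinary file, or on a kernel with the fix, it would be expected to report no failures. Stepping by 8 bytes exercises both 32-byte-aligned and unaligned destinations.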
This turned out to be two problems, both fixed now as of r277038.
The first problem was that the driver couldn't handle a dma transfer
split across two physically discontiguous pages, and when an IO isn't
aligned to a cacheline, the arm busdma logic that auto-bounces it
inherently ends up setting up a split buffer. Since the dma tag
required a single buffer, the mapping operation would fail with EFBIG.
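The first problem hinges on two properties of the user buffer: whether it is cache-line aligned, and how many pages it touches. The arithmetic sketch below makes both concrete. It is illustrative only, not busdma code: it assumes 4 KiB pages and 32-byte cache lines (the 32-byte figure is consistent with the alignment Peter observed, but treat both values as assumptions), and the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumptions for illustration: 4 KiB pages, 32-byte cache lines. */
#define ASSUMED_PAGE 4096u
#define ASSUMED_LINE 32u

/* Nonzero when the buffer starts and ends on cache-line boundaries,
 * so no cache line is shared with unrelated data. */
static int cacheline_aligned(uintptr_t addr, size_t len)
{
    return addr % ASSUMED_LINE == 0 && len % ASSUMED_LINE == 0;
}

/* How many pages [addr, addr + len) touches. Virtually adjacent pages
 * need not be physically adjacent, so a buffer touching two pages may
 * need two DMA segments. */
static unsigned pages_spanned(uintptr_t addr, size_t len)
{
    assert(len > 0);
    uintptr_t first = addr / ASSUMED_PAGE;
    uintptr_t last = (addr + len - 1) / ASSUMED_PAGE;
    return (unsigned)(last - first + 1);
}
```

For a 512-byte read into 0xbfff1e08, the buffer starts 8 bytes into a cache line and its last byte is 0xbfff2007, so it is unaligned and touches two pages: the case Ian describes. For 0xbfff0e00 the buffer is aligned and its last byte is 0xbfff0fff, so it stays within one page.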
The second problem was that the rpi sdhci driver was completely ignoring
the status of the busdma mapping calls, so after a failed mapping it
would do the dma anyway, using who-knows-what for a dma address, leading
to later panics or crashes due to corrupted memory.
So first I made it handle errors better, then I made it able to handle
an IO that crosses page boundaries.
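Both fixes can be modeled in user space. The sketch below is a hypothetical model, not the rpi sdhci driver and not the busdma API: model_load, model_seg, xfer_cb and start_xfer are invented names, every page is treated as physically separate, and bouncing is not modeled. It shows the general pattern: check the mapping error (both the value returned by the load call and the error handed to the callback) before touching the DMA engine, and allow a second segment so a page-crossing IO can be mapped at all. The message doesn't say whether r277038 is structured this way.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model types; these are NOT the FreeBSD busdma API. */
#define MODEL_PAGE 4096u
#define MODEL_MAXSEGS 8

struct model_seg {
    uintptr_t addr; /* stands in for a physical address */
    size_t len;
};

/* Callback receives the segment list, or an error with no segments. */
typedef void (*model_cb)(void *arg, const struct model_seg *segs,
                         int nseg, int error);

/* Split the buffer at every page boundary and fail with EFBIG if that
 * needs more than `maxsegs` segments. The error is both returned and
 * passed to `cb`. */
static int model_load(uintptr_t buf, size_t len, int maxsegs,
                      model_cb cb, void *arg)
{
    struct model_seg segs[MODEL_MAXSEGS];
    int n = 0;

    while (len > 0) {
        if (n == maxsegs || n == MODEL_MAXSEGS) {
            cb(arg, NULL, 0, EFBIG);
            return EFBIG;
        }
        size_t room = MODEL_PAGE - (size_t)(buf % MODEL_PAGE);
        size_t chunk = len < room ? len : room;
        segs[n].addr = buf;
        segs[n].len = chunk;
        n++;
        buf += chunk;
        len -= chunk;
    }
    cb(arg, segs, n, 0);
    return 0;
}

/* Driver-side state filled in by the callback. */
struct xfer {
    int error;
    int nseg;
    struct model_seg segs[2]; /* room for an IO that crosses one page boundary */
};

static void xfer_cb(void *arg, const struct model_seg *segs, int nseg, int error)
{
    struct xfer *x = arg;
    x->error = error;
    x->nseg = 0;
    if (error != 0)
        return; /* never trust segs on error */
    assert(nseg <= 2);
    for (int i = 0; i < nseg; i++)
        x->segs[i] = segs[i];
    x->nseg = nseg;
}

static size_t bytes_programmed; /* stands in for writes to the controller */

/* Map the buffer, check both error paths, and only then "program" each
 * segment in order. Returns 0 or the mapping error. */
static int start_xfer(uintptr_t buf, size_t len, int maxsegs)
{
    struct xfer x = { 0 };
    int err;

    assert(maxsegs >= 1 && maxsegs <= 2);
    err = model_load(buf, len, maxsegs, xfer_cb, &x);
    if (err == 0)
        err = x.error;
    if (err != 0)
        return err; /* abort: no DMA with an unset address */
    for (int i = 0; i < x.nseg; i++)
        bytes_programmed += x.segs[i].len;
    return 0;
}
```

With maxsegs set to 1 (the old single-buffer tag), a 512-byte IO at 0xbfff1e08 is rejected with EFBIG and nothing is programmed; with maxsegs set to 2 it maps as two segments of 504 and 8 bytes.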
I couldn't have done any of it without that program that recreated the
failure and confirmed the fix. Thanks, Peter!
-- Ian