Multiple virtual mappings considered harmful on ARM

Grzegorz Bernacki gjb at
Fri Dec 19 06:30:09 PST 2008


I have recently investigated a data corruption problem that occurs when copying
big files on an ARM machine. Below are my findings.

1. High level scenario.
The problem occurs when copying big files (~300MB and more): the MD5 checksums
calculated for the original and the copied file differ. The corrupted chunks
of data are always 32 bytes long, i.e. the cache line length.
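
One way to confirm the 32-byte pattern is to compare the original and the
copy one cache line at a time and list the offsets that differ. The script
below is only a sketch: the helper name differing_lines() is made up for this
illustration, and it assumes that file offsets are line-aligned with the
underlying buffers, which need not hold in general.

```python
import os
import tempfile

LINE = 32  # cache line length reported above


def differing_lines(path_a, path_b, line=LINE, chunk=1 << 20):
    """Return file offsets of line-sized blocks that differ between two files.

    chunk must be a multiple of line so block boundaries stay aligned.
    """
    offsets = []
    base = 0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(chunk)
            b = fb.read(chunk)
            if not a or not b:
                break
            for off in range(0, min(len(a), len(b)), line):
                if a[off:off + line] != b[off:off + line]:
                    offsets.append(base + off)
            base += len(a)
    return offsets


# Small self-check: corrupt exactly one 32-byte line in a copy of the data.
with tempfile.TemporaryDirectory() as tmp:
    good_path = os.path.join(tmp, "original")
    bad_path = os.path.join(tmp, "copy")
    data = bytes(range(256)) * 4
    damaged = bytearray(data)
    damaged[64:96] = b"\xff" * LINE
    with open(good_path, "wb") as f:
        f.write(data)
    with open(bad_path, "wb") as f:
        f.write(damaged)
    demo_offsets = differing_lines(good_path, bad_path)

print(demo_offsets)
```

If every reported offset stands alone (no two adjacent offsets), the damage
is consistent with single cache lines being lost.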

2. Root cause.
The root cause of the problem is an additional virtual mapping of read/write
buffers in the cluster read/write code (sys/kern/vfs_cluster.c,
cluster_rbuild(), cluster_wbuild()). Buffers for sequential read/write
operations are concatenated and sent to the device as one big buffer. The
concatenation uses pmap_qenter(), which puts an *additional* mapping in the KVA
for a physical area that is already mapped. For each buffer we extract the
pages it contains, and then all the pages from all the buffers are mapped at
the new virtual address of the new buffer. So we end up with at least two
virtual addresses for each page.

On systems with a virtual cache (most ARMs) such a scenario leads to
serious problems: we can have unflushed modified data pertaining to the same
physical pages cached in separate cache blocks. Data written back first
(associated with virtual mapping #1) is potentially overwritten by data
associated with virtual mapping #2 when its cache content is written back
later, or vice versa.
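
The lost update can be reproduced with a toy model. The sketch below is not
kernel code and does not model any particular ARM core: it assumes a
write-back cache whose lines are looked up purely by virtual address, and the
class name, the addresses and the page table are all invented for the
illustration.

```python
LINE = 32  # cache line size in bytes, matching the corrupted chunk size


class AliasingCache:
    """Toy write-back cache whose lines are tagged by virtual address only."""

    def __init__(self, memory, page_table):
        self.memory = memory          # bytearray standing in for physical RAM
        self.page_table = page_table  # virtual line address -> physical line address
        self.lines = {}               # virtual line address -> [data, dirty flag]

    def _fill(self, va):
        # On a miss, load the line from physical memory.
        if va not in self.lines:
            pa = self.page_table[va]
            self.lines[va] = [bytearray(self.memory[pa:pa + LINE]), False]
        return self.lines[va]

    def write(self, va, offset, data):
        line = self._fill(va)
        line[0][offset:offset + len(data)] = data
        line[1] = True  # mark dirty; memory is not updated yet

    def write_back(self, va):
        data, dirty = self.lines.pop(va)
        if dirty:
            pa = self.page_table[va]
            self.memory[pa:pa + LINE] = data  # whole line is written, not just the changed bytes


memory = bytearray(LINE)
# Two virtual addresses (mapping #1 and mapping #2) for the same physical line.
page_table = {0x1000: 0, 0x8000: 0}
cache = AliasingCache(memory, page_table)

cache.write(0x1000, 0, b"AAAA")  # via mapping #1
cache.write(0x8000, 4, b"BBBB")  # via mapping #2; its line is filled from stale memory

cache.write_back(0x1000)  # memory now holds "AAAA"
cache.write_back(0x8000)  # the stale line overwrites it

lost_update = bytes(memory[0:4]) != b"AAAA"
print(bytes(memory[:8]), "lost update:", lost_update)
```

Swapping the order of the two write_back() calls loses the "BBBB" update
instead, which is the "or vice versa" case.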

3. Workaround for FFS read/write problems - avoid clustered reading/writing on

# mount -o noclusterr -o noclusterw /dev/da0a /mnt/

4. More general solution.
This is the second time we have identified a problem of this nature related to
multiple virtual mappings on ARM, and we are wondering about a more general
solution that would protect us from such problems (very subtle and hard to
nail down) in the future. We were thinking at least about an extension to
DIAGNOSTIC that would detect such attempts. Any other suggestions or
comments are welcome.
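
To make the DIAGNOSTIC idea a bit more concrete, here is a rough model of the
bookkeeping such a check might need: remember which virtual addresses
currently map each physical page, and complain when a second one shows up.
This is only a sketch of the logic. MappingTracker and its methods are
hypothetical names, not part of the pmap interface, and a real check would
also have to consider cacheability and write permissions of the mappings.

```python
class MappingTracker:
    """Hypothetical model of a per-physical-page alias check."""

    def __init__(self):
        self.by_page = {}  # physical page address -> set of virtual addresses

    def enter(self, va, pa):
        """Record a mapping; return the sorted alias list if pa is now multiply mapped."""
        vas = self.by_page.setdefault(pa, set())
        vas.add(va)
        return sorted(vas) if len(vas) > 1 else None

    def remove(self, va, pa):
        """Forget a mapping, dropping the page entry once no mappings remain."""
        vas = self.by_page.get(pa)
        if vas is not None:
            vas.discard(va)
            if not vas:
                del self.by_page[pa]


tracker = MappingTracker()
first = tracker.enter(0xC1000000, 0x2000)   # original buffer mapping
second = tracker.enter(0xC5000000, 0x2000)  # cluster buffer maps the same page
if second is not None:
    print("page 0x2000 multiply mapped at:", [hex(va) for va in second])

tracker.remove(0xC1000000, 0x2000)
tracker.remove(0xC5000000, 0x2000)
third = tracker.enter(0xC7000000, 0x2000)   # sole mapping again, no warning
```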

