svn commit: r278431 - head/sys/contrib/vchiq/interface/vchiq_arm

Konstantin Belousov kostikbel at gmail.com
Mon Feb 9 10:09:46 UTC 2015


On Mon, Feb 09, 2015 at 08:51:02PM +1100, Bruce Evans wrote:
> On Mon, 9 Feb 2015, Konstantin Belousov wrote:
> 
> > On Mon, Feb 09, 2015 at 05:00:49PM +1100, Bruce Evans wrote:
> >> On Mon, 9 Feb 2015, Oleksandr Tymoshenko wrote:
> >> ...
> >> I think the full bugs only occur when the arch has strict alignment
> >> requirements and the alignment of the __packed objects is not known.
> >> This means that only lesser bugs occur on x86 (unless you enable
> >> alignment checking, but this arguably breaks the ABI).  The compiler
> >> just generates possibly-misaligned full-width accesses if the arch
> >> doesn't have strict alignment requirements.  Often the accesses turn
> >> out to be aligned at runtime.  Otherwise, the hardware does them
> >> atomically, with a smaller efficiency penalty than split accesses.
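
For concreteness, a minimal illustration of the __packed case (a
hypothetical example, not code from this commit): the uint32_t member
lands at offset 1, so the load in get_len() may be misaligned.

#include <stdint.h>

struct hdr {
        uint8_t  type;
        uint32_t len;           /* offset 1; alignment 1 under __packed */
} __attribute__((__packed__));

uint32_t
get_len(const struct hdr *h)
{

        return (h->len);        /* possibly-misaligned full-width load */
}

On x86 this compiles to a plain 4-byte load and the hardware absorbs
the misalignment; a strict-alignment arch needs byte loads, which the
compiler only emits because __packed makes the alignment (1) known.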
> >
> > On x86 an unaligned access is non-atomic.  This was very visible on
> > Core2 CPUs, where the DPCPU code mishandled the alignment and the
> > mutexes from the per-CPU areas broke badly.
> >
> > Modern CPUs should not lock several cache lines simultaneously either.
> 
> Interesting.  I thought that this was relatively easy to handle in
> hardware and required for compatibility, so hardware did it.
Trying to lock two cache lines at once easily results in deadlock.
FWIW, multi-socket Intel platforms are already deadlock-prone due
to the cache coherency protocol, and have some facilities to debug this.
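
The non-atomicity is easy to demonstrate from userland.  A sketch of
such a test (hypothetical; assumes a 64-byte cache line and C11
_Alignas): place a uint64_t so it straddles a line boundary, let one
thread alternate two bit patterns, and watch the other observe torn
values.

#include <pthread.h>
#include <stdint.h>
#include <stdio.h>

/* Bytes 60..67 of a 64-byte-aligned buffer cross a line boundary. */
static _Alignas(64) unsigned char buf[128];
#define SLOT    ((volatile uint64_t *)(buf + 60))

static void *
writer(void *arg __attribute__((unused)))
{

        for (;;) {
                *SLOT = 0;
                *SLOT = ~(uint64_t)0;
        }
}

int
main(void)
{
        pthread_t t;
        uint64_t v;

        pthread_create(&t, NULL, writer, NULL);
        for (;;) {
                v = *SLOT;
                if (v != 0 && v != ~(uint64_t)0) {
                        /* Half of one store, half of the other. */
                        printf("torn value: %016jx\n", (uintmax_t)v);
                        return (1);
                }
        }
}

Move SLOT to an aligned offset and the torn value never shows up,
since aligned 8-byte accesses are atomic on x86.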

> 
> This gives a reason other than efficiency to enable alignment checking
> so as to find all places that do misaligned accesses.  I last tried this
> more than 20 years ago.  Compilers mostly generated aligned accesses.
> One exception was for copying small (sub)structs.  Inlining of the copy
> assumed maximal alignment or no alignment traps.  Library functions are
> more of a problem.  FreeBSD amd64 and i386 memcpy also assume this.
> Similarly for the MD mem* in the kernel.  Mostly things are suitably
> aligned, so not doing extra work to align them is the correct optimization.
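
The small-struct copy case is easy to hit.  Something like the
following (a hypothetical example) is typically inlined as one or two
full-width moves rather than a byte loop:

#include <stdint.h>

struct pair {
        uint64_t a;
        uint64_t b;
};

void
copy_pair(struct pair *dst, const struct pair *src)
{

        *dst = *src;            /* expanded inline as 8- or 16-byte moves */
}

If a caller manufactures an unaligned src (say, by casting into a
packed buffer), a scalar 8-byte expansion faults with #AC when
alignment checking is on, while a 16-byte movups expansion is not
subject to #AC at all (the check only covers 2-, 4- and 8-byte
accesses), so whether this is caught varies with the compiler's choice.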

I also did experiments with a preloadable DSO which sets the EFLAGS.AC
bit.  Last time I tried, it broke in the very early libc initialization
code, due to an unaligned access generated by the compiler, as you
described.  This was with the in-tree gcc.  When I tried with a
clang-compiled world, I got SIGBUS due to an unaligned access in
ld-elf.so.1.
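
The DSO itself is tiny.  A minimal sketch (assuming amd64 and
gcc/clang constructor support; hypothetical code, not the DSO from
these experiments):

static void __attribute__((constructor))
set_ac(void)
{
        unsigned long flags;

        __asm__ __volatile__("pushfq; popq %0" : "=r" (flags));
        flags |= 1UL << 18;             /* EFLAGS.AC */
        __asm__ __volatile__("pushq %0; popfq" : : "r" (flags) : "cc");
}

Build with 'cc -shared -fPIC -o ac.so ac.c' and run the target with
LD_PRELOAD=./ac.so; every misaligned 2-, 4- or 8-byte access after
that delivers SIGBUS (the kernel must also set CR0.AM, which FreeBSD
does, or AC has no effect).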

AC does not work in ring 0, and Intel recently re-purposed the bit for
the kernel (SMAP) as 'security' theater.

