svn commit: r305814 - head/sys/boot/common
Emmanuel Vadot
manu at bidouilliste.com
Wed Sep 14 20:10:01 UTC 2016
Hi Bruce,
On Thu, 15 Sep 2016 05:37:19 +1000 (EST)
Bruce Evans <brde at optusnet.com.au> wrote:
> On Wed, 14 Sep 2016, Emmanuel Vadot wrote:
>
> > Log:
> > ufsread: Do not cast struct direct from void *
> > This cause alignment problem on ARM (and possibly other archs), instead copy it.
> >
> > MFC after: 1 week
> >
> > Modified:
> > head/sys/boot/common/ufsread.c
> >
> > Modified: head/sys/boot/common/ufsread.c
> > ==============================================================================
> > --- head/sys/boot/common/ufsread.c Wed Sep 14 16:47:17 2016 (r305813)
> > +++ head/sys/boot/common/ufsread.c Wed Sep 14 17:43:32 2016 (r305814)
> > @@ -97,21 +97,21 @@ static __inline uint8_t
> > fsfind(const char *name, ufs_ino_t * ino)
> > {
> > static char buf[DEV_BSIZE];
> > - struct direct *d;
> > + static struct direct d;
> > char *s;
> > ssize_t n;
>
> This looks like a good pessimization for space. boot2 on i386 has to
> fit in 8192 bytes and has a negative number to spare (features are
> already left out).
Do you have any suggestion on making the code better ?
This was the last patch for having EFI working on ARMv6 and this is
something that I want to be enabled by default at some point.
>
> With auto buffers, and also builtin memcpy, the compiler can optimize
> the memcpy to nothing on arches with no alignment, and should do this
> with -Os. I think -ffreestanding in boot programs kills the builtin,
> and this is almost intentional since compilers have poor support for
> -Os so the inline memcpy is sometimes larger unless it is null.
>
> >
> > fs_off = 0;
> > while ((n = fsread(*ino, buf, DEV_BSIZE)) > 0)
> > for (s = buf; s < buf + DEV_BSIZE;) {
> > - d = (void *)s;
> > + memcpy(&d, s, sizeof(struct direct));
> > if (ls)
> > - printf("%s ", d->d_name);
> > - else if (!strcmp(name, d->d_name)) {
> > - *ino = d->d_ino;
> > - return d->d_type;
> > + printf("%s ", d.d_name);
> > + else if (!strcmp(name, d.d_name)) {
> > + *ino = d.d_ino;
> > + return d.d_type;
> > }
> > - s += d->d_reclen;
> > + s += d.d_reclen;
> > }
> > if (n != -1 && ls)
> > printf("\n");
>
> The static buffer in the old code also looks like a pessimization for
> space. It doesn't seem to be needed to preserver or return results.
> Compilers don't seem to be smart enough to see this and optimize it
> to auto or vice versa.
>
> Testing shows that the static buffer is space optimization for gcc
> and a space pessimization for clang:
>
> X #include <string.h>
> X #include <unistd.h>
> X
> X int
> X foo(void)
> X {
> X static char buf[101];
> X static int buf1[3];
> X
> X read(0, buf, sizeof(buf));
> X memcpy(buf1, &buf[2], sizeof(buf1));
> X return (buf[0] + buf1[0]);
> X }
>
> 2 static buffers give the best pessimizations for clang -Os -m32.
> Even with -Os, clang prefers to use large instructions to reduce the
> number of instructions. The static buffers give largest instructions,
> and also prevent optimizing away the memcpy(). With auto buf1, it
> takes -ffreestanding to kill the optimization of memcpy() and with
> that optimization the size is reduced by almost a factor of 2. With
> auto buf too, other pessimizations are applied to give a size reduction
> of just 1 byte. gcc-4.2.1 never inlines the memcpy(), but it does
> the other pessimizations less well to give an in-between size.
>
> Aproximate sizes on i386: 65 bytes for clang with statics, 49 for gcc,
> 33 for clang with autos and builtins, 28 for tweaked output for clang
> with static buf, auto buf1 and builtins (null memcpy).
>
> Probably the static buf is an intentional optimization for gcc.
> Compilers do too good a job of generating large code for large stack
> offsets. In my test code, the offsets are about 101 which is less than
> 128 so it takes only 2 bytes. DEV_BSIZE is 512 so it needs 5-byte
> offsets from the usual pessimizations for the base pointer (these
> are to point %ebp at the top of the variables and put all the scalar
> variables below the buffer so that the offsets are largest). But
> clang also does good pessimizations for offsets from the static buffer:
>
> pushl $foo.buf # 5 bytes
> movsbl foo.buf, %eax # 7 bytes
> addl foo.buf+2, %eax # 6 bytes
>
> Actually optimizing for space:
>
> movl $foo.buf,%ebx # 5 bytes
> pushl %ebx # 1 byte
> movsbl (%ebx), %eax # 3 bytes
> addl 2(%ebx), %eax # 3 bytes
>
> Bruce
Cheers,
--
Emmanuel Vadot
More information about the svn-src-all
mailing list