svn commit: r310013 - head/sys/dev/xen/blkfront

Wed Dec 14 16:47:14 UTC 2016

On Tue, Dec 13, 2016 at 09:28:21PM +0100, Dimitry Andric wrote:
> On 13 Dec 2016, at 07:54, Colin Percival <cperciva at FreeBSD.org> wrote:
> > 
> > Author: cperciva
> > Date: Tue Dec 13 06:54:13 2016
> > New Revision: 310013
> > URL: https://svnweb.freebsd.org/changeset/base/310013
> > 
> > Log:
> >  Check that blkfront devices have a non-zero number of sectors and a
> >  non-zero sector size.  Such a device would be a virtual disk of zero
> >  bytes; clearly not useful, and not something we should try to attach.
> > 
> >  As a fortuitous side effect, checking that these values are non-zero
> >  here results in them not *becoming* zero later on the function.  This
> >  odd behaviour began with r309124 (clang 3.9.0) but is challenging to
> >  debug; making any changes to this function whatsoever seems to affect
> >  the llvm optimizer behaviour enough to make the unexpected zeroing of
> >  the sector_size variable cease.
> 
> I've been having some fun debugging a kernel under Xen, and what I found
> is that sector_size changes from 512 to 0 because of an xs_gather() call
> in xbd_connect().  The function starts as follows:
> 
>   1222  static void
>   1223  xbd_connect(struct xbd_softc *sc)
>   1224  {
>   1225          device_t dev = sc->xbd_dev;
>   1226          unsigned long sectors, sector_size, phys_sector_size;
>   1227          unsigned int binfo;
>   1228          int err, feature_barrier, feature_flush;
> ...
>   1237          err = xs_gather(XST_NIL, xenbus_get_otherend_path(dev),
>   1238              "sectors", "%lu", &sectors,
>   1239              "info", "%u", &binfo,
>   1240              "sector-size", "%lu", &sector_size,
>   1241              NULL);
> 
> After this first call to xs_gather(), sectors is 44040322 (in my case of
> a ~21G disk), binfo is zero, and sector_size is 512.  During execution
> of the following few lines, sector_size stays 512, until just after the
> fourth call to xs_gather():
> 
>   1259          err = xs_gather(XST_NIL, xenbus_get_otherend_path(dev),
>   1260               "feature-flush-cache", "%lu", &feature_flush,
>   1261               NULL);
> 
> and then it becomes zero!  This is because the %lu format scans a 64 bit
> unsigned integer, while the feature_flush variable is a 32 bit signed
> integer, thus overwriting an adjacent location on the stack, which
> happens to contain sector_size: in the assembly output, sector_size is
> at -88(%rbp), while feature_flush is at -92(%rbp).
> 
> There is another instance of such an incorrect format, a few lines
> before this, but it doesn't seem to scribble over important data (or we
> didn't panic because of it, yet):
> 
>   1253          err = xs_gather(XST_NIL, xenbus_get_otherend_path(dev),
>   1254               "feature-barrier", "%lu", &feature_barrier,
>   1255               NULL);
> 
> Here, feature_barrier is a 32 bit signed integer, while %lu again scans
> a 64 bit unsigned integer.
> 
> In short, I think the following change would also fix the problem:
> 
> Index: sys/dev/xen/blkfront/blkfront.c
> ===================================================================
> --- sys/dev/xen/blkfront/blkfront.c     (revision 309817)
> +++ sys/dev/xen/blkfront/blkfront.c     (working copy)
> @@ -1251,13 +1251,13 @@ xbd_connect(struct xbd_softc *sc)
>         if (err || phys_sector_size <= sector_size)
>                 phys_sector_size = 0;
>         err = xs_gather(XST_NIL, xenbus_get_otherend_path(dev),
> -            "feature-barrier", "%lu", &feature_barrier,
> +            "feature-barrier", "%d", &feature_barrier,
>              NULL);
>         if (err == 0 && feature_barrier != 0)
>                 sc->xbd_flags |= XBDF_BARRIER;
> 
>         err = xs_gather(XST_NIL, xenbus_get_otherend_path(dev),
> -            "feature-flush-cache", "%lu", &feature_flush,
> +            "feature-flush-cache", "%d", &feature_flush,
>              NULL);
>         if (err == 0 && feature_flush != 0)
>                 sc->xbd_flags |= XBDF_FLUSH;
> 
> However, I have some difficulty getting a custom-built kernel booting on
> my very rudimentary Xen setup.  I am using a snapshot raw image, with no
> idea how to insert a kernel into it.
> 
> E.g, can you please try this out, instead of the zero-check fix?  I am
> very curious whether that fixes the panics. :)

Yes, this indeed fixes the panic, I've reverted Colin's patch and applied yours, 
and everything seems fine. Please go ahead and commit it.

xs_gather should really go away and we should use something that we can sanitize 
at compile time using __scanflike & friends.

Thanks, Roger.