sparc64/165025: [PATCH] zfsboot support for sparc64
Marius Strobl
marius at alchemy.franken.de
Sun Apr 29 16:30:15 UTC 2012
The following reply was made to PR sparc64/165025; it has been noted by GNATS.
From: Marius Strobl <marius at alchemy.franken.de>
To: bug-followup at FreeBSD.org, gavin.mu at gmail.com
Cc:
Subject: Re: sparc64/165025: [PATCH] zfsboot support for sparc64
Date: Sun, 29 Apr 2012 18:10:19 +0200
On Sun, Apr 22, 2012 at 08:40:13PM +0000, Marius Strobl wrote:
> The following reply was made to PR sparc64/165025; it has been noted by GNATS.
>
> From: Marius Strobl <marius at alchemy.franken.de>
> To: Gavin Mu <gavin.mu at gmail.com>
> Cc: bug-followup at freebsd.org, Kurt Lidl <lidl at pix.net>
> Subject: Re: sparc64/165025: [PATCH] zfsboot support for sparc64
> Date: Sun, 22 Apr 2012 22:32:11 +0200
>
> On Thu, Apr 12, 2012 at 10:27:31PM +0800, Gavin Mu wrote:
> > On Mon, Mar 5, 2012 at 2:06 AM, Marius Strobl <marius at alchemy.franken.de> wrote:
> > > Typically, opening and closing devices via OFW causes quite a delay;
> > > the exact impact depends on the firmware version and the devices
> > > involved, though. Therefore, it would be advisable to keep using the
> > > current approach of caching opened packages. In what way does this
> > > fail with ZFS?
> > The error message on Fire V100 is: Fast Data Access MMU Miss
> >
> > > Basically, IEEE 1275 just says that support for
> > > opening a package more than once depends on the particular package
> > > but says nothing about concurrently opening different packages. Not
> > > being able to concurrently open different packages also doesn't
> > > make much sense, as opening one package also means subsequently
> > > opening all of its parents up to the root if they are not already
> > > open, and I think I actually tested opening disks concurrently
> > > when writing the current code. Could this fail due
> > > to one device actually being opened twice, once via the full path
> > > and once via its alias?
> > > That is not the case here, although the code lacks a check for
> > > full path vs. devalias.
> > > I have tried many times to find the root cause, but I think it is
> > > difficult without Open Firmware knowledge.
> > > So far I have found that the following scenarios cause this issue with my test code:
> > > 1. Do OF_seek(ihandle_t a) just after OF_close(ihandle_t b). In the
> > > real world, OF_seek(a) is the step that reads ZFS data just after
> > > OF_close() on another disk during ZFS init/probe.
> > > 2. Do OF_seek(ihandle_t a) just after OF_open("available controller
> > > without disk"). For example, there is no disk3 on my machine, though
> > > there is a disk controller. OF_open("disk3:") will report:
> > Can't read disk label.
> > Can't open disk label package
> >
> > > In ofw_disk.c, OF_close() has been commented out for the powerpc
> > > architecture, and I cannot find a detailed reason in the code
> > > history, so I am wondering whether we also need to disable
> > > OF_close() for sparc64.
>
> Hrm, some OFW implementations might have reference counting bugs,
> causing OF_close() to also close some parent(s) when these in fact
> are still used by another opened child device. Have you tried how
> it works when just commenting out the OF_close() in ofwd_close() but
> leaving the rest of ofw_disk.c as is? If that works, we probably
> can add a cleanup handler which closes all opened disk devices
> before leaving the loader, still taking advantage of caching opened
> disks.
>
With the machines I have at hand, I can't reproduce this problem,
i.e. ofw_disk.c as is works just fine for booting from a mirror.
This suggests that what you are seeing actually is a bug in the
specific firmware implementation rather than a general limitation
imposed by OFW. Could you please give the following patch a try?
It implements what I've described above, i.e. combines both caching
opened devices and properly closing all opened disks when leaving
the loader.
http://people.freebsd.org/~marius/ofw_disk_close_on_cleanup.diff
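
For illustration only, and not the contents of the diff above, the
idea can be sketched roughly as follows. This is a minimal stand-alone
sketch under assumptions: mock_OF_open() and mock_OF_close() are
hypothetical stand-ins for the firmware calls, and disk_open(),
disk_close() and disk_cleanup() are made-up names rather than
functions from ofw_disk.c.

```c
#include <assert.h>
#include <string.h>

#define MAX_DISKS 8
#define PATH_LEN  64

typedef int ihandle_t;

/* Hypothetical stand-ins for OF_open()/OF_close(); they only count calls. */
static int fw_opens, fw_closes;
static ihandle_t next_handle = 1;

static ihandle_t mock_OF_open(const char *path)
{
	(void)path;
	fw_opens++;
	return (next_handle++);
}

static void mock_OF_close(ihandle_t h)
{
	(void)h;
	fw_closes++;
}

/* Cache of opened disks, keyed by device path (hypothetical layout). */
struct cached_disk {
	char		path[PATH_LEN];
	ihandle_t	handle;
	int		in_use;
};
static struct cached_disk cache[MAX_DISKS];

/* Return the cached handle for path, opening the device only on a miss. */
static ihandle_t disk_open(const char *path)
{
	int i, free_slot = -1;

	assert(path != NULL);
	for (i = 0; i < MAX_DISKS; i++) {
		if (cache[i].in_use && strcmp(cache[i].path, path) == 0)
			return (cache[i].handle);
		if (!cache[i].in_use && free_slot < 0)
			free_slot = i;
	}
	if (free_slot < 0 || strlen(path) >= PATH_LEN)
		return (-1);
	cache[free_slot].handle = mock_OF_open(path);
	strcpy(cache[free_slot].path, path);
	cache[free_slot].in_use = 1;
	return (cache[free_slot].handle);
}

/* Closing is deliberately a no-op: the handle stays cached for reuse. */
static void disk_close(ihandle_t h)
{
	(void)h;
}

/* Cleanup handler: close every cached disk once, before leaving the loader. */
static void disk_cleanup(void)
{
	int i;

	for (i = 0; i < MAX_DISKS; i++) {
		if (cache[i].in_use) {
			mock_OF_close(cache[i].handle);
			cache[i].in_use = 0;
		}
	}
}
```

In this sketch, reopening a path after disk_close() returns the same
handle without another firmware open, and the firmware close happens
exactly once per device in disk_cleanup().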
Marius