svn commit: r344238 - head/stand/common

Patrick Kelsey kelsey at ieee.org
Tue Feb 19 01:20:18 UTC 2019


On Mon, Feb 18, 2019 at 3:15 PM Warner Losh <imp at bsdimp.com> wrote:

>
>
> On Mon, Feb 18, 2019 at 10:51 AM Rodney W. Grimes <
> freebsd at pdx.rh.cn85.dnsmgr.net> wrote:
>
>> > On Mon, Feb 18, 2019 at 7:31 AM Ian Lepore <ian at freebsd.org> wrote:
>> >
>> > > On Mon, 2019-02-18 at 15:09 +0200, Toomas Soome wrote:
>> > > > > On 18 Feb 2019, at 01:32, Ian Lepore <ian at FreeBSD.org> wrote:
>> > > > >
>> > > > > Author: ian
>> > > > > Date: Sun Feb 17 23:32:09 2019
>> > > > > New Revision: 344238
>> > > > > URL: https://svnweb.freebsd.org/changeset/base/344238
>> > > > >
>> > > > > Log:
>> > > > >  Restore loader(8)'s ability for lsdev to show partitions within a
>> > > > >  bsd slice.
>> > > > >
>> > > > >  I'm pretty sure this used to work at one time, perhaps long ago.  It
>> > > > >  has been failing recently because if you call disk_open() with
>> > > > >  dev->d_partition set to -1 when d_slice refers to a bsd slice, it
>> > > > >  assumes you want it to open the first partition within that slice.
>> > > > >  When you then pass that open dev instance to ptable_open(), it tries
>> > > > >  to read the start of the 'a' partition and decides there is no
>> > > > >  recognizable partition type there.
>> > > > >
>> > > > >  This restores the old functionality by resetting d_offset to the
>> > > > >  start of the raw slice after disk_open() returns.  For good measure,
>> > > > >  d_partition is also set back to -1, although that doesn't currently
>> > > > >  affect anything.
>> > > > >
>> > > > >  I would have preferred to make disk_open() avoid such rude
>> > > > >  assumptions and if you ask for partition -1 you get the raw slice.
>> > > > >  But the commit history shows that someone already did that once
>> > > > >  (r239058), and had to revert it (r239232), so I didn't even try to
>> > > > >  go down that road.
>> > > >
>> > > >
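For reference, a rough sketch in C of the workaround that log describes,
modeled on the disk_open()/ptable_open() usage in stand/common; the function
name, the mediasize/sectorsize arguments, and the slice_start variable are
placeholders, not the committed code:

/*
 * Hedged sketch of the lsdev fix described in the log above.  The names
 * list_bsd_partitions(), mediasize, sectorsize, and slice_start are
 * illustrative placeholders, not the actual committed code.
 */
static int
list_bsd_partitions(struct disk_devdesc *dev, uint64_t mediasize,
    u_int sectorsize, uint64_t slice_start)
{
	int rc;

	/* With d_partition == -1, disk_open() opens the 'a' partition. */
	dev->d_partition = -1;
	rc = disk_open(dev, mediasize, sectorsize);
	if (rc != 0)
		return (rc);

	/*
	 * disk_open() left d_offset pointing at the start of 'a'.  Point it
	 * back at the start of the raw slice so that a subsequent
	 * ptable_open() sees the BSD label rather than the contents of 'a'.
	 */
	dev->d_offset = slice_start;
	dev->d_partition = -1;		/* for good measure */

	/* ... hand dev to ptable_open() and iterate the partitions ... */
	return (0);
}
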
>> > > > What was the reason for the revert? I still do think the current
>> > > > disk_open() approach is not good because it does break the promise to
>> > > > get the MBR partition (see common/disk.h).
>> > > >
>> > > > Of course I can see the appeal for something like 'boot disk0:', but a
>> > > > case like that should be solved by iterating the partition table, not
>> > > > by making the API do the wrong thing - if I did ask for disk0s0: I
>> > > > really would expect to get disk0s0: and not disk0s0a:
>> > > >
>> > > > But anyhow, it would be good to understand the actual reasoning
>> > > > behind that decision.
>> > > >
>> > >
>> > > I have no idea. As is so often the case, the commit message for the
>> > > revert said what the commit did ("revert to historic behavior") without
>> > > any hint of why the change was made. One has to assume that it broke
>> > > some working cases and people complained.
>> > >
>> > > Part of the problem for the disk_open() "api" is that there is not even
>> > > a comment block with some hints in it. I was thinking one potential
>> > > solution might be that, instead of using "if (partition < 0)", we could
>> > > define a couple of magical partition number values, PNUM_GETBEST = -1,
>> > > PNUM_RAWSLICE = -2, that sort of thing. But it would require carefully
>> > > combing through the existing code, looking at all calls to disk_open()
>> > > and all usage of disk_devdesc.d_partition.
>> > >
>> >
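As a rough illustration of that suggestion (the constants are hypothetical;
they do not exist in stand/common/disk.h today), the dispatch inside a
reworked disk_open() might look something like this, with the helper name
and plain int argument being placeholders:

/*
 * Hypothetical constants from the suggestion quoted above; they are not
 * part of the existing stand/common/disk.h API.
 */
#define	PNUM_GETBEST	(-1)	/* pick the "best" partition in the slice */
#define	PNUM_RAWSLICE	(-2)	/* open the raw slice itself */

/*
 * Sketch of how a reworked disk_open() might dispatch on d_partition.
 */
static void
handle_partition_request(int partition)
{
	switch (partition) {
	case PNUM_RAWSLICE:
		/* Leave d_offset at the start of the raw slice; no probing. */
		break;
	case PNUM_GETBEST:
		/* Today's behavior: probe the label and pick 'a' or the
		 * first usable partition. */
		break;
	default:
		/* Open the explicitly requested partition number. */
		break;
	}
}
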
>> > I think that we should fix the disk_open() api. And then fix everything
>> > that uses it to comply with the new rules. The current hodge-podge that we
>> > have causes a number of issues for different environments that are only
>> > mostly worked around. I've done some work in this area to make things more
>> > regular, but it's still a dog's breakfast of almost-stackable devices that
>> > we overload too many things on, resulting in the mish-mash we have today.
>> > When the whole world was just MBR + BSD label, we could get away with it.
>> > But now that we have GPT as well as GELI support and ZFS pool devices on
>> > top of disk devices, the arrangements aren't working as well as could be
>> > desired.
>>
>> Yes please.
>> Could this be some of the cause of the issues with legacy MBR boots
>> that used to work but now stop working?  I have observed this
>> on ZFS in MBR on a machine here.  I did not have time to debug
>> it; it was faster to nuke and pave the machine, which was critical
>> to operations.  The error came as part of a failed FreeNAS upgrade
>> to the 11.2-based version.
>>
>
> I think the MBR issues are the same ones that Patrick Kelsey is looking
> into. He has a differential review. If the upgrade was to FreeBSD 11.2,
> however, that pre-dates most of the work we've done on the loader, so the
> root cause of that issue may be different. And it may already be fixed in
> stable/11 as we've since merged everything in current into stable/11
> (except for a few things that changed the defaults in a user-visible way).
>
>
To clarify, I recently committed some boot code patches that were the
result of investigating a problem on a machine of mine that initially
presented as 'upgraded a ZFS machine to 11.2 and it appeared to hang
during boot'.  I spent a lot of time root-causing the issue on the chance
that it might be a (or the) root cause of problems others have encountered
with a similar summary description.  However, it turned out the root cause
of my problem was a firmware bug in a 3Ware 9650SE: multi-sector I/Os
issued to either 'suitably large' LBAs, or perhaps LBAs 'suitably close'
to the end of the disk, each caused a 60-second I/O timeout.  Moving to
11.2 triggered it because in 11.2 the ZFS code was modified to look at all
of the ZFS labels, two of which are very close to the end of the disk.
Thankfully my problem was fixed by installing the latest available
firmware for that card (otherwise, the hack needed in the loader would
have been to break up multi-sector I/Os 'near' the end of the disk and
issue them in descending LBA order).
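
If anyone ever does need that hack, a minimal sketch of the idea follows;
the function name, the NEAR_END_SECTORS threshold, and the read_sectors()
callback are assumptions, not existing loader code:

#include <stddef.h>
#include <stdint.h>

/* "Near" is a guess; the real threshold would need tuning. */
#define	NEAR_END_SECTORS	1024

/*
 * Read nblks sectors starting at lba.  Away from the end of the disk,
 * pass the request through.  Near the end, split it into single-sector
 * reads and issue them in descending LBA order to dodge the firmware bug
 * described above.  read_sectors() is a stand-in for the real I/O routine.
 */
static int
careful_read(uint64_t lba, uint64_t nblks, char *buf, size_t secsz,
    uint64_t disk_sectors,
    int (*read_sectors)(uint64_t lba, uint64_t nblks, char *buf))
{
	int rc;

	if (lba + nblks + NEAR_END_SECTORS < disk_sectors)
		return (read_sectors(lba, nblks, buf));

	for (uint64_t i = nblks; i > 0; i--) {
		rc = read_sectors(lba + i - 1, 1, buf + (i - 1) * secsz);
		if (rc != 0)
			return (rc);
	}
	return (0);
}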

The patches I committed were for issues I found during my investigation
that were unrelated to my problem.  One was the potential for a buffer
overrun in the ZFS support code (which I think could only occur on systems
running an ashift=9 ZFS configuration on 4K drives); the other resolved an
inconsistency between the ZFS probing behavior of zfsboot and loader.
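
For that class of overrun, the usual defensive pattern is to read into a
buffer sized for the device's physical sector and copy out only the logical
block.  This is not the actual patched code; read_sector(), LOGICAL_BLOCK,
and the lba/offset math are placeholders meant only to show the general
shape of the fix:

#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define	LOGICAL_BLOCK	512	/* ashift=9 pool block size */

/*
 * Read one 512-byte logical block from a drive whose physical sector size
 * may be larger (e.g. 4096).  The bounce buffer is sized for the physical
 * sector so the device can never write past the end of the allocation;
 * only the requested logical block is copied out to dst.
 */
static int
read_logical_block(int (*read_sector)(uint64_t sector, void *buf),
    uint32_t sector_size, uint64_t lblkno, void *dst)
{
	uint32_t per_sector, offset;
	uint64_t sector;
	char *bounce;
	int rc;

	if (sector_size < LOGICAL_BLOCK)
		sector_size = LOGICAL_BLOCK;
	per_sector = sector_size / LOGICAL_BLOCK;
	sector = lblkno / per_sector;
	offset = (lblkno % per_sector) * LOGICAL_BLOCK;

	bounce = malloc(sector_size);
	if (bounce == NULL)
		return (-1);
	rc = read_sector(sector, bounce);
	if (rc == 0)
		memcpy(dst, bounce + offset, LOGICAL_BLOCK);
	free(bounce);
	return (rc);
}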

-Patrick

