[SOLVED] Re: "zpool attach" problem

Sat Nov 21 22:33:39 UTC 2020

Hi David,
     Thanks for your reply.  I was about to respond to my own message to say that the
issue had been resolved, but I saw your reply first.  I respond below to your comments
and questions anyway, and also state what the problem turned out to be.

     On Fri, 20 Nov 2020 21:16:06 -0800 David Christensen <dpchrist at holgerdanske.com>
wrote:

>On 2020-11-19 22:59, Scott Bennett via freebsd-questions wrote:
>>       I had a pool with two two-way mirrors as the top-level vdevs.  I needed
>> to shift some of those partitions by a short distance on the drives, so I
>> detached and deleted and rebuilt them one at a time until I hit a snag.  Here
>> is the situation.
>> 
>> Script started on Fri Nov 20 00:40:36 2020
>> hellas#	gpart show -l ada2 da0 da1 da2
>> =>        40  5860533088  ada2  GPT  (2.7T)
>>            40  4294967296     1  WD-WMC130F2V1RN  (2.0T)
>>    4294967336    31457496        - free -  (15G)
>>    4326424832   125829120    11  zmisc mirror-0 1  (60G)
>>    4452253952   209715200    15  bw2-0  (100G)
>>    4661969152  1198563976        - free -  (572G)
>> 
>> =>        34  3907029101  da0  GPT  (1.8T)
>>            34          14       - free -  (7.0K)
>>            48  3749709824    1  WD  WCC4MH1P7LYS  (1.7T)
>>    3749709872    73400320    5  bw1-0  (35G)
>>    3823110192        2000       - free -  (1.0M)
>>    3823112192    83886080    8  zmisc mirror-1 1  (40G)
>>    3906998272       30863       - free -  (15M)
>> 
>> =>        34  3907029100  da1  GPT  (1.8T)
>>            34          14       - free -  (7.0K)
>>            48  3749709824    1  Seagate NA5KYLVM  (1.7T)
>>    3749709872          16       - free -  (8.0K)
>>    3749709888    73400320    5  bw1-1  (35G)
>>    3823110208        1984       - free -  (992K)
>>    3823112192    83886080    8  zmisc mirror-1 0  (40G)
>>    3906998272       30862       - free -  (15M)
>> 
>> =>        40  3907029088  da2  GPT  (1.8T)
>>            40           8       - free -  (4.0K)
>>            48  3749709824    1  WD-WCC6N7KD2YAK  (1.7T)
>>    3749709872          16       - free -  (8.0K)
>>    3749709888    31457280    5  bw0-0  (15G)
>>    3781167168        1984       - free -  (992K)
>>    3781169152   125829120    8  zmisc mirror-0 0  (60G)
>>    3906998272       30856       - free -  (15M)
>> 
>> hellas#	zpool status zmisc
>>    pool: zmisc
>>   state: ONLINE
>>    scan: resilvered 25.8G in 0 days 00:16:07 with 0 errors on Fri Nov 20 00:10:19 2020
>> config:
>> 
>> 	NAME        STATE     READ WRITE CKSUM
>> 	zmisc       ONLINE       0     0     0
>> 	  ada2p11   ONLINE       0     0     0
>> 	  mirror-1  ONLINE       0     0     0
>> 	    da0p8   ONLINE       0     0     0
>> 	    da1p8   ONLINE       0     0     0
>> 
>> errors: No known data errors
>> hellas#	zpool attach zmisc ada2p11 da2p8
>> cannot attach da2p8 to ada2p11: no such pool or dataset
>> hellas#	exit
>> exit
>> 
>> Script done on Fri Nov 20 00:42:33 2020
>> 
>>       Would somebody please tell me what I am doing wrong here?  Many thanks in
>> advance to whoever can help.
>
>It looks like you added the slice ada2p11 to zmisc, rather than the 
>mirror ada2p11 da2p8.  If so, these commands could fix things:
>
     No, ada2p11 is what was left after detaching a partition from
mirror-0 of that pool.
>
>     # zpool remove zmisc ada2p11
>
>     # zpool add zmisc mirror ada2p11 da2p8
>
     I thought about doing that, but the allocated portion of mirror-0
was too much to fit into the free space in mirror-1.  Also, even if
mirror-1 could have held that much, that kind of monkeying around ends
up creating horribly unbalanced allocation, so I would have hesitated
at least a day or three to see whether I could find a better way.  It
was a good thing that I stopped and went to bed once I was done posting
my message.  (See confession further below.)
>
>But, I am confused by your storage architecture.  Why one internal "3 
>TB" drive and three external "2 TB" drives?  What is the 2.0T internal 
>slice for?  What are the three 1.7 GiB external slices for?  What are 

     Long story.  Sigh.  About eight or nine years ago I began using
ZFS under 9.something i386.  (Currently the machine is running
11.4-STABLE amd64.)  At first it was all experimental while I learned
enough to begin to feel some confidence in using it.  Once I had purchased
six 1.8 TB drives, I created my largest pool called rz7A and quickly moved
my backups and archives into it, and AFAIK I have not lost a single byte
due to hardware errors, power failures, or anything else since then.  (I
likely need to think up a better name for it, but that is way down on
the list of my worries for now.)  It comprised six 1.7 TB partitions on
the six 1.8 TB (actually closer to 1.9 TB, but FreeBSD truncates, rather
than rounds) drives in a raidz2.  That left a bit of room for other things
I intended to do that would take up much less space.  It also meant that
those 1.7 TB partitions could be exactly the same in terms of space and
not differ among them due to slight differences in the real storage
capacities of drives of different make{,r}s and models.  In the
intervening time there have been many drive failures (mostly Seagates, but
a few aged-out WD drives, too).  About a year ago, a drive failed, and I
replaced it with a WD Black 1.8 TB drive, which continues to function
flawlessly.
     Then in January or February two drives failed in rapid succession.
At that time, I found two 2.7 TB enterprise drives as replacements, and
they were priced much lower apiece than the drive I had bought a month or
two earlier.  While allocating the partitions on them, I allocated 2 TB
on each as the replacements for the 1.7 TB partitions that were on the
failed drives.  This past summer one of the new enterprise drives failed.
It turned out that the reason they had been available so cheaply was that
they had been leftover stock of a now discontinued line, so basically
they were sold at a closeout price.  Getting a replacement for the failed
enterprise drive under warranty turned out to be a nightmare.  First,
the manufacturer said they didn't have a drive of that capacity in the
new line, and they wanted to know if I would accept a "4 TB" drive as a
replacement, which I naturally approved.  When no drive appeared after two
weeks, I called and discovered they had left the apartment number off of
the address, even though I had had the agent repeat the address back to
me on the phone.  The parcel service had returned the drive to them as
undeliverable.  The manufacturer then turned around and *gave my drive to
somebody else*, which I believe legally constitutes theft and sale of
stolen property, but I did not pursue that.  They said they would send
another, but that didn't appear either.  I called and was told that it had 
been held up until they could confirm the shipping address *again*, which
I then did.  When the 3.6 TB replacement arrived, it was *not* an enterprise
drive.  I called again and asked what was going on and was told that they
substituted a non-enterprise drive because they didn't have a "4 TB"
enterprise drive available.  I then gave them a pretty bad time about
leaving my array at risk in a degraded state for so long by their not living
up to their warranty, as well as having given a drive that belonged to me
away to somebody else.  They kept putting me off by requiring me to speak with
another and yet another person in their company, usually requiring separate
phone calls on different days and shipping the non-enterprise drive back to
them, but eventually someone arranged for an enterprise drive (of their
current line of enterprise drives) to be shipped from their Canadian
inventory with an expected additional delay due to having to pass customs
and exacerbated by the COVID-19 situation.  The drive arrived after one week.
Total time until I had a replacement under warranty was nearly *two months*
on a failed *enterprise* drive.  I know I am not a high-volume customer like
Netflix or Amazon, but really(!) that seems unreasonable.
     So that is the story in a nutshell of how my ever-changing configuration
has evolved and why some of the unallocated space on the drives appears where
it does.
     As the 1.8 TB drives give up, I intend to replace them with larger-
capacity drives and expand the single top-level vdev in that pool, such
that each component will have a 2 TB capacity, rather than its current
1.7 TB capacity.  If disk capacities continue to increase with prices
decreasing fast enough compared to the remaining lifetimes of the 1.8
TB drives, I may expand the components still further.  The two enterprise
drives already have the spare space to expand their components quite
substantially more than the present 2 TB each.

>the bw?-? slices for, and why are they different sizes?  Why are the 
>zmisc slices different sizes?  What about ada0 and ada1?  And, do you 

      Name    Status  Components
mirror/bw0  COMPLETE  da3p5 (ACTIVE)
                      da2p5 (ACTIVE)
mirror/bw1  COMPLETE  da0p5 (ACTIVE)
                      da1p5 (ACTIVE)
mirror/bw2  COMPLETE  ada2p15 (ACTIVE)
                      ada3p5 (ACTIVE)
            Name  Status  Components
concat/buildwork      UP  mirror/bw1
                          mirror/bw0
                          mirror/bw2

(N.B. the components of buildwork are listed out of sequence here.  They
are configured as bw0, bw1, bw2.)

>have spaces in your GPT labels?
>
     The motherboard in the tower has six SATA ports.  Two are for optical
drives, and four are for HDDs/SSDs.  There is also an eSATA controller that
I used for one of the external drives for a while, but something failed,
and now I can't use the drive that way, so it is on a USB 3.0 port.  The
machine is very old and has no native USB 3.0 support, but I added two PCIe
cards for USB 3.0, one with four ports and one with two ports.  The external
drives are currently connected with two per controller, and the four-port
card also has a seven-port USB 3.0 external hub plugged into it that rarely
sees any use (mostly just flash drives).
     ada0 and ada1 are the much smaller boot drives and are not involved in
what happened.
     ada2 and ada3 are the two drives internal to my ancient tower that have
components of the large raidz2, and da0 through da3 contain the rest of the
six components.
     The GPT label fields in the "gpart show -l" output in my earlier
message have no unprintable characters in them, so they are exactly as
shown.
     Now, on to my confession.  The problem was that I had reinserted the
wrong partition into bw0 due to a typo; i.e., I had typed da2p8 instead of
da2p5, so da2p8 was not available. :-( (It would be nice if GEOM and ZFS
error messages were more intelligibly worded, but if wishes were horses ...)
Once I saw what the problem was, it was trivially easy and quick to fix.
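     For anyone who trips over the same message, here is a minimal sketch of
a check that might have caught the typo before "zpool attach" complained.  It
only reads "gmirror status" output in the layout shown earlier in this message
and reports which mirror, if any, already holds a given partition.  The
function name provider_in_gmirror is made up for illustration and is not part
of GEOM or ZFS.

```shell
#!/bin/sh
# Hypothetical helper (not a GEOM or ZFS tool): read `gmirror status`
# output on stdin and print the name of the mirror that lists the given
# provider as a component.  Returns 0 if found, 1 otherwise.
# Assumes the layout shown above: the first line of each mirror is
# "mirror/NAME STATUS COMPONENT (STATE)" and continuation lines carry
# only "COMPONENT (STATE)".
provider_in_gmirror() {
    awk -v prov="$1" '
        $1 == "Name"       { next }                  # skip the header line
        $1 ~ /^mirror\//   { name = $1; comp = $3 }  # first line of a mirror
        $1 !~ /^mirror\//  { comp = $1 }             # continuation line
        comp == prov       { print name; found = 1 }
        END                { exit found ? 0 : 1 }
    '
}

# Example use on a live system (assumes gmirror is loaded):
#   gmirror status | provider_in_gmirror da2p8 \
#       && echo "da2p8 is already a gmirror component"
```

     Run against the state I had that night, it would have named mirror/bw0
as the holder of da2p8, which is exactly what the "zpool attach" error failed
to tell me.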
     Again, thank you much for your reply.  I wish I had gotten the trouble
shot sooner (sleep can only be postponed for so long) and posted a followup
sooner (ditto) in order to have saved you the bother, but it's nice to know
that someone usually does try to help when someone asks for help on these
lists.

                                  Scott Bennett, Comm. ASMELG, CFIAG
**********************************************************************
* Internet:   bennett at sdf.org   *xor*   bennett at freeshell.org  *
*--------------------------------------------------------------------*
* "A well regulated and disciplined militia, is at all times a good  *
* objection to the introduction of that bane of all free governments *
* -- a standing army."                                               *
*    -- Gov. John Hancock, New York Journal, 28 January 1790         *
**********************************************************************