kern/154299: [arcmsr] arcmsr fails to detect all attached drives
Joshua Sirrine
jsirrine at gmail.com
Thu Jan 24 02:00:01 UTC 2013
The following reply was made to PR kern/154299; it has been noted by GNATS.
From: Joshua Sirrine <jsirrine at gmail.com>
To: bug-followup at FreeBSD.org, Rincebrain at gmail.com
Cc:
Subject: Re: kern/154299: [arcmsr] arcmsr fails to detect all attached drives
Date: Wed, 23 Jan 2013 19:59:23 -0600
First, I'd like to apologize in advance if this email is not being routed
correctly. This is not the same as the ticket system FreeNAS uses, so I'm in
new territory. I've been using FreeNAS (FreeBSD) for about a year, but I am a
quick learner. If I need to provide this information in some form other than
email to help fix this issue, please let me know.
I believe I have found the cause of disks not being usable, as seen in
kern/154299 <http://www.freebsd.org/cgi/query-pr.cgi?pr=154299>. Here's
what I see on my system. My system uses an Areca ARC-1280ML-24 with
firmware 1.49 (latest) and runs FreeNAS 8.3.0 x64 (based on FreeBSD 8.3)
with areca-cli Version 1.84, Arclib: 300, Date: Nov 9 2010 (FreeBSD). I
found this issue while swapping out backplanes for my hard drives.
I had drives populating RAID controller ports 1 through 14. Due to a
failed backplane, I moved the two drives that were connected to ports 13
and 14 to ports 21 and 22, respectively. All of these disks are in a ZFS
RAIDZ3 zpool. Note that I have not had any problems with ZFS scrubs or
SMART long tests on these drives, and they have been running for more
than a year, so infant mortality is not an issue. Also, the RAID
controller is in Non-RAID mode, so all disks are JBOD by default.
Physical Drive Information
# Ch# ModelName Capacity Usage
===============================================================================
1 1 WDC WD20EARS-00S8B1 2000.4GB JBOD
2 2 WDC WD20EARS-00S8B1 2000.4GB JBOD
3 3 WDC WD20EARS-00S8B1 2000.4GB JBOD
4 4 WDC WD20EARS-00S8B1 2000.4GB JBOD
5 5 WDC WD20EARS-00S8B1 2000.4GB JBOD
6 6 WDC WD20EARS-00S8B1 2000.4GB JBOD
7 7 WDC WD20EARS-00S8B1 2000.4GB JBOD
8 8 WDC WD20EARS-00S8B1 2000.4GB JBOD
9 9 WDC WD20EARS-00S8B1 2000.4GB JBOD
10 10 WDC WD20EARS-00S8B1 2000.4GB JBOD
11 11 WDC WD20EARS-00S8B1 2000.4GB JBOD
12 12 WDC WD20EARS-00S8B1 2000.4GB JBOD
13 13 N.A. 0.0GB N.A.
14 14 N.A. 0.0GB N.A.
15 15 N.A. 0.0GB N.A.
16 16 N.A. 0.0GB N.A.
17 17 N.A. 0.0GB N.A.
18 18 N.A. 0.0GB N.A.
19 19 N.A. 0.0GB N.A.
20 20 N.A. 0.0GB N.A.
21 21 WDC WD20EARS-00S8B1 2000.4GB JBOD
22 22 WDC WD20EARS-00S8B1 2000.4GB JBOD
23 23 N.A. 0.0GB N.A.
24 24 N.A. 0.0GB N.A.
===============================================================================
With this configuration, disks 21 and 22 were not available to me (only 12
of the disks were available). I was using a ZFS RAIDZ3 across all of these
disks, so I immediately lost 2 disks' worth of redundancy. The disks
showed up in the RAID controller BIOS as well as in areca-cli (as you
can see), but /dev was missing 2 disks and 'zpool status' showed I had 2
missing drives. As soon as I swapped cables so that the disks were back
in ports 13 and 14 on the RAID controller, everything went back to normal.
Knowing that something was wrong, I grabbed some spare drives and started
experimenting. I wanted to know what was actually wrong because I am
trusting this system with my data for production use. Please examine the
following VolumeSet Information:
VolumeSet Information
# Name Raid Name Level Capacity Ch/Id/Lun State
===============================================================================
1 WD20EARS-00S8B1 Raid Set # 00 JBOD 2000.4GB 00/00/00 Normal
2 WD20EARS-00S8B1 Raid Set # 01 JBOD 2000.4GB 00/00/01 Normal
3 WD20EARS-00S8B1 Raid Set # 02 JBOD 2000.4GB 00/00/02 Normal
4 WD20EARS-00S8B1 Raid Set # 03 JBOD 2000.4GB 00/00/03 Normal
5 WD20EARS-00S8B1 Raid Set # 04 JBOD 2000.4GB 00/00/04 Normal
6 WD20EARS-00S8B1 Raid Set # 05 JBOD 2000.4GB 00/00/05 Normal
7 WD20EARS-00S8B1 Raid Set # 06 JBOD 2000.4GB 00/00/06 Normal
8 WD20EARS-00S8B1 Raid Set # 07 JBOD 2000.4GB 00/00/07 Normal
9 WD20EARS-00S8B1 Raid Set # 08 JBOD 2000.4GB 00/01/00 Normal
10 WD20EARS-00S8B1 Raid Set # 09 JBOD 2000.4GB 00/01/01 Normal
11 WD20EARS-00S8B1 Raid Set # 10 JBOD 2000.4GB 00/01/02 Normal
12 WD20EARS-00S8B1 Raid Set # 11 JBOD 2000.4GB 00/01/03 Normal
13 WD20EARS-00S8B1 Raid Set # 12 JBOD 2000.4GB 00/01/04 Normal
14 WD20EARS-00S8B1 Raid Set # 13 JBOD 2000.4GB 00/01/05 Normal
===============================================================================
GuiErrMsg<0x00>: Success.
This is my normal configuration, and all disks work. After experimenting,
it turns out that if I want to use ports 1 through 8, I MUST have a disk
in port 1. For ports 9 through 16, I MUST have a disk in port 9. For
ports 17 through 24, I MUST have a disk in port 17. It appears there may be
something special about Ch/Id/Lun = XX/XX/00. If there is no disk at LUN 00,
then that entire target ID is not available for use by FreeBSD, despite
areca-cli properly identifying the disks.
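The pattern above can be sketched as a small mapping. This is my own inference
from the VolumeSet table, not anything taken from the arcmsr driver source:
each group of 8 ports appears to share one SCSI target ID, and the port's
position within its group becomes the LUN.

```python
def port_to_ch_id_lun(port):
    """Map a 1-based controller port to (channel, id, lun).

    Inferred from the areca-cli VolumeSet table: single channel,
    8 LUNs per target ID, so ports 1-8 -> ID 0, 9-16 -> ID 1,
    17-24 -> ID 2. This is a hypothesis, not confirmed driver code.
    """
    idx = port - 1          # 0-based port index
    channel = 0             # only one channel on this controller
    target_id = idx // 8    # which group of 8 the port falls in
    lun = idx % 8           # position within that group
    return (channel, target_id, lun)

# Matches the table above: port 1 -> 00/00/00, port 9 -> 00/01/00,
# port 14 -> 00/01/05.
print(port_to_ch_id_lun(1))
print(port_to_ch_id_lun(9))
print(port_to_ch_id_lun(14))
```

Under this mapping, "no disk in port 1/9/17" is the same thing as "no disk at
LUN 0 of that target ID", which would explain why the whole group disappears.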
If you look at the original report in kern/154299:
> arcmsr fails to detect all attached drives. It may or may not
> have something to do with a failed device attached and e.g. PR
> 148502 or 150390.
>
> c.f.:
>
> [root at manticore ~]# areca-cli disk info;ls /dev/da* /dev/ad*;
>  #  Ch#  ModelName        Capacity  Usage
> ===============================================================================
>  1   1   N.A.                0.0GB  N.A.
>  2   2   N.A.                0.0GB  N.A.
>  3   3   N.A.                0.0GB  N.A.
>  4   4   N.A.                0.0GB  N.A.
>  5   5   N.A.                0.0GB  N.A.
>  6   6   N.A.                0.0GB  N.A.
>  7   7   N.A.                0.0GB  N.A.
>  8   8   N.A.                0.0GB  N.A.
>  9   9   ST31500341AS     1500.3GB  JBOD
> 10  10   N.A.                0.0GB  N.A.
> 11  11   ST31500341AS     1500.3GB  JBOD
> 12  12   ST31500341AS     1500.3GB  JBOD
> 13  13   ST31500341AS     1500.3GB  JBOD
> 14  14   N.A.                0.0GB  N.A.
> 15  15   ST31500341AS     1500.3GB  JBOD
> 16  16   ST31500341AS     1500.3GB  JBOD
> 17  17   N.A.                0.0GB  N.A.
> 18  18   N.A.                0.0GB  N.A.
> 19  19   ST31500341AS     1500.3GB  JBOD
> 20  20   ST31500341AS     1500.3GB  JBOD
> 21  21   ST31500341AS     1500.3GB  JBOD
> 22  22                       0.0GB  Failed
> 23  23   ST31500341AS     1500.3GB  JBOD
> 24  24   ST31500341AS     1500.3GB  JBOD
> ===============================================================================
> GuiErrMsg<0x00>: Success.
>
> /dev/ad4 /dev/ad4s1 /dev/ad4s1a /dev/ad4s1b /dev/ad4s1d
> /dev/da0 /dev/da1 /dev/da1p1 /dev/da1p9 /dev/da2 /dev/da3
> /dev/da4 /dev/da5
>
> I count 11 drives attached via the arc1280ml, not including the
> failed drive, and I see 6 appearing.
>
> camcontrol rescan all and reboots do not help the issue. I am
> running firmware 1.49.
If you take what I observed and apply it to his post, you will see that
only disks 9, 11, 12, 13, 15, and 16 would be available to the system.
This is in line with the poster, who says he has only 6 disks
available. I am writing this email in hopes that someone can find and
fix the issue. I do not have any failed disks to experiment with, but
based on 4 hours of experimenting last night, I am convinced that the
issue involves failed disks only when a disk fails in port 1, 9, or 17.
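As a sanity check, my "LUN 0 must be populated" observation can be applied to
the quoted disk list to predict which ports should be visible. The gating rule
here is my hypothesis, not confirmed arcmsr behavior, and the port set is
copied from the areca-cli output above (failed disk 22 excluded):

```python
# Ports with a working JBOD disk, per the quoted areca-cli output.
populated = {9, 11, 12, 13, 15, 16, 19, 20, 21, 23, 24}

def visible_ports(populated_ports, luns_per_id=8):
    """Return ports visible under the hypothesis that a target ID only
    attaches when the first port of its group of 8 (its LUN 0) is
    populated."""
    visible = set()
    for port in populated_ports:
        # First port of this port's group of 8 (i.e. the LUN 0 slot).
        group_first = ((port - 1) // luns_per_id) * luns_per_id + 1
        if group_first in populated_ports:
            visible.add(port)
    return visible

print(sorted(visible_ports(populated)))  # [9, 11, 12, 13, 15, 16]
```

Port 9 is populated, so the 9-16 group attaches; port 17 is empty, so ports
19-24 vanish even though their disks are healthy, matching the 6 disks the
original poster sees.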