kern/154299: [arcmsr] arcmsr fails to detect all attached drives
Joshua Sirrine
jsirrine at gmail.com
Thu Jan 24 02:00:01 UTC 2013
The following reply was made to PR kern/154299; it has been noted by GNATS.
From: Joshua Sirrine <jsirrine at gmail.com>
To: bug-followup at FreeBSD.org, Rincebrain at gmail.com
Cc:
Subject: Re: kern/154299: [arcmsr] arcmsr fails to detect all attached drives
Date: Wed, 23 Jan 2013 19:59:23 -0600
First, I'd like to apologize in advance if this email is not being routed
correctly. This is not the same as the ticket system FreeNAS uses, so I'm in
new territory. I've been using FreeNAS (FreeBSD) for about a year, but I am a
quick learner. If I need to provide this information in some form other than
email to help fix this issue, please let me know.
I believe I have found the cause of disks not being usable, as seen in
kern/154299 <http://www.freebsd.org/cgi/query-pr.cgi?pr=154299>. Here's
what I see on my system. My system uses an Areca ARC-1280ML-24 with
firmware 1.49 (latest) and runs FreeNAS 8.3.0 x64 (based on FreeBSD 8.3)
with areca-cli Version 1.84, Arclib: 300, Date: Nov 9 2010 (FreeBSD). I
found this issue while swapping out backplanes for my hard drives.
I had drives populating RAID controller ports 1 through 14. Due to a
failed backplane, I moved the two drives that were connected to ports 13
and 14 to ports 21 and 22, respectively. All of these disks are in a ZFS
RAIDZ3 zpool. Note that I have not had any problems with ZFS scrubs or
SMART long tests on these drives, and they have been running for more
than a year, so infant mortality is not an issue. Also, the RAID
controller is in Non-RAID mode, so all disks are JBOD by default.
Physical Drive Information
# Ch# ModelName Capacity Usage
===============================================================================
1 1 WDC WD20EARS-00S8B1 2000.4GB JBOD
2 2 WDC WD20EARS-00S8B1 2000.4GB JBOD
3 3 WDC WD20EARS-00S8B1 2000.4GB JBOD
4 4 WDC WD20EARS-00S8B1 2000.4GB JBOD
5 5 WDC WD20EARS-00S8B1 2000.4GB JBOD
6 6 WDC WD20EARS-00S8B1 2000.4GB JBOD
7 7 WDC WD20EARS-00S8B1 2000.4GB JBOD
8 8 WDC WD20EARS-00S8B1 2000.4GB JBOD
9 9 WDC WD20EARS-00S8B1 2000.4GB JBOD
10 10 WDC WD20EARS-00S8B1 2000.4GB JBOD
11 11 WDC WD20EARS-00S8B1 2000.4GB JBOD
12 12 WDC WD20EARS-00S8B1 2000.4GB JBOD
13 13 N.A. 0.0GB N.A.
14 14 N.A. 0.0GB N.A.
15 15 N.A. 0.0GB N.A.
16 16 N.A. 0.0GB N.A.
17 17 N.A. 0.0GB N.A.
18 18 N.A. 0.0GB N.A.
19 19 N.A. 0.0GB N.A.
20 20 N.A. 0.0GB N.A.
21 21 WDC WD20EARS-00S8B1 2000.4GB JBOD
22 22 WDC WD20EARS-00S8B1 2000.4GB JBOD
23 23 N.A. 0.0GB N.A.
24 24 N.A. 0.0GB N.A.
===============================================================================
With this configuration, disks 21 and 22 were not available to me (only 12
of the disks were available). I was using a ZFS RAIDZ3 across all of these
disks, so I immediately lost 2 disks' worth of redundancy. The disks
showed up in the RAID controller BIOS as well as in areca-cli (as you
can see), but /dev was missing 2 disks and 'zpool status' showed I had 2
missing drives. As soon as I swapped cables so that the disks were back
in ports 13 and 14 on the RAID controller, everything went back to normal.
Knowing that something was wrong, I grabbed some spare drives and started
experimenting. I wanted to know what was actually wrong because I am
trusting this system with my data for production use. Please examine the
following VolumeSet Information:
VolumeSet Information
# Name Raid Name Level Capacity Ch/Id/Lun State
===============================================================================
1 WD20EARS-00S8B1 Raid Set # 00 JBOD 2000.4GB 00/00/00 Normal
2 WD20EARS-00S8B1 Raid Set # 01 JBOD 2000.4GB 00/00/01 Normal
3 WD20EARS-00S8B1 Raid Set # 02 JBOD 2000.4GB 00/00/02 Normal
4 WD20EARS-00S8B1 Raid Set # 03 JBOD 2000.4GB 00/00/03 Normal
5 WD20EARS-00S8B1 Raid Set # 04 JBOD 2000.4GB 00/00/04 Normal
6 WD20EARS-00S8B1 Raid Set # 05 JBOD 2000.4GB 00/00/05 Normal
7 WD20EARS-00S8B1 Raid Set # 06 JBOD 2000.4GB 00/00/06 Normal
8 WD20EARS-00S8B1 Raid Set # 07 JBOD 2000.4GB 00/00/07 Normal
9 WD20EARS-00S8B1 Raid Set # 08 JBOD 2000.4GB 00/01/00 Normal
10 WD20EARS-00S8B1 Raid Set # 09 JBOD 2000.4GB 00/01/01 Normal
11 WD20EARS-00S8B1 Raid Set # 10 JBOD 2000.4GB 00/01/02 Normal
12 WD20EARS-00S8B1 Raid Set # 11 JBOD 2000.4GB 00/01/03 Normal
13 WD20EARS-00S8B1 Raid Set # 12 JBOD 2000.4GB 00/01/04 Normal
14 WD20EARS-00S8B1 Raid Set # 13 JBOD 2000.4GB 00/01/05 Normal
===============================================================================
GuiErrMsg<0x00>: Success.
This is my normal configuration, and all disks work. After experimenting,
it turns out that if I want to use ports 1 through 8, I MUST have a disk
in port 1. For ports 9 through 16, I MUST have a disk in port 9. For
ports 17 through 24, I MUST have a disk in port 17. It appears there may be
something special about Ch/Id/Lun = XX/XX/00. If there is no disk at LUN 00,
then that entire target ID is not available for use by FreeBSD, despite
areca-cli properly identifying the disks.
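The pattern above can be sketched as a small mapping. This is my own inference
from the VolumeSet table, not anything taken from the arcmsr driver source:
each group of 8 ports appears to share one SCSI target ID, and the port's
position within its group becomes the LUN.

```python
def port_to_ch_id_lun(port):
    """Map a 1-based controller port to (channel, id, lun).

    Inferred from the areca-cli VolumeSet table: single channel,
    8 LUNs per target ID, so ports 1-8 -> ID 0, 9-16 -> ID 1,
    17-24 -> ID 2. This is a hypothesis, not confirmed driver code.
    """
    idx = port - 1          # 0-based port index
    channel = 0             # only one channel on this controller
    target_id = idx // 8    # which group of 8 the port falls in
    lun = idx % 8           # position within that group
    return (channel, target_id, lun)

# Matches the table above: port 1 -> 00/00/00, port 9 -> 00/01/00,
# port 14 -> 00/01/05.
print(port_to_ch_id_lun(1))
print(port_to_ch_id_lun(9))
print(port_to_ch_id_lun(14))
```

Under this mapping, "no disk in port 1/9/17" is the same thing as "no disk at
LUN 0 of that target ID", which would explain why the whole group disappears.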
If you look at the original report in kern/154299:
> arcmsr fails to detect all attached drives. It may or may not
> have something to do with a failed device attached and e.g. PR
> 148502 or 150390.
>
> c.f.:
>
> [root at manticore ~]# areca-cli disk info;ls /dev/da* /dev/ad*;
>  #  Ch#  ModelName        Capacity  Usage
> ===============================================================================
>  1   1   N.A.                0.0GB  N.A.
>  2   2   N.A.                0.0GB  N.A.
>  3   3   N.A.                0.0GB  N.A.
>  4   4   N.A.                0.0GB  N.A.
>  5   5   N.A.                0.0GB  N.A.
>  6   6   N.A.                0.0GB  N.A.
>  7   7   N.A.                0.0GB  N.A.
>  8   8   N.A.                0.0GB  N.A.
>  9   9   ST31500341AS     1500.3GB  JBOD
> 10  10   N.A.                0.0GB  N.A.
> 11  11   ST31500341AS     1500.3GB  JBOD
> 12  12   ST31500341AS     1500.3GB  JBOD
> 13  13   ST31500341AS     1500.3GB  JBOD
> 14  14   N.A.                0.0GB  N.A.
> 15  15   ST31500341AS     1500.3GB  JBOD
> 16  16   ST31500341AS     1500.3GB  JBOD
> 17  17   N.A.                0.0GB  N.A.
> 18  18   N.A.                0.0GB  N.A.
> 19  19   ST31500341AS     1500.3GB  JBOD
> 20  20   ST31500341AS     1500.3GB  JBOD
> 21  21   ST31500341AS     1500.3GB  JBOD
> 22  22                       0.0GB  Failed
> 23  23   ST31500341AS     1500.3GB  JBOD
> 24  24   ST31500341AS     1500.3GB  JBOD
> ===============================================================================
> GuiErrMsg<0x00>: Success.
>
> /dev/ad4 /dev/ad4s1 /dev/ad4s1a /dev/ad4s1b /dev/ad4s1d
> /dev/da0 /dev/da1 /dev/da1p1 /dev/da1p9 /dev/da2 /dev/da3
> /dev/da4 /dev/da5
>
> I count 11 drives attached via the arc1280ml, not including the
> failed drive, and I see 6 appearing.
>
> camcontrol rescan all and reboots do not help the issue. I am
> running firmware 1.49.
If you take what I observed and apply it to his post, you will see that
only disks 9, 11, 12, 13, 15, and 16 would be available to the system.
This is in line with the poster, who says he has only 6 disks
available. I am writing this email in hopes that someone can find and
fix the issue. I do not have any failed disks to experiment with, but
based on 4 hours of experimenting last night, I am convinced that the
issue involves failed disks only when a disk fails in port 1, 9, or 17.
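As a sanity check, my "LUN 0 must be populated" observation can be applied to
the quoted disk list to predict which ports should be visible. The gating rule
here is my hypothesis, not confirmed arcmsr behavior, and the port set is
copied from the areca-cli output above (failed disk 22 excluded):

```python
# Ports with a working JBOD disk, per the quoted areca-cli output.
populated = {9, 11, 12, 13, 15, 16, 19, 20, 21, 23, 24}

def visible_ports(populated_ports, luns_per_id=8):
    """Return ports visible under the hypothesis that a target ID only
    attaches when the first port of its group of 8 (its LUN 0) is
    populated."""
    visible = set()
    for port in populated_ports:
        # First port of this port's group of 8 (i.e. the LUN 0 slot).
        group_first = ((port - 1) // luns_per_id) * luns_per_id + 1
        if group_first in populated_ports:
            visible.add(port)
    return visible

print(sorted(visible_ports(populated)))  # [9, 11, 12, 13, 15, 16]
```

Port 9 is populated, so the 9-16 group attaches; port 17 is empty, so ports
19-24 vanish even though their disks are healthy, matching the 6 disks the
original poster sees.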