deadlock or bad disk? RELENG_8

Jeremy Chadwick freebsd at jdc.parodius.com
Mon Jul 19 20:33:22 UTC 2010


On Mon, Jul 19, 2010 at 08:37:50AM -0400, Mike Tancsa wrote:
> At 11:34 PM 7/18/2010, Jeremy Chadwick wrote:
> >>
> >> yes, da0 is a RAID volume with 4 disks behind the scenes.
> >
> >Okay, so can you get full SMART statistics for all 4 of those disks?
> >The adjusted/calculated values for SMART thresholds won't be helpful
> >here; one will need the actual raw SMART data.  I hope the Areca CLI can
> >provide that.
> 
> I thought there was, but I can't seem to get the current smartctl to
> work with the card.
> 
> -d TYPE, --device=TYPE
>               Specifies the type of the device.  The valid arguments to
>               this option are ata, scsi, sat, marvell, 3ware,N, areca,N,
>               usbcypress, usbjmicron, usbsunplus, cciss,N, hpt,L/M (or
>               hpt,L/M/N), and test.
> 
> # smartctl -a -d areca,0 /dev/arcmsr0
> smartctl 5.39.1 2010-01-28 r3054 [FreeBSD 8.1-PRERELEASE amd64] (local build)
> Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net
> 
> /dev/arcmsr0: Unknown device type 'areca,0'
> =======> VALID ARGUMENTS ARE: ata, scsi, sat[,N][+TYPE],
> usbcypress[,X], usbjmicron[,x][,N], usbsunplus, 3ware,N, hpt,L/M/N,
> cciss,N, atacam, test <=======
> 
> Use smartctl -h to get a usage summary

According to the official smartctl documentation and man page, the
"areca,N" argument is only supported on Linux.  Bummer.

   Areca  SATA RAID controllers are currently supported under Linux
   only.  To look at SATA disks behind Areca RAID controllers,  use
   syntax such as:
   smartctl -a -d areca,2 /dev/sg2
   smartctl -a -d areca,3 /dev/sg3
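
One long shot you could try (purely speculative on my part -- I haven't
tested this against arcmsr, and the firmware probably won't forward ATA
commands for disks hidden behind a RAID volume) is SAT passthrough via
whatever CAM pass device(s) the driver exposes:

   # find the pass device(s) hanging off the controller
   camcontrol devlist

   # then attempt ATA-passthrough SMART against one of them
   smartctl -a -d sat /dev/pass0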

> The latest CLI tool only gives this info:
> 
> CLI> disk info drv=1
> Drive Information
> ===============================================================
> IDE Channel                        : 1
> Model Name                         : ST31000340AS
> Serial Number                      : 3QJ07F1N
> Firmware Rev.                      : SD15
> Disk Capacity                      : 1000.2GB
> Device State                       : NORMAL
> Timeout Count                      : 0
> Media Error Count                  : 0
> Device Temperature                 : 29 C
> SMART Read Error Rate              : 108(6)
> SMART Spinup Time                  : 91(0)
> SMART Reallocation Count           : 100(36)
> SMART Seek Error Rate              : 81(30)
> SMART Spinup Retries               : 100(97)
> SMART Calibration Retries          : N.A.(N.A.)
> ===============================================================
> GuiErrMsg<0x00>: Success.
> 
> CLI>  disk smart drv=1
> S.M.A.R.T Information For Drive[#01]
>   # Attribute Items                           Flag   Value  Thres  State
> ===============================================================================
>   1 Raw Read Error Rate                       0x0f     108      6  OK
>   3 Spin Up Time                              0x03      91      0  OK
>   4 Start/Stop Count                          0x32     100     20  OK
>   5 Reallocated Sector Count                  0x33     100     36  OK
>   7 Seek Error Rate                           0x0f      81     30  OK
>   9 Power-on Hours Count                      0x32      79      0  OK
>  10 Spin Retry Count                          0x13     100     97  OK
>  12 Device Power Cycle Count                  0x32     100     20  OK
> 194 Temperature                               0x22      29      0  OK
> 197 Current Pending Sector Count              0x12     100      0  OK
> 198 Off-line Scan Uncorrectable Sector Count  0x10     100      0  OK
> 199 Ultra DMA CRC Error Count                 0x3e     200      0  OK
> ===============================================================================
> GuiErrMsg<0x00>: Success.

Yeah, this isn't going to help much; the raw SMART data isn't being
shown.  I downloaded the Areca CLI manual dated 2010/07, and it doesn't
document anything beyond what you've already shown.  Bummer.


> >If so, think about what would happen if heavy I/O happened on
> >both da0 and da1 at the same time.  I talk about this a bit more below.
> 
> No different than any other single disk being heavily worked.
> Again, this particular hardware configuration has been beaten about
> for a couple of years, so I am not sure why, all of a sudden, it
> would no longer be possible to do this.

That's a very good question, and I don't have an answer to it.  I'd
also have a hard time believing that heavy I/O would suddenly, out of
nowhere, start exhibiting this problem.  I'm just going over
possibilities.  For example, I see that the da1 RAID volume is labelled
"backup1", so if you're storing backups there, the I/O may degrade over
time as more data/files accumulate.  You wouldn't have seen it a year
ago, but might see it now.  Just thinking out loud.

> >situation (since you'd then be dedicating an entire disk to just swap).
> >Others may have other advice.  You mention in a later mail that the
> >ada[0-3] disks make up a ZFS pool of some sort.  You might try splitting
> >ada0 into two slices, one for swap and the other used as a pool member.
> 
> That seems like it would just move the problem you are trying to get
> me to avoid to a different set of disks. If putting swap on a raid
> array is a bad thing, I am not sure how moving it to a ZFS raid
> array will help.

The idea wasn't to move swap to ZFS (that's a bad idea from what I
remember, something about crash dumps not working in that situation).
My idea was to move swap to a dedicated partition on a disk that happens
to also be used for ZFS.  E.g.:

ada0
  ada0s1a = 20GB   = swap
  ada0s1b = 980GB  = ZFS pool
ada1      = 1000GB = ZFS pool
ada2      = 1000GB = ZFS pool
ada3      = 1000GB = ZFS pool
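
A rough sketch of how that might be set up with gpart (hypothetical and
untested; I'm using GPT partition names here instead of the MBR-style
slices sketched above, the pool name/layout are made up, and this wipes
ada0, so treat it as an illustration only):

   gpart create -s gpt ada0
   gpart add -t freebsd-swap -s 20G ada0   # becomes ada0p1
   gpart add -t freebsd-zfs ada0           # rest of the disk, ada0p2
   swapon /dev/ada0p1
   # e.g. a raidz pool mixing the partition with the whole disks; note
   # the vdev size is limited by its smallest member
   zpool create tank raidz ada0p2 ada1 ada2 ada3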

Again, this isn't a solution for the problem, and I'm in no way trying
to dissuade anyone from figuring out the root cause.  But quite often
on the list, when someone can't get an answer to "why", they want to
know what they can do as a workaround.

There happen to be reports of this problem going all the way back to
RELENG_6, and all of the posts I've read so far involve swap backed by
some sort of RAID.

> >Again: I don't think this is necessarily a bad disk problem.  The only
> >way you'd be able to determine that would be to monitor on a per-disk
> >basis the I/O response time of each disk member on the Areca.  If the
> >CLI tools provide this, awesome.  Otherwise you'll probably need to
> >involve Areca Support.
> 
> In the past when I have had bad disks on the areca, it did catch and
> flag device timeouts.  There were no such alerts leading up to this
> situation.

Yeah, which makes it sound more like a driver issue or something; I
really don't know what to say.  Areca does officially support FreeBSD,
so they might have some ideas.

-- 
| Jeremy Chadwick                                   jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |


