Large discrepancy in reported disk usage on USR partition

Thu Oct 30 17:57:48 PDT 2008

On Fri, Oct 31, 2008 at 11:15:15AM +1030, Brendan Hart wrote:
> > What you showed tells me nothing about SMART, other than the remote possibility 
> > it's basing some of its decisions on the "general SMART health status",
> > which means jack squat.  I can explain why this is if need be, but it's
> > not related to the problem you're having.
> 
> Thanks for this additional information. I hadn't understood that there was
> far more information behind the simple SMART ok/not ok reported by the PERC
> controller.

Here's an example of some attributes:

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0003   178   175   021    Pre-fail  Always       -       6066
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       50
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000e   200   200   051    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   085   085   000    Old_age   Always       -       11429
 10 Spin_Retry_Count        0x0012   100   253   051    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0012   100   253   051    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       48
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       33
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       50
194 Temperature_Celsius     0x0022   117   100   000    Old_age   Always       -       33
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0012   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   051    Old_age   Offline      -       0
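In this smartctl output, the TYPE and UPDATED columns come from the low bits of the FLAG column. Bit 0x01 marks a pre-failure attribute, and bit 0x02 marks one that is collected online ("Always") rather than only during offline tests. The short Python sketch below decodes a few flags copied from the table. The constant and function names are my own, and only those two bits are handled.

```python
# Low FLAG bits as interpreted by smartmontools; names here are hypothetical.
PREFAILURE = 0x0001  # set -> "Pre-fail", clear -> "Old_age"
ONLINE = 0x0002      # set -> updated "Always", clear -> "Offline" only


def decode_flag(flag: int) -> tuple[str, str]:
    """Return the (TYPE, UPDATED) pair that smartctl derives from FLAG."""
    kind = "Pre-fail" if flag & PREFAILURE else "Old_age"
    updated = "Always" if flag & ONLINE else "Offline"
    return kind, updated


if __name__ == "__main__":
    # (ID, FLAG) pairs copied from the table above.
    for attr_id, flag in [(1, 0x000F), (9, 0x0032), (198, 0x0010)]:
        print(attr_id, hex(flag), *decode_flag(flag))
```

Each printed line (e.g. `1 0xf Pre-fail Always`) should match the TYPE and UPDATED columns for that ID.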

You probably now understand why having access to this information is
useful.  :-)  It's very disappointing that so many RAID controllers
don't provide a way to get at this information; I am very thankful for
the ones that do!
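Here is a rough illustration of what per-attribute data lets you do beyond a single ok/not ok bit. This minimal Python sketch parses rows in the column layout shown above and reports two things. The first is attributes whose normalized VALUE (or WORST) has dropped to or below a non-zero THRESH, which is roughly what smartctl reports in WHEN_FAILED. The second is non-zero raw counts for a small watch list of sector/CRC attributes. The watch list, the function names, and the assumption that RAW_VALUE begins with a plain integer are mine, not smartmontools'.

```python
from dataclasses import dataclass

# Hypothetical watch list: raw counts worth a closer look when non-zero
# (my own choice for illustration, not a smartmontools rule).
WATCH_RAW = {"Reallocated_Sector_Ct", "Current_Pending_Sector",
             "Offline_Uncorrectable", "UDMA_CRC_Error_Count"}


@dataclass
class Attribute:
    attr_id: int
    name: str
    value: int
    worst: int
    thresh: int
    raw: int


def parse_attributes(text: str) -> list[Attribute]:
    """Parse rows laid out like the smartctl attribute table above."""
    attrs = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows start with a numeric ID; the header and blanks are skipped.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        attrs.append(Attribute(
            attr_id=int(fields[0]), name=fields[1],
            value=int(fields[3]), worst=int(fields[4]),
            thresh=int(fields[5]),
            # Assumption: RAW_VALUE starts with a plain decimal integer.
            raw=int(fields[9])))
    return attrs


def concerns(attrs: list[Attribute]) -> list[str]:
    """List threshold crossings and non-zero watched raw counts."""
    found = []
    for a in attrs:
        # A threshold of 0 is treated as "always passing", so skip it.
        if a.thresh > 0 and a.value <= a.thresh:
            found.append(f"{a.name}: value {a.value} <= thresh {a.thresh} (now)")
        elif a.thresh > 0 and a.worst <= a.thresh:
            found.append(f"{a.name}: worst {a.worst} <= thresh {a.thresh} (past)")
        if a.name in WATCH_RAW and a.raw > 0:
            found.append(f"{a.name}: raw count {a.raw}")
    return found


# A few rows copied from the output above.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   200   200   051    Pre-fail  Always       -       0
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   085   085   000    Old_age   Always       -       11429
197 Current_Pending_Sector  0x0012   200   200   000    Old_age   Always       -       0
"""

if __name__ == "__main__":
    print(concerns(parse_attributes(SAMPLE)) or "no concerns")
```

For the disk above this reports nothing, which is the point. A bare "SMART ok" from a controller hides exactly these per-attribute details.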

> > Either way, this is just one of many reasons to avoid hardware RAID
> > controllers if given the choice.
> 
> I have seen some mentions of using gvinum and/or gmirror to achieve the
> goals of protection from Single Point Of Failure with a single disk, which I
> believe is the reason that most people, myself included, have specified
> hardware RAID in their servers. Is this what you mean by avoiding hardware
> RAID?

More or less.  Hardware RAID has some advantages (I can dig up a mail I
wrote long ago outlining what they are), but a lot of the time the
controller is more of a hindrance than a benefit.  I personally feel
the negatives outweigh the positives, but each person has different
needs and requirements.  Some controllers work very well and provide a
great degree of insight (at the disk level) under FreeBSD, and those are
often what I recommend if someone wants to go that route.

I may make it sound like I'm the authoritative voice on what a person
should or should not buy, but I'm not.  I predominantly rely on Intel ICHx
on-board controllers with SATA disks, because ICHx works quite well
under FreeBSD (especially with AHCI).

I personally have no experience with gmirror or gvinum, but I do have
experience with ZFS.  (I'll have a little more experience with gmirror
once I have the time to test some reported problems with gmirror and
high interrupt counts when a disk is hot-swapped.)

> > I hope these are SCSI disks you're showing here; otherwise I'm not sure how the
> > controller is able to get the primary defect count of a SATA or SAS disk.  So, 
> > assuming the numbers shown are accurate, then yes, I don't think there's any 
> > disk-level problem.
>
> Yes, they are SCSI disks. Not particularly relevant to this topic, but
> interesting: I would have thought that SAS would make the same information
> available as SCSI does, as it is a serial bus evolution of SCSI. Is this
> thinking incorrect?

I don't have any experience with SAS, so I can't comment on what
features are available on SAS.

Specifically with regard to SMART: historically, SCSI has not provided
the same granularity/detail in its attributes that ATA/SATA does.  I do
not consider this a negative against SCSI (in fact, I very much like
SCSI).  SAS might provide these details, but I don't know, as I don't
have any SAS disks.

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |