The need for initialising disks before use?

Sat Aug 19 02:52:16 UTC 2006

On Fri, Aug 18, 2006 at 01:41:27PM -1000, Antony Mawer wrote:
> On 18/08/2006 4:29 AM, Brooks Davis wrote:
> >On Fri, Aug 18, 2006 at 09:19:04AM -0500, Kirk Strauser wrote:
> >>On Thursday 17 August 2006 8:35 am, Antony Mawer wrote:
> >>
> >>>A quick question - is it recommended to initialise disks before using
> >>>them to allow the disks to map out any "bad spots" early on?
> >>Note: once you actually start seeing bad sectors, the drive is 
> >>almost dead.  A drive can remap a pretty large number internally, but 
> >>once that pool is exhausted (and the number of errors is still growing 
> >>exponentially), there's not a lot of life left.
> >
> >There are some exceptions to this.  The drive cannot remap a sector
> >which fails to read.  You must perform a write to cause the remap to
> >occur.  If you get a hard write failure it's game over, but read failures
> >aren't necessarily a sign the disk is hopeless.  For example, the drive
> >I've had in my laptop for most of the last year developed a three-sector[0]
> >error within a week or so of arrival.  After dd'ing zeros over the
> >problem sectors I've had no problems.
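A toy model of the behaviour described above may help. It is hypothetical and deliberately simplified (real drive firmware does not document any of this): a sector that goes bad is marked pending, reads of it fail, and only a write lets the drive move it to a spare. Once the spare pool is empty, the write fails as well, which is the "game over" case.

```python
class ReadError(OSError):
    """An unreadable sector, like the UNCORRECTABLE errors in the logs."""


class ToyDrive:
    """Hypothetical, simplified model of sector remapping; not real firmware."""

    def __init__(self, sectors, spares, sector_size=512):
        self.sector_size = sector_size
        self.data = [bytes(sector_size) for _ in range(sectors)]
        self.pending = set()   # sectors that went bad, not yet remapped
        self.remapped = set()  # sectors now served from the spare pool
        self.spares_left = spares

    def go_bad(self, lba):
        """Simulate a sector whose contents can no longer be read back."""
        self.pending.add(lba)

    def read(self, lba):
        if lba in self.pending:
            # The old data is gone; with nothing valid to copy, no remap happens.
            raise ReadError(f"unreadable sector {lba}")
        return self.data[lba]

    def write(self, lba, buf):
        if len(buf) != self.sector_size:
            raise ValueError("writes must be exactly one sector")
        if lba in self.pending:
            if self.spares_left == 0:
                raise OSError(f"hard write failure at {lba}: no spares left")
            # A write supplies fresh data, so the bad sector can be retired.
            self.spares_left -= 1
            self.pending.discard(lba)
            self.remapped.add(lba)
        self.data[lba] = buf
```

In this model, dd'ing zeros over a bad sector corresponds to `write(lba, bytes(512))`: the read error goes away and one spare is consumed.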
> 
> This is what prompted it -- I've been seeing lots of drives that are 
> showing up with huge numbers of read errors - for instance:
> 
> >Aug 19 04:02:27 server kernel: ad0: FAILURE - READ_DMA status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE> LBA=66293984
> >Aug 19 04:02:27 server kernel: g_vfs_done():ad0s1f[READ(offset=30796791808, length=16384)]error = 5
> >Aug 19 04:02:31 server kernel: ad0: FAILURE - READ_DMA status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE> LBA=47702304
> >Aug 19 04:02:31 server kernel: g_vfs_done():ad0s1f[READ(offset=21277851648, length=16384)]error = 5
> >Aug 19 04:02:36 server kernel: ad0: FAILURE - READ_DMA status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE> LBA=34943296
> >Aug 19 04:02:36 server kernel: g_vfs_done():ad0s1f[READ(offset=14745239552, length=16384)]error = 5
> >Aug 19 04:03:08 server kernel: ad0: FAILURE - READ_DMA status=51<READY,DSC,ERROR> error=40<UNCORRECTABLE> LBA=45514848
> >Aug 19 04:03:08 server kernel: g_vfs_done():ad0s1f[READ(offset=20157874176, length=16384)]error = 5
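A quick arithmetic check relates the two kinds of message. The `LBA=` value is a sector number on the whole disk (ad0), while the g_vfs_done offset is a byte offset inside the partition (ad0s1f). Assuming 512-byte sectors, LBA * 512 minus the offset comes out to 3,145,728,000 bytes (sector 6,144,000) for all four pairs quoted above, which would be where ad0s1f starts on ad0. A minimal sketch, with hypothetical function names, that assumes each FAILURE line is followed by its own g_vfs_done line as in this excerpt:

```python
import re

SECTOR = 512  # assumed logical sector size (ATA disks of this era)


def pair_errors(log_text):
    """Pair each 'LBA=' value with the 'offset=' of the g_vfs_done line after it.

    Assumes the two messages alternate, as in the excerpt above; matching the
    bare fields also tolerates log entries that were line-wrapped by a mailer.
    """
    lbas = [int(x) for x in re.findall(r"LBA=(\d+)", log_text)]
    offsets = [int(x) for x in re.findall(r"offset=(\d+)", log_text)]
    return list(zip(lbas, offsets))


def partition_starts(pairs):
    """Partition start (in bytes from the start of the disk) implied by each pair."""
    return {lba * SECTOR - off for lba, off in pairs}
```

In general the failing sector can lie anywhere inside the 16 KiB (32-sector) request, so on other logs the implied starts may differ by up to 31 sectors; a single value, as here, suggests each failure was the first sector of its request.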
> 
> I have /var/log/messages flooded with incidents of these "FAILURE - 
> READ_DMA" messages. I've seen it on more than one machine with 
> relatively "young" drives.
> 
> I'm trying to determine whether running a dd if=/dev/zero over the whole 
> drive prior to use will help reduce the incidence of this, or if it is 
> likely that these are developing after the initial install, in which 
> case this will make negligible difference...

I really don't know.  The only way I can think of to find out is to own
a large number of machines and perform an experiment.  We (the general
computing public) don't have the kind of models needed to really say
anything definitive.  Drives are too darn opaque.

> Once I do start seeing these, is there an easy way to:
> 
>     a) determine what file/directory entry might be affected?

Not easily, but this question has been asked and answered on the mailing
lists recently (I don't remember the answer, but I think there were some
ports that can help).
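One piece of this can be done by hand. The g_vfs_done offsets are byte offsets within ad0s1f, and UFS block pointers count in fragments of the file system's fragment size, so an offset can be turned into the fragment address an inode would have to contain. Finding which inode actually holds that address is the part that needs a tool. A minimal sketch, assuming a 2048-byte fragment size (a common newfs default; the real value is shown by dumpfs(8)); the function name is hypothetical:

```python
def ufs_frag_address(part_offset, fsize=2048):
    """Convert a byte offset inside a UFS partition to a fragment address.

    UFS block pointers are expressed in fragments of fs_fsize bytes.
    fsize=2048 is an assumption; check the real value with dumpfs(8).
    """
    if part_offset % fsize:
        raise ValueError("offset is not fragment-aligned")
    return part_offset // fsize
```

For the first error in the log above, `ufs_frag_address(30796791808)` gives 15037496. The 16384-byte read length is consistent with 16 KiB blocks of eight 2 KiB fragments, but that too is an inference rather than something the log states.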

>     b) dd if=/dev/zero over the affected sectors only, in order to
>          trigger a sector remapping without nuking the whole drive

You can use src/tools/tools/recoverdisk to refresh all of the disk
except the parts that don't work, and then use dd and the console error
output to do the rest.
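A rough sketch of that two-step approach, assuming 512-byte sectors and a 128 KiB first-pass chunk (both assumptions, as are the function names). It is not recoverdisk's code: `refresh()` reads each chunk and writes it straight back, retrying sector by sector when a chunk fails, and returns the sectors it could not read; `zero_sectors()` then overwrites only those, like `dd if=/dev/zero of=/dev/ad0 bs=512 seek=LBA count=1`. Zeroing destroys whatever the sector held, so use it on an unmounted device or an image you can afford to lose.

```python
import os

SECTOR = 512         # assumed logical sector size
CHUNK = 128 * 1024   # assumed first-pass chunk size


def refresh(path, size):
    """Read every chunk of `path` and write it back in place.

    `size` is the total size in bytes (for a disk device, diskinfo(8)
    reports it). Returns the sector numbers that could not be refreshed.
    """
    bad = []
    fd = os.open(path, os.O_RDWR)
    try:
        pos = 0
        while pos < size:
            n = min(CHUNK, size - pos)
            try:
                os.pwrite(fd, os.pread(fd, n, pos), pos)
            except OSError:
                # Fall back to single sectors so one bad spot costs one sector.
                for s in range(pos, pos + n, SECTOR):
                    try:
                        os.pwrite(fd, os.pread(fd, SECTOR, s), s)
                    except OSError:
                        bad.append(s // SECTOR)
            pos += n
    finally:
        os.close(fd)
    return bad


def zero_sectors(path, lbas):
    """Overwrite each listed sector with zeros (its old contents are lost)."""
    fd = os.open(path, os.O_WRONLY)
    try:
        for lba in lbas:
            os.pwrite(fd, bytes(SECTOR), lba * SECTOR)
    finally:
        os.close(fd)
```

Typical use would be `zero_sectors(dev, refresh(dev, size))`, after which a second `refresh()` pass should come back empty if the drive managed to remap everything.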

>     c) depending on where that sector is allocated, I presume I'm
>          either going to end up with:
>         i) zero'd bytes within a file (how can I tell which?!)
>        ii) a destroyed inode
>       iii) ???

Presumably it will be one of i, ii or a mangled superblock.  I don't
know how you'd tell which off the top of my head.  This is one of the
reasons I think Sun is on the right track with ZFS's checksum-everything
approach.  At least that way you actually know when something goes
wrong.
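To make the checksum idea concrete, here is a toy block store. It is hypothetical and not ZFS's on-disk format (ZFS keeps each block's checksum in the pointer to that block and supports several algorithms; SHA-256 here is just a convenient choice). It records a digest of every block on write and verifies it on every read, so a sector silently zeroed by dd would surface as a checksum error instead of as quietly wrong file contents.

```python
import hashlib


class ChecksummedStore:
    """Toy 'checksum everything' store (hypothetical; not ZFS's format).

    Each block's SHA-256 digest is kept apart from the block itself, and
    every read recomputes and compares it, so silent damage is reported.
    """

    def __init__(self):
        self.blocks = {}  # block number -> data (stands in for the disk)
        self.sums = {}    # block number -> digest (stands in for metadata)

    def write(self, n, data):
        self.blocks[n] = bytes(data)
        self.sums[n] = hashlib.sha256(data).digest()

    def read(self, n):
        data = self.blocks[n]
        if hashlib.sha256(data).digest() != self.sums[n]:
            raise OSError(f"checksum mismatch in block {n}")
        return data
```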

> Any thoughts/comments/etc appreciated...
> 
> How do other operating systems handle this - Windows, Linux, Solaris, 
> MacOSX ...? I would have hoped this would be a condition the OS would 
> make some attempt to trigger a sector remap... or are OSes typically 
> ignorant of such things?

The OS is generally unaware of such events except to the extent that
it knows a fatal read error occurred or, in the case of write failures,
reads the SMART data from the drive.

-- Brooks