kern/134491: ZFS: Hot spares are rather cold...

Michel Bouissou michel.bouissou at bioclinica.com
Tue May 12 16:10:02 UTC 2009


>Number:         134491
>Category:       kern
>Synopsis:       ZFS: Hot spares are rather cold...
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Tue May 12 16:10:01 UTC 2009
>Closed-Date:
>Last-Modified:
>Originator:     Michel Bouissou
>Release:        7.2
>Organization:
Bioclinica
>Environment:
>Description:
Although ZFS allows devices to be defined as "spares" for MIRROR / RAIDZ / RAIDZ2 storage pools, and FreeBSD will happily accept this, such "spare" devices will *NOT* automagically take over when a device in the pool fails.

According to http://docs.sun.com/app/docs/doc/819-5461/gcvcw?a=view , I understand that replacing a failed device with a spare may be performed not by the kernel ZFS module but by an external agent/daemon:
« Automatic replacement – When a fault is received, an FMA agent examines the pool to see if it has any available hot spares. If so, it replaces the faulted device with an available spare. »

I'm unable to find such a tool in FreeBSD; if it exists, it isn't active by default. So, as things currently stand, ZFS "spares" have to be activated / deactivated manually when a disk fails or is replaced.

Not only is this suboptimal, it also presents a data loss risk for people who assume that "spares" will do what they are intended for in all usual RAID implementations. They won't: they will just sit there idle if a disk dies, until the admin manually activates them.

This preferably deserves a fix, or at least a prominent WARNING note...
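
As a stopgap, a periodic check could at least point out a pool that has a failed device while a spare sits idle. The following is only a minimal sketch, not an existing FreeBSD tool: the function name suggest_replace is made up for illustration, and it assumes the "zpool status" layout shown later in this report (five-column device rows followed by a "spares" section). It prints a suggested "zpool replace" command and does not run anything.

```shell
#!/bin/sh
# Sketch only: read "zpool status" text on stdin and print a suggested
# "zpool replace" command for each pool that has both a failed device
# and an AVAIL spare. Nothing is executed.
suggest_replace() {
    awk '
        # A new pool section starts; remember its name.
        $1 == "pool:" { pool = $2; in_spares = 0; next }
        # Rows after a bare "spares" line list the spare devices.
        $1 == "spares" && NF == 1 { in_spares = 1; next }
        # Keep the first available spare of the pool.
        in_spares && $2 == "AVAIL" && !(pool in spare) { spare[pool] = $1; next }
        # Keep the first failed leaf device (skip pool and mirror/raidz rows).
        !in_spares && NF >= 5 && $1 != pool && $1 !~ /^(mirror|raidz)/ &&
            ($2 == "OFFLINE" || $2 == "FAULTED" || $2 == "UNAVAIL" || $2 == "REMOVED") &&
            !(pool in failed) { failed[pool] = $1 }
        END {
            for (p in failed)
                if (p in spare)
                    print "zpool replace " p " " failed[p] " " spare[p]
        }
    '
}
```

It would be fed with "zpool status | suggest_replace", for instance from cron. Printing the command instead of executing it is deliberate: the admin still decides, but is no longer unaware that the spare has not taken over.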

Also, although the Sun documentation states « Multiple pools can share devices that are designated as hot spares », in the current FreeBSD implementation ZFS will refuse to add to a pool a "spare" which is already assigned to another pool, stating the device is "in use", e.g.:

# zpool status
  pool: syspool
 state: ONLINE
 (Blah-blah)

        NAME        STATE     READ WRITE CKSUM
        syspool     ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            aacd1   ONLINE       0     0     0
            aacd2   ONLINE       0     0     0
        spares
          da15      AVAIL

(Blah-blah)

# zpool add vol01 spare da15
invalid vdev specification
use '-f' to override the following errors:
da15 is in use (r1w1e1)

# zpool add -f vol01 spare da15
invalid vdev specification
the following errors must be manually repaired:
da15 is in use (r1w1e1)
>How-To-Repeat:
Create any redundant ZFS storage pool with a spare device. Hot-remove (or manually "offline") an active device from the pool. The spare won't take over unless a manual "zpool replace <pool_name> <failed_device> <spare_device>" is issued.
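
The steps above could look like the following sketch. The pool name testpool and the devices da1, da2 and da3 are placeholders, not taken from the system described in this report. By default the script only prints the commands, since running them for real requires root and three scratch disks.

```shell
#!/bin/sh
# Placeholder names (assumptions): pool "testpool", mirror on da1 + da2,
# spare da3. Set DRY_RUN=0 to really execute the commands.
POOL=testpool

# Print the command in dry-run mode (the default), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

repeat_steps() {
    # 1. Create a redundant pool with one spare.
    run zpool create "$POOL" mirror da1 da2 spare da3
    # 2. Take an active device out of the pool.
    run zpool offline "$POOL" da1
    # 3. The pool is now degraded, yet the spare is still listed as AVAIL.
    run zpool status "$POOL"
    # 4. Only this manual step brings the spare into service.
    run zpool replace "$POOL" da1 da3
}

repeat_steps
```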
>Fix:


>Release-Note:
>Audit-Trail:
>Unformatted:

