Troubleshooting a gmirror disk marked broken
Adam Vande More
amvandemore at gmail.com
Thu Jun 27 03:09:34 UTC 2013
On Wed, Jun 26, 2013 at 9:38 PM, Nikola Pavlović <nzp at riseup.net> wrote:
> Last night, during a massive (~1 year's worth :| ) portsnap fetch, the
> server went unresponsive and ssh eventually disconnected. I decided
> to leave it during the night, and, sure enough, the situation was the
> same in the morning, so I had to do a hard reset. It came back up, but
> one of the two gmirror components was marked as broken and deactivated.
> The hang happened during the 'fetching new files or ports' (~24000 of
> them, there are currently ~10000 snapshots in /var/db/portsnap) phase
> of portsnap fetch.
> /var/log/messages was completely silent during the period between the
> hang and the reset.
> Googling around I found a mention that it's possible to sometimes get a
> 'blip'[*] during busy periods, so I decided to just bite the bullet and
> reinsert the component with
> # gmirror forget gm0
> # gmirror clear ad4
> # gmirror insert gm0 ad4
> Currently it's syncing and things *seem* OK. My question is how much
> should I be worried and what could be the cause of this? Is it possible
> that ports snapshot fetching caused this, or that perhaps it was the other
> way around (a failing disk causing the machine to choke during the huge
> portsnap fetch)? How to proceed? :)
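While the resync runs, it is worth checking the mirror state instead of
assuming it finished. A minimal sketch, assuming `gmirror status` prints its
usual Name / Status / Components columns; `mirror_states` is a hypothetical
helper name, not part of gmirror:

```shell
#!/bin/sh
# Hypothetical helper: read `gmirror status` output on stdin and print
# each mirror's name and its Status column (e.g. COMPLETE or DEGRADED).
# Assumes mirror lines start with "mirror/<name>" followed by the status.
mirror_states() {
    awk '$1 ~ /^mirror\// { print $1, $2 }'
}

# Intended use on the FreeBSD host:
#   gmirror status | mirror_states
```

A mirror that stays DEGRADED after the resync should have completed, or
that drops a component again, points at the hardware and not at portsnap.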
The messages log definitely shows problems with your I/O. The SMART logs of
the disks are also at least mildly concerning and indicate that the drives
are in an early stage of dying. Some hard drive deaths take years to complete.
Expect random glitches and intermittently reduced performance as the drives
continue to degrade. You might be able to alleviate some of this by switching
to the AHCI driver and bumping up timeouts, but at the end of the day two
flaky disks in a mirror don't inspire confidence.
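
To keep an eye on the drives, something like the following can flag the
SMART attributes that most often precede failure. This is only a sketch: it
assumes smartmontools is installed and that `smartctl -A` prints its usual
attribute table with the raw value in the tenth column. `smart_warnings` is
a hypothetical helper name, and the choice of attributes is mine.

```shell
#!/bin/sh
# Hypothetical helper: read `smartctl -A` output on stdin and print the
# name and raw value of selected attributes whose raw value is non-zero.
# Assumes the standard table layout, where RAW_VALUE is field 10.
smart_warnings() {
    awk '$2 ~ /^(Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable)$/ \
         && $10 + 0 > 0 { print $2, $10 }'
}

# Intended use on the FreeBSD host (device name taken from your mail):
#   smartctl -A /dev/ad4 | smart_warnings
```

For the driver switch, on releases that still default to the legacy ata(4)
driver the usual approach is ahci_load="YES" in /boot/loader.conf; the disks
then show up as adaN instead of adN. The command timeout is, as far as I
know, the kern.cam.ada.default_timeout tunable (in seconds), but check
ada(4) on your release before relying on either setting.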
Adam Vande More