Adaptec 3210S, 4.9-STABLE, corruption when disk fails
don at SANDVINE.com
Thu Mar 31 13:00:22 PST 2005
From: Uwe Doering [mailto:gemini at geminix.org]
> Don Bowman wrote:
> > From: owner-freebsd-stable at freebsd.org
> >>From: Uwe Doering [mailto:gemini at geminix.org] ...
> >>>>Did you merge 22.214.171.124 as well? This actually should have
> >>>>been one MFC
> >>Yes, merged from RELENG_4.
> >>I will post later if this happens again, but it will be quite a long
> >>time. The machine has 7 drives in it, and only 3 are left that are
> >>old enough that they might fail before I take it out of service
> >>(it originally had 7 1999-era IBM drives, now it has 4 2004-era
> >>Seagate drives and 3 of the old IBMs.
> >>The drives have been in continuous service, so they've led a pretty
> >>good life!)
> >>Thanks for the suggestion on the cam timeout, I've set that value.
> > Another drive failed and the same thing happened.
> > After the failure, the raid worked in degraded mode just fine, but many
> > files had been corrupted during the failure.
> > So I would suggest that this merge did not help, and the cam timeout
> > did not help either.
> > This is very frustrating, again I rebuild my postgresql install from
> > backup :(
> This is indeed unfortunate. Maybe the problem is in fact
> located neither in PostgreSQL nor in FreeBSD but in the
> controller itself. Does it have the latest firmware? The
> necessary files should be available on Adaptec's website, and
> you can use the 'raidutil' program under FreeBSD to upload
> the firmware to the controller. I have to concede, however,
> that I never did this under FreeBSD myself. If I recall
> correctly I did the upload via a DOS diskette the last time.
> If this doesn't help either you could ask Adaptec's support for help.
> You need to register the controller first, if memory serves.
The latest firmware and BIOS are in the controller (I upgraded them
the last time I had problems).
I have tried Adaptec support, and the controller is registered.
The problem is definitely not in postgresql. Files go missing
in directories that are having new entries added (e.g. I lost
a 'PG_VERSION' file). Data within the postgresql files becomes
corrupt. Since the only application running is postgresql,
and it reads/writes/fsyncs the data, it's not unexpected that
it's the one that reaps the 'rewards' of the failure.
I have to believe this is either a bug in the controller,
or a problem in cam or asr.
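
For anyone who wants to pin down which files were damaged by a failure like this, here is a rough sketch of a checksum manifest. It is only an illustration, not something used in this thread: the function names (sha256_of, build_manifest, compare_manifests) are made up for the example, and it assumes the files are static or that postgresql is stopped while the manifest is taken, since live database files change legitimately.

```python
import hashlib
import os


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of one file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            manifest[os.path.relpath(full, root)] = sha256_of(full)
    return manifest


def compare_manifests(before, after):
    """Return (missing, changed) relative paths, each sorted.

    missing: present in 'before' but absent from 'after'
    changed: present in both, but with different digests
    """
    missing = sorted(set(before) - set(after))
    changed = sorted(
        path for path in before
        if path in after and before[path] != after[path]
    )
    return missing, changed
```

Taking one manifest before a suspect drive is pulled and another after the array goes degraded would show both kinds of damage described above: entries that vanished from a directory land in "missing", and files whose contents were altered land in "changed".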