ZFS Crash
Larry Rosenman
ler at lerctr.org
Fri May 29 17:44:47 UTC 2009
On Thu, 28 May 2009, Larry Rosenman wrote:
> On Thu, 28 May 2009, Kip Macy wrote:
>
>> On Tue, May 26, 2009 at 5:04 AM, Larry Rosenman <ler at lerctr.org> wrote:
>>> On Mon, 25 May 2009, Larry Rosenman wrote:
>>>
>>>> On Mon, 25 May 2009, Larry Rosenman wrote:
>>>>
>>>>> After looking at the code, never mind the "don't call doadump"; we'll
>>>>> get the textdump.
>>>>>
>>>>> Thanks rwatson for the textdump stuff!
>>>>>
>>>> Here are the current stats before we crash. Does any of this look totally
>>>> out of line?
>>>>
>>> It crashed again, but did *NOT* make it into ddb enough to do the
>>> textdump.
>>>
>>> It was hung with the backtrace (it looks like the same one, but I couldn't
>>> scroll the screen back).
>>>
>>> Ideas?
>>>
>>> I'm really concerned that there is a problem.
>>>
>>>
>>>
>>
>>
>> - Type of disks?
> 6 SATA Seagate drives: 5 x 400 GB and 1 x 500 GB.
>
>
> ATA channel 0:
> Master: acd0 <Memorex DVD+-RAM 510L v1/MWS7> ATA/ATAPI revision 7
> Slave: no device present
> ATA channel 2:
> Master: ad4 <ST3400620AS/3.AAJ> SATA revision 2.x
> Slave: no device present
> ATA channel 3:
> Master: ad6 <ST3400620AS/3.AAJ> SATA revision 2.x
> Slave: no device present
> ATA channel 4:
> Master: ad8 <ST3500630AS/3.AAE> SATA revision 2.x
> Slave: no device present
> ATA channel 5:
> Master: ad10 <ST3400620AS/3.AAJ> SATA revision 2.x
> Slave: no device present
> ATA channel 6:
> Master: ad12 <ST3400620AS/3.AAJ> SATA revision 2.x
> Slave: no device present
> ATA channel 7:
> Master: ad14 <ST3400620AS/3.AAJ> SATA revision 2.x
> Slave: no device present
>>
>>
>> - Size of zpools?
> All 6.
>
> pool: vault
> state: ONLINE
> status: One or more devices has experienced an error resulting in data
> corruption. Applications may be affected.
> action: Restore the file in question if possible. Otherwise restore the
> entire pool from backup.
> see: http://www.sun.com/msg/ZFS-8000-8A
> scrub: none requested
> config:
>
> NAME STATE READ WRITE CKSUM
> vault ONLINE 0 0 0
> raidz1 ONLINE 0 0 0
> ad6 ONLINE 0 0 0
> ad8 ONLINE 0 0 0
> ad10 ONLINE 0 0 0
> ad12 ONLINE 0 0 0
> ad14 ONLINE 0 0 0
> ad4s1f ONLINE 0 0 0
> ad4s1e ONLINE 0 0 0
> ad4s1d ONLINE 0 0 0
>
> errors: 10 data errors, use '-v' for a list
>
>
> pool: vault
> state: ONLINE
> status: One or more devices has experienced an error resulting in data
> corruption. Applications may be affected.
> action: Restore the file in question if possible. Otherwise restore the
> entire pool from backup.
> see: http://www.sun.com/msg/ZFS-8000-8A
> scrub: none requested
> config:
>
> NAME STATE READ WRITE CKSUM
> vault ONLINE 0 0 0
> raidz1 ONLINE 0 0 0
> ad6 ONLINE 0 0 0
> ad8 ONLINE 0 0 0
> ad10 ONLINE 0 0 0
> ad12 ONLINE 0 0 0
> ad14 ONLINE 0 0 0
> ad4s1f ONLINE 0 0 0
> ad4s1e ONLINE 0 0 0
> ad4s1d ONLINE 0 0 0
>
> errors: Permanent errors have been detected in the following files:
>
> /usr/local/sbin/p4d
> /var/db/bacula/borg-dir.conmsg
> vault/usr/obj:<0x16c3a>
> vault/usr/obj:<0x169bb>
> /usr/obj/usr/src/lib/libc/random.o
>
>>
>>
>> - Compression enabled?
> Yes.
>
>
>
OK, it just crashed. Unfortunately, I'm at work and the box is at home.
My script was running every minute for that entire boot.
What I saw was a full backup running; then the system started paging, and
the backup jobs got pager errors and were killed.
I'm not sure what else went on. I restarted the bacula daemons that
got killed, and was in the bacula console when the box died.
I'll see if I can get a cell-phone camera shot of the console.
I'll also tar up the vmstat outputs and put them on my web server.
What other forensics should I get? Bear in mind the system is probably
locked up with no dump taken :(
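
For anyone who wants to collect the same kind of data, here is a rough sketch of a once-a-minute stats logger like the script mentioned above. It is not the actual script: the command list (`vmstat`, `vmstat -z`, `sysctl kstat.zfs.misc.arcstats`), the output directory, and the function names are all assumptions chosen for illustration. Each snapshot is fsync'ed so that the last few samples have a better chance of surviving a hang with no dump.

```python
import os
import subprocess
import time
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical command set; the real script's contents are not shown in this thread.
COMMANDS = {
    "vmstat": ["vmstat"],
    "vmstat-z": ["vmstat", "-z"],
    "arcstats": ["sysctl", "kstat.zfs.misc.arcstats"],
}


def capture(commands, out_dir, run=subprocess.run, now=None):
    """Run each command once and write its output to a timestamped file."""
    now = now or datetime.now(timezone.utc)
    stamp = now.strftime("%Y%m%dT%H%M%SZ")
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for name, argv in commands.items():
        try:
            result = run(argv, capture_output=True, text=True, timeout=30)
            text = result.stdout + result.stderr
        except (OSError, subprocess.SubprocessError) as exc:
            # Record the failure instead of aborting the whole snapshot.
            text = f"failed to run {argv!r}: {exc}\n"
        path = out_dir / f"{stamp}-{name}.txt"
        with open(path, "w") as fh:
            fh.write(text)
            fh.flush()
            os.fsync(fh.fileno())  # push the sample to disk before the next hang
        written.append(path)
    return written


def run_forever(out_dir="/var/tmp/crashstats", interval=60):
    """Take a snapshot every `interval` seconds until interrupted."""
    while True:
        capture(COMMANDS, out_dir)
        time.sleep(interval)
```

Calling `run_forever()` from a root shell or an rc script starts the loop. Writing to a UFS filesystem or a remote host rather than the pool under suspicion is probably the safer choice.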
--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 512-248-2683 E-Mail: ler at lerctr.org
US Mail: 430 Valona Loop, Round Rock, TX 78681-3893