Failure in SATA RAID1 during heavy disk activity (amd64)

Toomas Aas toomas.aas at raad.tartu.ee
Sun Dec 10 15:52:41 PST 2006


[Please Cc: any responses, I'm not currently subscribed to the list]

Hardware:
Intel SE7230NH1-E motherboard, integrated SATA RAID (ICH7)
Two 500 GB HDDs set up as RAID1
Intel Pentium D 945 CPU
2 GB RAM

OS:
FreeBSD mail.domain.tld 6.2-RC1 FreeBSD 6.2-RC1 #0: Fri Nov 24 23:26:25 
EET 2006 toomas at mail.domain.tld:/usr/obj/usr/src/sys/TVS/amd64
SMP-enabled kernel

Problem:

During large file transfer (ca 750000 files, total size ca 45 GB) from 
another server using gtar via ssh-pipe, kernel reported a series of ATA 
related error messages.

Note that after timestamp "Dec 10 23:59:35" even the messages themselves 
appear garbled in the log:

Dec 10 23:52:50 mail kernel: DOH! ata_alloc_request failed!
Dec 10 23:52:52 mail kernel: FAILURE - out of memory in 
ata_raid_init_request
Dec 10 23:52:52 mail kernel: FAILURE - out of memory in 
ata_raid_init_request
Dec 10 23:52:52 mail kernel: 
g_vfs_done():ar0s1g[WRITE(offset=270864936960, length=4096)]error = 5
Dec 10 23:52:52 mail kernel: 
g_vfs_done():ar0s1g[WRITE(offset=270920138752, length=16384)]error = 5
Dec 10 23:57:29 mail kernel: DOH! ata_alloc_request failed!
Dec 10 23:57:29 mail kernel: FAILURE - out of memory in 
ata_raid_init_request
Dec 10 23:57:29 mail last message repeated 7 times
Dec 10 23:57:29 mail kernel: 
g_vfs_done():ar0s1g[WRITE(offset=271459713024, length=12288)]error = 5
Dec 10 23:57:30 mail kernel: 
g_vfs_done():ar0s1g[WRITE(offset=271459729408, length=12288)]error = 5
Dec 10 23:57:30 mail kernel: 
g_vfs_done():ar0s1g[WRITE(offset=271459745792, length=14336)]error = 5
Dec 10 23:57:30 mail kernel: 
g_vfs_done():ar0s1g[WRITE(offset=271459762176, length=16384)]error = 5
Dec 10 23:57:30 mail kernel: 
g_vfs_done():ar0s1g[WRITE(offset=271459778560, length=12288)]error = 5
Dec 10 23:57:30 mail kernel: 
g_vfs_done():ar0s1g[WRITE(offset=271459794944, length=12288)]error = 5
Dec 10 23:57:30 mail kernel: 
g_vfs_done():ar0s1g[WRITE(offset=271459811328, length=12288)]error = 5
Dec 10 23:57:30 mail kernel: 
g_vfs_done():ar0s1g[WRITE(offset=271459827712, length=12288)]error = 5
Dec 10 23:58:32 mail kernel: FAILURE - out of memory in 
ata_raid_init_request
Dec 10 23:58:32 mail kernel: g_vfs_done():ar0s1g[WRITE(offset=65536, 
length=2048)]error = 5
Dec 10 23:59:03 mail kernel: FAILURE - out of memory in 
ata_raid_init_request
Dec 10 23:59:03 mail kernel: 
g_vfs_done():ar0s1g[WRITE(offset=274533269504, length=14336)]error = 5
Dec 10 23:59:35 mail kernel: FAILURE - out of memory in 
ata_raid_init_request
Dec 10 23:59:35 mail kernel: g_vfs_done():ar0s1g[WRITEDOH! 
ata_alloc_(offset=275request failed!139362816, length=10240)]error = 5
Dec 10 23:59:35 mail kernel:
Dec 10 23:59:35 mail kernel: FAILURE - out of memory in 
ata_raid_init_request
Dec 10 23:59:35 mail kernel: FAg_vfs_done():Iar0s1g[WRITE(LoUfRfEs e-t= 
o2u7t5 1o1f2 9m2e3m1o3r6y,  ilenn gattha_rai=d_1i0n2i4t0_)r]eerqruoers t=
Dec 10 23:59:35 mail kernel: 5
Dec 10 23:59:35 mail kernel: FAILgU_vRfEs _-d oonuet() of m:eamro0rsy1 
gi[WnR IaTtEa(_orfafisde_ti=nit2_7r5e1q1u2est
Dec 10 23:59:35 mail kernel: 955904, length=10240)]error =F A5IL
Dec 10 23:59:35 mail kernel: URE - out of memory in 
ata_raid_ing_ivtf_sr_edqounee(s)t:
Dec 10 23:59:35 mail kernel: ar0s1g[WRITE(offset=27F5A1I3L9U7R8E8 8-0 
0o,u tl eonfgt hm=e1m0o2r4y0 )i]ne rartoar_ =ra i5d
Dec 10 23:59:35 mail kernel: _init_request
Dec 10 23:59:35 mail kernel: FAILURE - ogu_tv fosf_ domneem(o)r:ya ri0ns 
1agt[aW_RrIaTiEd(_oifnfiste_tr=eq2u7e5s1t3
Dec 10 23:59:35 mail kernel: 9805184, length=1228FAILURE - out of memory 
in ata_raid_init_request
Dec 10 23:59:35 mail kernel: FAILURE - out of memory in 
ata_raid_init_request
Dec 10 23:59:35 mail kernel: FAILURE - out of memory in 
ata_raid_init_request
Dec 10 23:59:35 mail kernel: 8)]error F=A I5LU
Dec 10 23:59:35 mail kernel: RE - out of memory in ata_raid_init_request
Dec 10 23:59:35 mail kernel: FAILUgR_Ev f-s _oduotn eo(f) memory in 
ata_raid_init_request
Dec 10 23:59:35 mail kernel: FAILURE - out of memory in 
ata_raid_init_request
Dec 10 23:59:35 mail kernel: FAILURE - out of memory in 
ata_raid_init_request
Dec 10 23:59:35 mail kernel: FAILURE :- oaurt0s 1ogf[ WmReITmEo(royf 
fisne ta=ta2_7r5a1i3d9_8i2n1i5t6_8r,e qlueensgtht=
Dec 10 23:59:35 mail kernel: 12288)]error = 5
Dec 10 23:59:35 mail kernel: FAILURE - out of memory in 
ata_raid_init_request
Dec 10 23:59:35 mail kernel: FAILURE - out go_fv fmse_mdoonery( )i:na 
art0as_1rga[iWdR_IiTnEi(to_frfesqeute=st27
Dec 10 23:59:35 mail kernel: 5139837952, length=14336)]error = 5
Dec 10 23:59:35 mail kernel: 
g_vfs_done():ar0s1g[WRITE(offset=275139854336, FlAILURE - out of 
emnegmtohr=y 1i4n33 6a)t]ear_rorra i=d _5in
Dec 10 23:59:35 mail kernel: it_request
Dec 10 23:59:35 mail kernel: g_vfs_done():ar0s1g[WRFIATEI(LUoRfEf s-e 
to=ut 2o7f5 1m3e9m8o7r0y7 2i0n,  altean_grtahi=d1_2i2n8i8t)_]reerqruoers t=
Dec 10 23:59:35 mail kernel: 5
Dec 10 23:59:35 mail kernel: FAIgL_UvRfEs _-d oonuet( )o:fa 
rm0esm1ogr[yW RiInT Ea(toaf_frsaeitd=_in2i7t5_1r3e9q8uest8
Dec 10 23:59:35 mail kernel: 7104, length=12288)]error = 5
Dec 10 23:59:35 mail kernel: FAILURE - out of memory in 
ata_raid_init_request
Dec 10 23:59:35 mail kernel: 
g_vfs_done():ar0s1g[WRITE(offset=275115839488, length=10240)]error = 5
Dec 10 23:59:35 mail kernel: 
g_vfs_done():ar0s1g[WRITE(offset=275117283328, length=10240)]error = 5
Dec 10 23:59:35 mail kernel: 
g_vfs_done():ar0s1g[WRITE(offset=275117316096, length=10240)]error = 5
Dec 10 23:59:35 mail kernel: g_vfs_done():ar0s1g[WRITE(offset=6144000, 
length=16384)]error = 5
Dec 10 23:59:35 mail kernel: g_vfs_done():ar0s1g[WRITE(offset=6160384, 
length=16384)]error = 5
Dec 10 23:59:35 mail kernel: g_vfs_done():ar0s1g[WRITE(offset=6176768, 
length=6144)]error = 5
Dec 10 23:59:35 mail kernel: g_vfs_done():ar0s1g[WRITE(offset=65536, 
length=2048)]error = 5
Dec 10 23:59:35 mail kernel: swap_pager: I/O error - pageout failed; 
blkno 34,size 4096, error 5
Dec 10 23:59:35 mail kernel: swap_pager: I/O error - pageout failed; 
blkno 35,size 4096, error 5
Dec 10 23:59:35 mail kernel: swap_pager: I/O error - pageout failed; 
blkno 36,size 4096, error 5
Dec 10 23:59:35 mail kernel: swap_pager: I/O error - pageout failed; 
blkno 37,size 4096, error 5
Dec 11 00:27:10 mail kernel: DOH! ata_alloc_request failed!
Dec 11 00:27:12 mail kernel: FAILURE - out of memory in 
ata_raid_init_request
Dec 11 00:27:12 mail last message repeated 4 times
Dec 11 00:27:12 mail kernel: 
g_vfs_done():ar0s1g[WRITE(offset=299515107328, length=6144)]error = 5
Dec 11 00:27:12 mail kernel: 
g_vfs_done():ar0s1g[WRITE(offset=299518115840, length=6144)]error = 5
Dec 11 00:27:12 mail kernel: 
g_vfs_done():ar0s1g[WRITE(offset=299515031552, length=4096)]error = 5
Dec 11 00:27:12 mail kernel: 
g_vfs_done():ar0s1g[WRITE(offset=299518132224, length=6144)]error = 5
Dec 11 00:27:12 mail kernel: 
g_vfs_done():ar0s1g[WRITE(offset=299515123712, length=6144)]error = 5


The command I was using for file transfer was:

ssh -l root oldserver.domain.tld tar -cpf - -C /storage/www . \
| gtar xpf - --delay-directory-restore


During the transfer I received one error message from gtar:

gtar: ./tallinn_linnaarhiiv/149-5-2178/TileGroup1/6-43-1.jpg: Cannot 
write: No space left on device

There was actually plenty of free space on the partition to where I was 
transferring the files.

I verified this file using md5 after the transfer completed and found 
that checksums of this file on old server and new server did not match. 
I also randomly verified some other files but didn't find any more 
checksum mismatches.

Here's the dmesg of the machine where the problem occurs:

Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 6.2-RC1 #0: Fri Nov 24 23:26:25 EET 2006
     toomas at mail.domain.tld:/usr/obj/usr/src/sys/TVS
ACPI APIC Table: <INTEL  04DT044 >
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel(R) Pentium(R) D CPU 3.40GHz (3398.30-MHz K8-class CPU)
   Origin = "GenuineIntel"  Id = 0xf64  Stepping = 4
 
Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
   Features2=0xe49d<SSE3,RSVD2,MON,DS_CPL,EST,CNTX-ID,CX16,<b14>,<b15>>
   AMD Features=0x20000800<SYSCALL,LM>
   AMD Features2=0x1<LAHF>
   Cores per package: 2
real memory  = 2145849344 (2046 MB)
avail memory = 2064850944 (1969 MB)
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
  cpu0 (BSP): APIC ID:  0
  cpu1 (AP): APIC ID:  1
ioapic0: Changing APIC ID to 5
ioapic0 <Version 2.0> irqs 0-23 on motherboard
kbd1 at kbdmux0
acpi0: <INTEL 04DT044> on motherboard
acpi0: Power Button (fixed)
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x408-0x40b on acpi0
cpu0: <ACPI CPU> on acpi0
cpu1: <ACPI CPU> on acpi0
acpi_button0: <Sleep Button> on acpi0
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
pci0: <ACPI PCI bus> on pcib0
pcib1: <ACPI PCI-PCI bridge> at device 28.0 on pci0
pci1: <ACPI PCI bus> on pcib1
pcib2: <ACPI PCI-PCI bridge> at device 28.4 on pci0
pci2: <ACPI PCI bus> on pcib2
pcib3: <ACPI PCI-PCI bridge> at device 28.5 on pci0
pci3: <ACPI PCI bus> on pcib3
em0: <Intel(R) PRO/1000 Network Connection Version - 6.2.9> port 
0x2000-0x201f mem 0x88100000-0x8811ffff irq 17 at device 0.0 on pci3
em0: Ethernet address: 00:16:76:cc:3c:4e
uhci0: <UHCI (generic) USB controller> port 0x3080-0x309f irq 23 at 
device 29.0 on pci0
uhci0: [GIANT-LOCKED]
usb0: <UHCI (generic) USB controller> on uhci0
usb0: USB revision 1.0
uhub0: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 2 ports with 2 removable, self powered
uhci1: <UHCI (generic) USB controller> port 0x3060-0x307f irq 19 at 
device 29.1 on pci0
uhci1: [GIANT-LOCKED]
usb1: <UHCI (generic) USB controller> on uhci1
usb1: USB revision 1.0
uhub1: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub1: 2 ports with 2 removable, self powered
uhci2: <UHCI (generic) USB controller> port 0x3040-0x305f irq 18 at 
device 29.2 on pci0
uhci2: [GIANT-LOCKED]
usb2: <UHCI (generic) USB controller> on uhci2
usb2: USB revision 1.0
uhub2: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub2: 2 ports with 2 removable, self powered
uhci3: <UHCI (generic) USB controller> port 0x3020-0x303f irq 16 at 
device 29.3 on pci0
uhci3: [GIANT-LOCKED]
usb3: <UHCI (generic) USB controller> on uhci3
usb3: USB revision 1.0
uhub3: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub3: 2 ports with 2 removable, self powered
ehci0: <Intel 82801GB/R (ICH7) USB 2.0 controller> mem 
0x88200400-0x882007ff irq 23 at device 29.7 on pci0
ehci0: [GIANT-LOCKED]
usb4: EHCI version 1.0
usb4: companion controllers, 2 ports each: usb0 usb1 usb2 usb3
usb4: <Intel 82801GB/R (ICH7) USB 2.0 controller> on ehci0
usb4: USB revision 2.0
uhub4: Intel EHCI root hub, class 9/0, rev 2.00/1.00, addr 1
uhub4: 8 ports with 8 removable, self powered
pcib4: <ACPI PCI-PCI bridge> at device 30.0 on pci0
pci4: <ACPI PCI bus> on pcib4
pci4: <display, VGA> at device 4.0 (no driver attached)
em1: <Intel(R) PRO/1000 Network Connection Version - 6.2.9> port 
0x1100-0x113f mem 0x88020000-0x8803ffff,0x88000000-0x8801ffff irq 17 at 
device 5.0 on pci4
em1: Ethernet address: 00:16:76:cc:3c:4f
isab0: <PCI-ISA bridge> at device 31.0 on pci0
isa0: <ISA bus> on isab0
atapci0: <Intel ICH7 UDMA100 controller> port 
0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0x30b0-0x30bf irq 18 at device 31.1 
on pci0
ata0: <ATA channel 0> on atapci0
ata1: <ATA channel 1> on atapci0
atapci1: <Intel ICH7 SATA300 controller> port 
0x30c8-0x30cf,0x30e4-0x30e7,0x30c0-0x30c7,0x30e0-0x30e3,0x30a0-0x30af 
mem 0x88200000-0x882003ff irq 19 at device 31.2 on pci0
atapci1: AHCI Version 01.10 controller with 4 ports detected
ata2: <ATA channel 0> on atapci1
ata3: <ATA channel 1> on atapci1
ata4: <ATA channel 2> on atapci1
ata5: <ATA channel 3> on atapci1
ichsmb0: <SMBus controller> port 0x3000-0x301f irq 19 at device 31.3 on pci0
ichsmb0: [GIANT-LOCKED]
smbus0: <System Management Bus> on ichsmb0
smb0: <SMBus generic I/O> on smbus0
atkbdc0: <Keyboard controller (i8042)> port 0x60,0x64 irq 1 on acpi0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
sio0: configured irq 4 not in bitmap of probed irqs 0
sio0: port may not be enabled
sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on 
acpi0
sio0: type 16550A
orm0: <ISA Option ROM> at iomem 0xc0000-0xcafff on isa0
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
Timecounters tick every 1.000 msec
acd0: DVDR <DVDRW DRW-6S160P/PSG6> at ata0-master UDMA66
ad4: 476940MB <Seagate ST3500630AS 3.AAE> at ata2-master SATA300
ad6: 476940MB <Seagate ST3500630AS 3.AAE> at ata3-master SATA300
ar0: 476772MB <LSILogic v3 MegaRAID RAID1> status: READY
ar0: disk0 READY (master) using ad4 at ata2-master
ar0: disk1 READY (mirror) using ad6 at ata3-master
SMP: AP CPU #1 Launched!
Trying to mount root from ufs:/dev/ar0s1a


More information about the freebsd-stable mailing list