kern/124670: large file operation on RAID cause many GEOM errors - crash

Salik Rafiq chameeyass at hotmail.com
Tue Jun 17 13:00:08 UTC 2008


>Number:         124670
>Category:       kern
>Synopsis:       large file operation on RAID cause many GEOM errors - crash
>Confidential:   no
>Severity:       serious
>Priority:       low
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Tue Jun 17 13:00:07 UTC 2008
>Closed-Date:
>Last-Modified:
>Originator:     Salik Rafiq
>Release:        7.0  RELEASE
>Organization:
Chameeya S S Ltd.
>Environment:
FreeBSD ChamRAID01 7.0-RELEASE FreeBSD 7.0-RELEASE #0: Sun Feb 24 19:59:52 UTC 2008     root at logan.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC  i386

>Description:
machine configuration:
  Celeron 800MHz, 768MB RAM,
  11GB IDE mount: / ad2 - motherboard IDE connection
  Sil 3512 SATA PCI card
  320GB SATA ad4
  320GB SATA ad6
   - created /dev/mirror/dat mount: /home

I have serious problems when I work with a large file or large file copies.
I have had a series of issues with the RAID. It goes down nearly every day,
sometimes several times a day!

Here's an extract of the messages logged while I was copying a single 1.8GB file from one SAMBA
share on the mirror to another SAMBA share on the same mirror.

Jun 17 10:22:48 ChamRAID01 kernel: xl0: transmission error: 90
Jun 17 10:22:48 ChamRAID01 kernel: xl0: tx underrun, increasing tx start threshold to 120 bytes
Jun 17 10:41:33 ChamRAID01 kernel: xl0: transmission error: 90
Jun 17 10:41:33 ChamRAID01 kernel: xl0: tx underrun, increasing tx start threshold to 180 bytes
Jun 17 10:54:49 ChamRAID01 kernel: xl0: transmission error: 90
Jun 17 10:54:49 ChamRAID01 kernel: xl0: tx underrun, increasing tx start threshold to 240 bytes
Jun 17 10:55:12 ChamRAID01 kernel: xl0: transmission error: 90
Jun 17 10:55:12 ChamRAID01 kernel: xl0: tx underrun, increasing tx start threshold to 300 bytes
Jun 17 10:56:22 ChamRAID01 kernel: ad4: FAILURE - device detached
Jun 17 10:56:22 ChamRAID01 kernel: subdisk4: detached
Jun 17 10:56:22 ChamRAID01 kernel: ad4: detached
Jun 17 10:56:22 ChamRAID01 kernel: GEOM_MIRROR: Device dat: provider ad4 disconnected.
Jun 17 10:56:22 ChamRAID01 kernel: g_vfs_done():mirror/dat[READ(offset=267860606976, length=131072)]error = 6
Jun 17 10:56:41 ChamRAID01 kernel: ad6: FAILURE - device detached
Jun 17 10:56:41 ChamRAID01 kernel: subdisk6: detached
Jun 17 10:56:41 ChamRAID01 kernel: ad6: detached
Jun 17 10:56:41 ChamRAID01 kernel: GEOM_MIRROR: Device dat: provider ad6 disconnected.
Jun 17 10:56:41 ChamRAID01 kernel: GEOM_MIRROR: Device dat: provider mirror/dat destroyed.
Jun 17 10:56:41 ChamRAID01 kernel: GEOM_MIRROR: Device dat destroyed.
Jun 17 10:56:41 ChamRAID01 kernel: g_vfs_done():mirror/dat[WRITE(offset=268466061312, length=16384)]error = 6
Jun 17 10:56:41 ChamRAID01 kernel: g_vfs_done():mirror/dat[WRITE(offset=268467388416, length=131072)]error = 6
Jun 17 10:56:41 ChamRAID01 kernel: g_vfs_done():mirror/dat[WRITE(offset=268467519488, length=131072)]error = 6
Jun 17 10:56:41 ChamRAID01 kernel: g_vfs_done():mirror/dat[WRITE(offset=268467650560, length=131072)]error = 6
Jun 17 10:56:41 ChamRAID01 kernel: g_vfs_done():mirror/dat[WRITE(offset=268467781632, length=131072)]error = 6
... lots and lots of these ...
 - The machine crashed after this. I don't think it left a dump.

- When it came back, the OS reported missing and bad blocks on the mirror disks.
I ran fsck and cleaned the mirror disks up. I didn't check the mirror status,
but I suspect it was rebuilding.
When the fsck finished I attempted to reboot the machine by issuing a reboot.
The machine crashed - it left a core dump this time.

When it came back up, the mirror rebuilt and I tried the file copy on the console instead of from my Windows machine. The same thing happened.

Jun 17 11:40:31 ChamRAID01 kernel: ad6: FAILURE - device detached
Jun 17 11:40:31 ChamRAID01 kernel: subdisk6: detached
Jun 17 11:40:31 ChamRAID01 kernel: ad6: detached
Jun 17 11:40:31 ChamRAID01 kernel: GEOM_MIRROR: Device dat: provider ad6 disconnected.
Jun 17 11:40:31 ChamRAID01 kernel: GEOM_MIRROR: Device dat: provider mirror/dat destroyed.
Jun 17 11:40:31 ChamRAID01 kernel: GEOM_MIRROR: Device dat: rebuilding provider ad4 stopped.
Jun 17 11:40:31 ChamRAID01 kernel: g_vfs_done():mirror/dat[WRITE(offset=289641709568, length=131072)]error = 6
Jun 17 11:40:31 ChamRAID01 kernel: g_vfs_done():mirror/dat[WRITE(offset=289641840640, length=131072)]error = 6
Jun 17 11:40:36 ChamRAID01 kernel: g_vfs_done():mirror/dat[WRITE(offset=114688, length=16384)]error = 6
... again, LOTS of these. I had to turn the machine off this time.
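
To gauge how many requests failed in extracts like the two above, the g_vfs_done lines can be tallied with a short script. This is only a minimal sketch, assuming Python 3 is available and that the lines have the same layout as the ones quoted here. The name summarise and the SAMPLE list are made up for illustration. On FreeBSD, errno 6 is ENXIO ("Device not configured"), which would fit requests arriving after the mirror provider was destroyed.

```python
import re
from collections import Counter

# Matches lines such as:
#   g_vfs_done():mirror/dat[WRITE(offset=268466061312, length=16384)]error = 6
G_VFS_DONE = re.compile(
    r"g_vfs_done\(\):(?P<dev>[^\[]+)\[(?P<op>READ|WRITE)"
    r"\(offset=(?P<offset>\d+), length=(?P<length>\d+)\)\]error = (?P<errno>\d+)"
)


def summarise(lines):
    """Count failed requests per (device, operation, errno)."""
    counts = Counter()
    for line in lines:
        m = G_VFS_DONE.search(line)
        if m:
            counts[(m["dev"], m["op"], int(m["errno"]))] += 1
    return counts


# A few lines copied from the extract above, plus one non-matching line.
SAMPLE = [
    "Jun 17 10:56:22 ChamRAID01 kernel: g_vfs_done():mirror/dat[READ(offset=267860606976, length=131072)]error = 6",
    "Jun 17 10:56:41 ChamRAID01 kernel: g_vfs_done():mirror/dat[WRITE(offset=268466061312, length=16384)]error = 6",
    "Jun 17 10:56:41 ChamRAID01 kernel: g_vfs_done():mirror/dat[WRITE(offset=268467388416, length=131072)]error = 6",
    "Jun 17 10:56:41 ChamRAID01 kernel: GEOM_MIRROR: Device dat destroyed.",
]

if __name__ == "__main__":
    for (dev, op, err), n in sorted(summarise(SAMPLE).items()):
        print(f"{dev} {op} errno={err}: {n}")
```

To run it against a real log, pass an open file object (for example the system messages file) to summarise instead of SAMPLE.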

I'm thinking of removing the raid and just going with a single device
and a cron job to copy the files over to the other disk each night. At least
that would work in the meantime.
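
The nightly copy could look roughly like the sketch below. It is an assumption-laden illustration, not a tested setup: it assumes Python 3.8 or later, and the mount points /home and /backup as well as the function name nightly_copy are hypothetical.

```python
import shutil
import sys
from pathlib import Path


def nightly_copy(src, dst):
    """Copy the tree under src into dst, overwriting files that already exist.

    Returns the number of regular files present under dst afterwards.
    """
    src, dst = Path(src), Path(dst)
    # dirs_exist_ok needs Python 3.8+; the default copy function (copy2)
    # also tries to preserve modification times and permission bits.
    shutil.copytree(src, dst, dirs_exist_ok=True)
    return sum(1 for p in dst.rglob("*") if p.is_file())


if __name__ == "__main__":
    # Hypothetical defaults: source on the first disk, target on the second.
    src = sys.argv[1] if len(sys.argv) > 1 else "/home"
    dst = sys.argv[2] if len(sys.argv) > 2 else "/backup"
    print(nightly_copy(src, dst), "files present in", dst)
```

Note that this never deletes anything from the target, so files removed from the source stay on the second disk. It would be run once a night from a cron entry.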

I don't have any idea what the issue is. SiL 3512 drivers perhaps? I have NOT created the mirror in the RAID card BIOS; I'm just using JBOD. I have replaced the power supply in case it was a power issue.

Here are the boot messages:
Jun 17 10:57:51 ChamRAID01 kernel: g_vfs_done():mirror/dat[WRITE(offset=268309741568, length=16384)]error = 6
Jun 17 10:57:51 ChamRAID01 kernel: g_vfs_done():mirror/dat[WRITE(offset=268293685248, length=16384)]error = 6
Jun 17 10:57:51 ChamRAID01 kernel: g_vfs_done():mirror/dat[WRITE(offset=268297748480, length=32768)]error = 6
Jun 17 11:10:58 ChamRAID01 syslogd: kernel boot file is /boot/kernel/kernel
Jun 17 11:10:58 ChamRAID01 kernel: Copyright (c) 1992-2008 The FreeBSD Project.
Jun 17 11:10:58 ChamRAID01 kernel: Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
Jun 17 11:10:58 ChamRAID01 kernel: The Regents of the University of California. All rights reserved.
Jun 17 11:10:58 ChamRAID01 kernel: FreeBSD is a registered trademark of The FreeBSD Foundation.
Jun 17 11:10:58 ChamRAID01 kernel: FreeBSD 7.0-RELEASE #0: Sun Feb 24 19:59:52 UTC 2008
Jun 17 11:10:58 ChamRAID01 kernel: root at logan.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC
Jun 17 11:10:58 ChamRAID01 kernel: Timecounter "i8254" frequency 1193182 Hz quality 0
Jun 17 11:10:58 ChamRAID01 kernel: CPU: Intel Celeron (768.42-MHz 686-class CPU)
Jun 17 11:10:58 ChamRAID01 kernel: Origin = "GenuineIntel"  Id = 0x686  Stepping = 6
Jun 17 11:10:58 ChamRAID01 kernel: Features=0x383f9ff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SSE>
Jun 17 11:10:58 ChamRAID01 kernel: real memory  = 671023104 (639 MB)
Jun 17 11:10:58 ChamRAID01 kernel: avail memory = 642785280 (613 MB)
Jun 17 11:10:58 ChamRAID01 kernel: kbd1 at kbdmux0
Jun 17 11:10:58 ChamRAID01 kernel: ath_hal: 0.9.20.3 (AR5210, AR5211, AR5212, RF5111, RF5112, RF2413, RF5413)
Jun 17 11:10:58 ChamRAID01 kernel: hptrr: HPT RocketRAID controller driver v1.1 (Feb 24 2008 19:59:27)
Jun 17 11:10:58 ChamRAID01 kernel: acpi0: <HP HPBDD_IO> on motherboard
Jun 17 11:10:58 ChamRAID01 kernel: acpi0: [ITHREAD]
Jun 17 11:10:58 ChamRAID01 kernel: acpi0: Power Button (fixed)
Jun 17 11:10:58 ChamRAID01 kernel: acpi0: reservation of 0, a0000 (3) failed
Jun 17 11:10:58 ChamRAID01 kernel: acpi0: reservation of 100000, 27ef0000 (3) failed
Jun 17 11:10:58 ChamRAID01 kernel: Timecounter "ACPI-safe" frequency 3579545 Hz quality 850
Jun 17 11:10:58 ChamRAID01 kernel: acpi_timer0: <24-bit timer at 3.579545MHz> port 0x4008-0x400b on acpi0
Jun 17 11:10:58 ChamRAID01 kernel: cpu0: <ACPI CPU> on acpi0
Jun 17 11:10:58 ChamRAID01 kernel: acpi_throttle0: <ACPI CPU Throttling> on cpu0
Jun 17 11:10:58 ChamRAID01 kernel: acpi_button0: <Power Button> on acpi0
Jun 17 11:10:58 ChamRAID01 kernel: pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff,0x4000-0x407f,0x4080-0x40ff,0x5000-0x500f,0x6000-0x607f on acpi0
Jun 17 11:10:58 ChamRAID01 kernel: pci0: <ACPI PCI bus> on pcib0
Jun 17 11:10:58 ChamRAID01 kernel: agp0: <VIA 82C694X (Apollo Pro 133A) host to PCI bridge> on hostb0
Jun 17 11:10:58 ChamRAID01 kernel: agp0: aperture size is 256M
Jun 17 11:10:58 ChamRAID01 kernel: pcib1: <PCI-PCI bridge> at device 1.0 on pci0
Jun 17 11:10:58 ChamRAID01 kernel: pci1: <PCI bus> on pcib1
Jun 17 11:10:58 ChamRAID01 kernel: vgapci0: <VGA-compatible display> port 0x9000-0x90ff mem 0xd6000000-0xd6ffffff,0xd5000000-0xd5000fff irq 12 at device 0.0 on pci1
Jun 17 11:10:58 ChamRAID01 kernel: isab0: <PCI-ISA bridge> at device 4.0 on pci0
Jun 17 11:10:58 ChamRAID01 kernel: isa0: <ISA bus> on isab0
Jun 17 11:10:58 ChamRAID01 kernel: atapci0: <VIA 82C686A UDMA66 controller> port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xa000-0xa00f at device 4.1 on pci0
Jun 17 11:10:58 ChamRAID01 kernel: ata0: <ATA channel 0> on atapci0
Jun 17 11:10:58 ChamRAID01 kernel: ata0: [ITHREAD]
Jun 17 11:10:58 ChamRAID01 kernel: ata1: <ATA channel 1> on atapci0
Jun 17 11:10:58 ChamRAID01 kernel: ata1: [ITHREAD]
Jun 17 11:10:58 ChamRAID01 kernel: uhci0: <VIA 83C572 USB controller> port 0xa400-0xa41f irq 10 at device 4.2 on pci0
Jun 17 11:10:58 ChamRAID01 kernel: uhci0: [GIANT-LOCKED]
Jun 17 11:10:58 ChamRAID01 kernel: uhci0: [ITHREAD]
Jun 17 11:10:58 ChamRAID01 kernel: usb0: <VIA 83C572 USB controller> on uhci0
Jun 17 11:10:58 ChamRAID01 kernel: usb0: USB revision 1.0
Jun 17 11:10:58 ChamRAID01 kernel: uhub0: <VIA UHCI root hub, class 9/0, rev 1.00/1.00, addr 1> on usb0
Jun 17 11:10:58 ChamRAID01 kernel: uhub0: 2 ports with 2 removable, self powered
Jun 17 11:10:58 ChamRAID01 kernel: pci0: <bridge> at device 4.4 (no driver attached)
Jun 17 11:10:58 ChamRAID01 kernel: pci0: <multimedia, audio> at device 4.5 (no driver attached)
Jun 17 11:10:58 ChamRAID01 kernel: xl0: <3Com 3c905C-TX Fast Etherlink XL> port 0xb800-0xb87f mem 0xd8000000-0xd800007f irq 10 at device 5.0 on pci0
Jun 17 11:10:58 ChamRAID01 kernel: miibus0: <MII bus> on xl0
Jun 17 11:10:58 ChamRAID01 kernel: xlphy0: <3c905C 10/100 internal PHY> PHY 24 on miibus0
Jun 17 11:10:58 ChamRAID01 kernel: xlphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
Jun 17 11:10:58 ChamRAID01 kernel: xl0: Ethernet address: 00:50:da:38:c1:2c
Jun 17 11:10:58 ChamRAID01 kernel: xl0: [ITHREAD]
Jun 17 11:10:58 ChamRAID01 kernel: atapci1: <SiI SiI 3512 SATA150 controller> port 0xbc00-0xbc07,0xc000-0xc003,0xc400-0xc407,0xc800-0xc803,0xcc00-0xcc0f mem 0xd8001000-0xd80011ff irq 11 at device 6.0 on pci0
Jun 17 11:10:58 ChamRAID01 kernel: atapci1: [ITHREAD]
Jun 17 11:10:58 ChamRAID01 kernel: ata2: <ATA channel 0> on atapci1
Jun 17 11:10:58 ChamRAID01 kernel: ata2: [ITHREAD]
Jun 17 11:10:58 ChamRAID01 kernel: ata3: <ATA channel 1> on atapci1
Jun 17 11:10:58 ChamRAID01 kernel: ata3: [ITHREAD]
Jun 17 11:10:58 ChamRAID01 kernel: fdc0: <floppy drive controller> port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on acpi0
Jun 17 11:10:58 ChamRAID01 kernel: fdc0: [FILTER]
Jun 17 11:10:58 ChamRAID01 kernel: fd0: <1440-KB 3.5" drive> on fdc0 drive 0
Jun 17 11:10:58 ChamRAID01 kernel: sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
Jun 17 11:10:58 ChamRAID01 kernel: sio0: type 16550A
Jun 17 11:10:58 ChamRAID01 kernel: sio0: [FILTER]
Jun 17 11:10:58 ChamRAID01 kernel: sio1: <16550A-compatible COM port> port 0x2f8-0x2ff irq 3 on acpi0
Jun 17 11:10:58 ChamRAID01 kernel: sio1: type 16550A
Jun 17 11:10:58 ChamRAID01 kernel: sio1: [FILTER]
Jun 17 11:10:58 ChamRAID01 kernel: atkbdc0: <Keyboard controller (i8042)> port 0x60,0x64 irq 1 on acpi0
Jun 17 11:10:58 ChamRAID01 kernel: atkbd0: <AT Keyboard> irq 1 on atkbdc0
Jun 17 11:10:58 ChamRAID01 kernel: kbd0 at atkbd0
Jun 17 11:10:58 ChamRAID01 kernel: atkbd0: [GIANT-LOCKED]
Jun 17 11:10:58 ChamRAID01 kernel: atkbd0: [ITHREAD]
Jun 17 11:10:58 ChamRAID01 kernel: pmtimer0 on isa0
Jun 17 11:10:58 ChamRAID01 kernel: orm0: <ISA Option ROMs> at iomem 0xc0000-0xc7fff,0xcc000-0xcc7ff,0xcd000-0xd17ff pnpid ORM0000 on isa0
Jun 17 11:10:58 ChamRAID01 kernel: ppc0: <Parallel port> at port 0x378-0x37f irq 7 on isa0
Jun 17 11:10:58 ChamRAID01 kernel: ppc0: SMC-like chipset (ECP/EPP/PS2/NIBBLE) in COMPATIBLE mode
Jun 17 11:10:58 ChamRAID01 kernel: ppc0: FIFO with 16/16/8 bytes threshold
Jun 17 11:10:58 ChamRAID01 kernel: ppbus0: <Parallel port bus> on ppc0
Jun 17 11:10:58 ChamRAID01 kernel: ppbus0: [ITHREAD]
Jun 17 11:10:58 ChamRAID01 kernel: lpt0: <Printer> on ppbus0
Jun 17 11:10:58 ChamRAID01 kernel: lpt0: Interrupt-driven port
Jun 17 11:10:58 ChamRAID01 kernel: ppi0: <Parallel I/O> on ppbus0
Jun 17 11:10:58 ChamRAID01 kernel: plip0: <PLIP network interface> on ppbus0
Jun 17 11:10:58 ChamRAID01 kernel: ppc0: [GIANT-LOCKED]
Jun 17 11:10:58 ChamRAID01 kernel: ppc0: [ITHREAD]
Jun 17 11:10:58 ChamRAID01 kernel: sc0: <System console> at flags 0x100 on isa0
Jun 17 11:10:58 ChamRAID01 kernel: sc0: VGA <16 virtual consoles, flags=0x300>
Jun 17 11:10:58 ChamRAID01 kernel: vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
Jun 17 11:10:58 ChamRAID01 kernel: uhub1: <ALCOR Generic USB Hub, class 9/0, rev 1.10/3.12, addr 2> on uhub0
Jun 17 11:10:58 ChamRAID01 kernel: uhub1: 4 ports with 4 removable, self powered
Jun 17 11:10:58 ChamRAID01 kernel: ukbd0: <CHESEN USB Keyboard, class 0/0, rev 1.10/1.10, addr 3> on uhub1
Jun 17 11:10:58 ChamRAID01 kernel: kbd2 at ukbd0
Jun 17 11:10:58 ChamRAID01 kernel: uhid0: <CHESEN USB Keyboard, class 0/0, rev 1.10/1.10, addr 3> on uhub1
Jun 17 11:10:58 ChamRAID01 kernel: ums0: <vendor 0x062a product 0x0000, class 0/0, rev 1.10/0.00, addr 4> on uhub1
Jun 17 11:10:58 ChamRAID01 kernel: ums0: 3 buttons and Z dir.
Jun 17 11:10:58 ChamRAID01 kernel: uhid1: <No brand SP04-A1, class 0/0, rev 1.10/1.00, addr 5> on uhub1
Jun 17 11:10:58 ChamRAID01 kernel: uhid2: <No brand SP04-A1, class 0/0, rev 1.10/1.00, addr 5> on uhub1
Jun 17 11:10:58 ChamRAID01 kernel: uhid2: unexpected endpoint
Jun 17 11:10:58 ChamRAID01 kernel: device_attach: uhid2 attach returned 6
Jun 17 11:10:58 ChamRAID01 kernel: Timecounter "TSC" frequency 768417488 Hz quality 800
Jun 17 11:10:58 ChamRAID01 kernel: Timecounters tick every 1.000 msec
Jun 17 11:10:58 ChamRAID01 kernel: hptrr: no controller detected.
Jun 17 11:10:58 ChamRAID01 kernel: acd0: CDRW <PHILIPS CDRW1610A/0.010000> at ata0-slave UDMA33
Jun 17 11:10:58 ChamRAID01 kernel: ad2: 9773MB <FUJITSU MPF3102AT 0028> at ata1-master UDMA66
Jun 17 11:10:58 ChamRAID01 kernel: ad4: 305245MB <WDC WD3200AAKS-00B3A0 01.03A01> at ata2-master SATA150
Jun 17 11:10:58 ChamRAID01 kernel: ad6: 305245MB <WDC WD3200AAKS-00B3A0 01.03A01> at ata3-master SATA150
Jun 17 11:10:58 ChamRAID01 kernel: GEOM_MIRROR: Device mirror/dat launched (1/2).
Jun 17 11:10:58 ChamRAID01 kernel: GEOM_MIRROR: Device dat: rebuilding provider ad4.
Jun 17 11:10:58 ChamRAID01 kernel: Trying to mount root from ufs:/dev/ad2s1a
Jun 17 11:10:58 ChamRAID01 kernel: WARNING: / was not properly dismounted
Jun 17 11:10:58 ChamRAID01 savecore: reboot after panic: page fault
Jun 17 11:10:58 ChamRAID01 savecore: writing core to vmcore.2

Hope someone can help me out.
>How-To-Repeat:
Copy or manipulate a large zip or similar file on the RAID device. This can be done from SAMBA, over NFS, or on the machine itself.
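
A local copy test along these lines can be scripted. The following is a minimal sketch, assuming Python 3. The file names bigfile.src and bigfile.dst, the function copy_large_file and the 128 KiB chunk size are arbitrary choices for illustration. By default it writes only 1 MiB into the system temporary directory, so the target directory and size have to be given explicitly to exercise the mirror.

```python
import hashlib
import os
import shutil
import sys
import tempfile

CHUNK = 128 * 1024  # arbitrary I/O chunk size (131072 bytes)


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in CHUNK-sized blocks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(CHUNK), b""):
            h.update(block)
    return h.hexdigest()


def copy_large_file(directory, size_bytes):
    """Write size_bytes of random data into directory, copy it, compare checksums."""
    src = os.path.join(directory, "bigfile.src")
    dst = os.path.join(directory, "bigfile.dst")
    remaining = size_bytes
    with open(src, "wb") as f:
        while remaining > 0:
            n = min(CHUNK, remaining)
            f.write(os.urandom(n))
            remaining -= n
        f.flush()
        os.fsync(f.fileno())  # push the source file out to the disk
    shutil.copyfile(src, dst)
    return sha256_of(src) == sha256_of(dst)


if __name__ == "__main__":
    # Usage: script.py [directory] [size_in_bytes]
    target = sys.argv[1] if len(sys.argv) > 1 else tempfile.gettempdir()
    size = int(sys.argv[2]) if len(sys.argv) > 2 else 1024 * 1024
    print("copy verified:", copy_large_file(target, size))
```

Passing a directory on the mirror and a size of roughly 1.8GB would approximate the copy described in the report. Whether that is enough to trigger the detach on other hardware is unknown.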
>Fix:
none.

>Release-Note:
>Audit-Trail:
>Unformatted:

