Gmirror, broken ggate locks system
Cory Marsh
cory at clearwateranalytics.com
Wed Jun 6 15:17:23 UTC 2007
I am experiencing a gmirror issue on my mirrored partitions. These
partitions work great replicating data over a gmirror interface to
another machine. Everything goes just fine until the ggate interface in
the gmirror goes down (backup machine reboot, network problem, etc.). At
that point, the machine with the gmirror locks up. Any process that is
currently running will continue to run, so long as it does not access
the disk in any way. As soon as a process issues a disk request, that
process locks hard. This forces me to shut down the machine ungracefully.
Is this the expected behavior? Shouldn't gmirror detect the stale
(unresponsive) component and deactivate it? Is it a problem because my
primary consumer is the ggate device? Is there a better configuration
to achieve the same result?
Any ideas/suggestions would be appreciated. Thanks!
-Cory
%uname -a
FreeBSD cwanfs1.arbfund.com 6.2-RELEASE FreeBSD 6.2-RELEASE #0: Fri Jan 12 23:34:43 MST 2007 root@:/usr/obj/usr/src/sys/GENERIC amd64
%gmirror list data
Geom name: data
State: COMPLETE
Components: 2
Balance: prefer
Slice: 4096
Flags: NOAUTOSYNC
GenID: 2
SyncID: 21
ID: 1381569007
Providers:
1. Name: mirror/data
Mediasize: 10737417728 (10G)
Sectorsize: 512
Mode: r1w1e1
Consumers:
1. Name: ggate0
Mediasize: 10737418240 (10G)
Sectorsize: 512
Mode: r1w1e1
State: ACTIVE
Priority: 1
Flags: NONE
GenID: 2
SyncID: 21
ID: 1578386556
2. Name: ar0s1g
Mediasize: 10737418240 (10G)
Sectorsize: 512
Mode: r1w1e1
State: ACTIVE
Priority: 100
Flags: NONE
GenID: 2
SyncID: 21
ID: 1982490913
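To watch for a component leaving the ACTIVE state, output like the above can be reduced to one name/state pair per consumer. This is only a minimal sketch that assumes the `gmirror list` layout shown here; the function name `consumer_states` is made up for illustration.

```shell
#!/bin/sh
# consumer_states (hypothetical helper): read `gmirror list` output on
# stdin and print "<consumer> <state>" for each entry under "Consumers:".
consumer_states() {
    awk '
        # A new geom starts: we are not in a consumer section yet.
        /^[ \t]*Geom name:/ { in_consumers = 0 }
        # Everything after this line (until the next geom) is a consumer.
        /^[ \t]*Consumers:/ { in_consumers = 1; next }
        # Remember the consumer name from lines like "1. Name: ggate0".
        in_consumers && /^[ \t]*[0-9]+\. Name:/ { name = $NF }
        # Print the pair when the consumer state line is reached.
        in_consumers && /^[ \t]*State:/ { print name, $NF }
    '
}
```

It would be used as `gmirror list data | consumer_states`. The geom-level State line is skipped because it appears before the Consumers section.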
Info about the problem that locked the machine: I got these messages for
20 minutes, about 100 of them, before I noticed the machine was down. It
could have locked right after the first message. It looks like a network
card issue disconnected the ggate devices and then the machine locked.
/var/log/messages:
...
Jun 5 17:10:15 cwanfs1 kernel: nfe0: watchdog timeout (missed Tx interrupts) -- recovering
Jun 5 17:10:28 cwanfs1 ggatec: Lost connection 1.
Jun 5 17:10:28 cwanfs1 ggatec: Disconnected [10.10.10.2 /dev/ar0s1g]. Connecting...
Jun 5 17:10:59 cwanfs1 kernel: nfe0: watchdog timeout (missed Tx interrupts) -- recovering
...
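To put numbers on how often the NIC reset and the ggate link dropped, the two message types can be counted straight from the log. This is a rough sketch that assumes the message wording shown above; the function name `link_events` is invented for illustration.

```shell
#!/bin/sh
# link_events (hypothetical helper): count NIC watchdog resets and ggatec
# disconnects in the syslog-format file given as $1.
link_events() {
    log="$1"
    # grep -c prints 0 and exits non-zero when nothing matches, so the
    # "|| true" keeps the function usable under "set -e".
    watchdog=$(grep -c 'watchdog timeout' "$log") || true
    lost=$(grep -c 'ggatec: Lost connection' "$log") || true
    echo "watchdog=$watchdog lost=$lost"
}
```

It would be run as `link_events /var/log/messages`.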
%dmesg
Copyright (c) 1992-2007 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 6.2-RELEASE #0: Fri Jan 12 23:34:43 MST 2007
root@:/usr/obj/usr/src/sys/GENERIC
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: AMD Athlon(tm) 64 X2 Dual Core Processor 5600+ (2800.13-MHz K8-class CPU)
Origin = "AuthenticAMD" Id = 0x40f33 Stepping = 3
Features=0x178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>
Features2=0x2001<SSE3,CX16>
AMD Features=0xea500800<SYSCALL,NX,MMX+,FFXSR,RDTSCP,LM,3DNow+,3DNow>
AMD Features2=0x1f<LAHF,CMP,<b2>,<b3>,CR8>
Cores per package: 2
real memory = 2147287040 (2047 MB)
avail memory = 2065846272 (1970 MB)
ACPI APIC Table: <A M I OEMAPIC >
ioapic0 <Version 1.1> irqs 0-23 on motherboard
kbd1 at kbdmux0
acpi0: <A M I OEMXSDT> on motherboard
acpi0: Power Button (fixed)
acpi0: reservation of fec00000, 1000 (3) failed
acpi0: reservation of fee00000, 1000 (3) failed
Timecounter "ACPI-safe" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x2008-0x200b on acpi0
cpu0: <ACPI CPU> on acpi0
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
pci0: <ACPI PCI bus> on pcib0
pci0: <memory, RAM> at device 0.0 (no driver attached)
isab0: <PCI-ISA bridge> at device 1.0 on pci0
isa0: <ISA bus> on isab0
pci0: <serial bus, SMBus> at device 1.1 (no driver attached)
ohci0: <OHCI (generic) USB controller> mem 0xfeaf7000-0xfeaf7fff irq 21 at device 2.0 on pci0
ohci0: [GIANT-LOCKED]
usb0: OHCI version 1.0, legacy support
usb0: SMM does not respond, resetting
usb0: <OHCI (generic) USB controller> on ohci0
usb0: USB revision 1.0
uhub0: nVidia OHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 10 ports with 10 removable, self powered
ehci0: <EHCI (generic) USB 2.0 controller> mem 0xfeaf6c00-0xfeaf6cff irq 22 at device 2.1 on pci0
ehci0: [GIANT-LOCKED]
usb1: EHCI version 1.0
usb1: companion controller, 10 ports each: usb0
usb1: <EHCI (generic) USB 2.0 controller> on ehci0
usb1: USB revision 2.0
uhub1: nVidia EHCI root hub, class 9/0, rev 2.00/1.00, addr 1
uhub1: 10 ports with 10 removable, self powered
atapci0: <nVidia nForce MCP55 UDMA133 controller> port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xffa0-0xffaf at device 4.0 on pci0
ata0: <ATA channel 0> on atapci0
ata1: <ATA channel 1> on atapci0
atapci1: <nVidia nForce MCP55 SATA300 controller> port 0xd480-0xd487,0xd400-0xd403,0xd080-0xd087,0xd000-0xd003,0xcc00-0xcc0f mem 0xfeaf5000-0xfeaf5fff irq 23 at device 5.0 on pci0
ata2: <ATA channel 0> on atapci1
ata3: <ATA channel 1> on atapci1
atapci2: <nVidia nForce MCP55 SATA300 controller> port 0xc880-0xc887,0xc800-0xc803,0xc480-0xc487,0xc400-0xc403,0xc080-0xc08f mem 0xfeaf4000-0xfeaf4fff irq 20 at device 5.1 on pci0
ata4: <ATA channel 0> on atapci2
ata5: <ATA channel 1> on atapci2
atapci3: <nVidia nForce MCP55 SATA300 controller> port 0xc000-0xc007,0xbc00-0xbc03,0xb880-0xb887,0xb800-0xb803,0xb480-0xb48f mem 0xfeaf3000-0xfeaf3fff irq 21 at device 5.2 on pci0
ata6: <ATA channel 0> on atapci3
ata7: <ATA channel 1> on atapci3
pcib1: <ACPI PCI-PCI bridge> at device 6.0 on pci0
pci1: <ACPI PCI bus> on pcib1
pci1: <display, VGA> at device 10.0 (no driver attached)
nfe0: <NVIDIA nForce MCP55 Networking Adapter> port 0xb400-0xb407 mem 0xfeaf2000-0xfeaf2fff,0xfeaf6800-0xfeaf68ff,0xfeaf6400-0xfeaf640f irq 22 at device 8.0 on pci0
miibus0: <MII bus> on nfe0
ukphy0: <Generic IEEE 802.3u media interface> on miibus0
ukphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
nfe0: Ethernet address: 00:e0:81:75:4d:fc
nfe0: [FAST]
nfe1: <NVIDIA nForce MCP55 Networking Adapter> port 0xb080-0xb087 mem 0xfeaf1000-0xfeaf1fff,0xfeaf6000-0xfeaf60ff,0xfeaf0c00-0xfeaf0c0f irq 23 at device 9.0 on pci0
miibus1: <MII bus> on nfe1
ukphy1: <Generic IEEE 802.3u media interface> on miibus1
ukphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
nfe1: Ethernet address: 00:e0:81:75:4d:fd
nfe1: [FAST]
pcib2: <ACPI PCI-PCI bridge> at device 10.0 on pci0
pci2: <ACPI PCI bus> on pcib2
pcib3: <ACPI PCI-PCI bridge> at device 11.0 on pci0
pci3: <ACPI PCI bus> on pcib3
pcib4: <ACPI PCI-PCI bridge> at device 12.0 on pci0
pci4: <ACPI PCI bus> on pcib4
pcib5: <ACPI PCI-PCI bridge> at device 13.0 on pci0
pci5: <ACPI PCI bus> on pcib5
pcib6: <ACPI PCI-PCI bridge> at device 14.0 on pci0
pci6: <ACPI PCI bus> on pcib6
pcib7: <ACPI PCI-PCI bridge> at device 15.0 on pci0
pci7: <ACPI PCI bus> on pcib7
acpi_button0: <Power Button> on acpi0
atkbdc0: <Keyboard controller (i8042)> port 0x60,0x64 irq 1 on acpi0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
psm0: <PS/2 Mouse> irq 12 on atkbdc0
psm0: [GIANT-LOCKED]
psm0: model IntelliMouse Explorer, device ID 4
orm0: <ISA Option ROMs> at iomem 0xc0000-0xc7fff,0xc8000-0xc97ff on isa0
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
Timecounter "TSC" frequency 2800129550 Hz quality 800
Timecounters tick every 1.000 msec
acd0: CDROM <CD-224E-N/1.AA> at ata0-slave UDMA33
ad4: 305245MB <Seagate ST3320620NS 3.AEG> at ata2-master SATA300
ad6: 305245MB <Seagate ST3320620NS 3.AEG> at ata3-master SATA300
ar0: 305245MB <nVidia MediaShield RAID1> status: READY
ar0: disk0 READY (master) using ad4 at ata2-master
ar0: disk1 READY (mirror) using ad6 at ata3-master