Page fault, GEOM problem??

Johan Ström johan at stromnet.org
Thu Nov 17 09:19:46 PST 2005


Ok, I just got this not very nice error on a RELENG_6_0 box (built
from sources this morning, GENERIC kernel minus drivers I don't use):

Nov 17 15:35:43 elfi kernel: subdisk10: detached
Nov 17 15:35:43 elfi kernel: ad10: detached
Nov 17 15:35:43 elfi kernel: unknown: TIMEOUT - READ_DMA retrying (1  
retry left) LBA=85720528
Nov 17 15:35:43 elfi kernel: GEOM_MIRROR: Device gm0s1: provider  
ad10s1 disconnected.
Nov 17 15:35:43 elfi kernel: GEOM_MIRROR: Request failed (error=6).  
ad10s1[WRITE(offset=134356992, length=16384)]
Nov 17 15:35:43 elfi kernel: GEOM_MIRROR: Request failed (error=6).  
ad10s1[WRITE(offset=134373376, length=16384)]
Nov 17 15:35:43 elfi kernel: GEOM_MIRROR: Request failed (error=6).  
ad10s1[WRITE(offset=134438912, length=16384)]
Nov 17 15:35:43 elfi kernel: GEOM_MIRROR: Request failed (error=6).  
ad10s1[WRITE(offset=268591104, length=16384)]
Nov 17 15:35:43 elfi kernel: GEOM_MIRROR: Request failed (error=6).  
ad10s1[WRITE(offset=268607488, length=16384)]
Nov 17 15:35:43 elfi kernel: GEOM_MIRROR: Request failed (error=6).  
ad10s1[WRITE(offset=268623872, length=16384)]
Nov 17 15:35:43 elfi kernel: GEOM_MIRROR: Request failed (error=6).  
ad10s1[WRITE(offset=268640256, length=16384)]
Nov 17 15:35:43 elfi kernel: GEOM_MIRROR: Request failed (error=6).  
ad10s1[WRITE(offset=20151026176, length=2048)]
Nov 17 15:35:43 elfi kernel: GEOM_MIRROR: Request failed (error=6).  
ad10s1[WRITE(offset=32299655680, length=8192)]
Nov 17 15:35:43 elfi kernel: GEOM_MIRROR: Request failed (error=6).  
ad10s1[READ(offset=37363671552, length=16384)]
Nov 17 15:35:43 elfi kernel: GEOM_MIRROR: Request failed (error=6).  
ad10s1[READ(offset=38349087232, length=16384)]
Nov 17 15:35:43 elfi kernel: GEOM_MIRROR: Request failed (error=6).  
ad10s1[READ(offset=45453566464, length=16384)]
Nov 17 15:35:43 elfi kernel: GEOM_MIRROR: Request failed (error=6).  
ad10s1[READ(offset=54459458048, length=131072)]
Nov 17 17:59:18 elfi syslogd: kernel boot file is /boot/kernel/kernel
Nov 17 17:59:18 elfi kernel:
Nov 17 17:59:18 elfi kernel:
Nov 17 17:59:18 elfi kernel: Fatal trap 12: page fault while in  
kernel mode
Nov 17 17:59:18 elfi kernel: fault virtual address      = 0x48
Nov 17 17:59:18 elfi kernel: fault code         = supervisor read,  
page not present
Nov 17 17:59:18 elfi kernel: instruction pointer        =  
0x20:0xc0506b92
Nov 17 17:59:18 elfi kernel: stack pointer              =  
0x28:0xd56d7c9c
Nov 17 17:59:18 elfi kernel: frame pointer              =  
0x28:0xd56d7c9c
Nov 17 17:59:18 elfi kernel: code segment               = base 0x0,  
limit 0xfffff, type 0x1b
Nov 17 17:59:18 elfi kernel: = DPL 0, pres 1, def32 1, gran 1
Nov 17 17:59:18 elfi kernel: processor eflags   = interrupt enabled,  
resume, IOPL = 0
Nov 17 17:59:18 elfi kernel: current process            = 36 (swi4:  
clock sio)
Nov 17 17:59:18 elfi kernel: trap number                = 12
Nov 17 17:59:18 elfi kernel: panic: page fault
Nov 17 17:59:18 elfi kernel: Uptime: 8h55m1s
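
For anyone who wants to tally the GEOM_MIRROR failures instead of
reading them by eye, here is a minimal Python sketch. It assumes the
log text is passed in as a string; FAILURE and summarize() are names
made up for this example and are not part of any FreeBSD tool. The
error=6 value is looked up as an errno name, which gives ENXIO
("Device not configured" on FreeBSD), consistent with ad10 having
detached just before.

```python
import errno
import re
from collections import Counter

# Matches the "Request failed" lines quoted above. The mail client wrapped
# each log entry onto two lines, so \s+ is used to span the line break.
FAILURE = re.compile(
    r"GEOM_MIRROR: Request failed \(error=(\d+)\)\.\s+"
    r"(\w+)\[(READ|WRITE)\(offset=(\d+), length=(\d+)\)\]"
)


def summarize(log_text):
    """Count failed requests per (provider, operation, errno name)."""
    counts = Counter()
    for err, provider, op, _offset, _length in FAILURE.findall(log_text):
        # Fall back to the raw number if the errno is unknown on this platform.
        name = errno.errorcode.get(int(err), "errno %s" % err)
        counts[(provider, op, name)] += 1
    return counts


# Two entries copied from the log above, wrap included.
sample = """\
Nov 17 15:35:43 elfi kernel: GEOM_MIRROR: Request failed (error=6).
ad10s1[WRITE(offset=134356992, length=16384)]
Nov 17 15:35:43 elfi kernel: GEOM_MIRROR: Request failed (error=6).
ad10s1[READ(offset=37363671552, length=16384)]
"""

for key, n in summarize(sample).items():
    print(key, n)
```

Feeding it the whole of /var/log/messages instead of the sample should
work the same way, as long as the entries keep this format.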

ad10 and ad6, two brand new Maxtor Maxline 300GB SATA disks attached to
a Promise PDC40518 SATA150 controller, make up the GEOM mirror gm0s1.
I've been running this setup in another "test" machine (MSI K8N neo
Platinum, KT333 chip I believe), and I haven't had a single problem. I
moved the disks and controller card to my "real" server 24 hours ago,
and the only apparent "problem" I seemed to have was this:

Nov 17 07:06:12 elfi kernel: xl0: transmission error: 90
Nov 17 07:06:12 elfi kernel: xl0: tx underrun, increasing tx start  
threshold to 120 bytes
Nov 17 07:06:18 elfi kernel: xl0: watchdog timeout
Nov 17 07:06:18 elfi kernel: xl0: link state changed to DOWN
Nov 17 07:06:18 elfi kernel: vlan5: link state changed to DOWN
Nov 17 07:06:20 elfi kernel: xl0: link state changed to UP
Nov 17 07:06:20 elfi kernel: vlan5: link state changed to UP

Coming and going... these problems only appeared during the first 20-30
minutes after boot, then they disappeared totally (and yes, there was
plenty of network IO going on both during and after these
messages). Sometimes I just got the first two messages and nothing
"happened", but sometimes the watchdog message came and the network
died for a minute or so.
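
As a rough cross-check of that timing, the "Uptime: 8h55m1s" line from
the panic output can be turned into an approximate boot time. This is
only a sketch under one loud assumption: that the panic happened at
about the same moment as the ad10 errors (15:35:43). The panic lines
themselves carry the 17:59:18 timestamp of the following boot, so the
real panic time is not in the log. parse_uptime() is a helper invented
for this example.

```python
import re
from datetime import datetime, timedelta


def parse_uptime(text):
    """Turn an uptime string such as '8h55m1s' into a timedelta."""
    m = re.fullmatch(r"(?:(\d+)d)?(?:(\d+)h)?(?:(\d+)m)?(\d+)s", text)
    if m is None:
        raise ValueError("unrecognized uptime string: %r" % text)
    days, hours, minutes, seconds = (int(g) if g else 0 for g in m.groups())
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)


# ASSUMPTION: the panic coincided with the last logged ad10 errors.
panic_at = datetime(2005, 11, 17, 15, 35, 43)
boot_at = panic_at - parse_uptime("8h55m1s")

# First xl0 "transmission error" entry from the log above.
xl0_error_at = datetime(2005, 11, 17, 7, 6, 12)
minutes_after_boot = (xl0_error_at - boot_at).total_seconds() / 60

print(boot_at.time(), minutes_after_boot)  # 06:40:42 25.5
```

Under that assumption the box booted at about 06:40:42 and the xl0
errors at 07:06 came roughly 25 minutes later, which fits the 20-30
minute window.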

Here is dmesg from last boot (directly after crash):

Copyright (c) 1992-2005 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
         The Regents of the University of California. All rights  
reserved.
FreeBSD 6.0-RELEASE #0: Thu Nov 17 00:49:29 CET 2005
     johan at elfi.stromnet.org:/usr/obj/usr/src/sys/ELFI
ACPI APIC Table: <ASUS   A7V333  >
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: AMD Athlon(TM) XP 1900+ (1599.56-MHz 686-class CPU)
   Origin = "AuthenticAMD"  Id = 0x662  Stepping = 2
    
Features=0x383fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE, 
MCA,CMOV,PAT,PSE36,MMX,FXSR,SSE>
   AMD Features=0xc0480800<SYSCALL,MP,MMX+,3DNow+,3DNow>
real memory  = 536854528 (511 MB)
avail memory = 516014080 (492 MB)
ioapic0: Changing APIC ID to 2
ioapic0 <Version 0.2> irqs 0-23 on motherboard
npx0: [FAST]
npx0: <math processor> on motherboard
npx0: INT 16 interface
acpi0: <ASUS A7V333> on motherboard
acpi0: Power Button (fixed)
pci_link0: <ACPI PCI Link LNKA> irq 11 on acpi0
pci_link1: <ACPI PCI Link LNKB> irq 10 on acpi0
pci_link2: <ACPI PCI Link LNKC> irq 0 on acpi0
pci_link3: <ACPI PCI Link LNKD> irq 12 on acpi0
pci_link4: <ACPI PCI Link LNKE> irq 5 on acpi0
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
acpi_timer0: <32-bit timer at 3.579545MHz> port 0xe408-0xe40b on acpi0
cpu0: <ACPI CPU> on acpi0
acpi_button0: <Power Button> on acpi0
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
pci0: <ACPI PCI bus> on pcib0
agp0: <VIA 8367 (KT266/KY266x/KT333) host to PCI bridge> mem  
0xe0000000-0xe3ffffff at device 0.0 on pci0
pcib1: <ACPI PCI-PCI bridge> at device 1.0 on pci0
pci1: <ACPI PCI bus> on pcib1
pci0: <multimedia, audio> at device 5.0 (no driver attached)
fwohci0: <Texas Instruments TSB43AB21/A/AI/A-EP> mem  
0xdf000000-0xdf0007ff,0xde800000-0xde803fff irq 17 at device 7.0 on pci0
fwohci0: OHCI version 1.10 (ROM=1)
fwohci0: No. of Isochronous channels is 4.
fwohci0: EUI64 00:e0:18:00:00:02:7e:fe
fwohci0: Phy 1394a available S400, 1 ports.
fwohci0: Link S400, max_rec 2048 bytes.
firewire0: <IEEE1394(FireWire) bus> on fwohci0
sbp0: <SBP-2/SCSI over FireWire> on firewire0
fwe0: <Ethernet over FireWire> on firewire0
if_fwe0: Fake Ethernet address: 02:e0:18:02:7e:fe
fwe0: Ethernet address: 02:e0:18:02:7e:fe
fwe0: if_start running deferred for Giant
fwohci0: Initiate bus reset
fwohci0: node_id=0xc800ffc0, gen=1, CYCLEMASTER mode
firewire0: 1 nodes, maxhop <= 0, cable IRM = 0 (me)
firewire0: bus manager 0 (me)
uhci0: <VIA 83C572 USB controller> port 0xd400-0xd41f irq 19 at  
device 9.0 on pci0
uhci0: [GIANT-LOCKED]
usb0: <VIA 83C572 USB controller> on uhci0
usb0: USB revision 1.0
uhub0: VIA UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 2 ports with 2 removable, self powered
uhci1: <VIA 83C572 USB controller> port 0xd000-0xd01f irq 16 at  
device 9.1 on pci0
uhci1: [GIANT-LOCKED]
usb1: <VIA 83C572 USB controller> on uhci1
usb1: USB revision 1.0
uhub1: VIA UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub1: 2 ports with 2 removable, self powered
ehci0: <VIA VT6202 USB 2.0 controller> mem 0xde000000-0xde0000ff irq  
17 at device 9.2 on pci0
ehci0: [GIANT-LOCKED]
usb2: EHCI version 0.95
usb2: companion controllers, 2 ports each: usb0 usb1
usb2: <VIA VT6202 USB 2.0 controller> on ehci0
usb2: USB revision 2.0
uhub2: VIA EHCI root hub, class 9/0, rev 2.00/1.00, addr 1
uhub2: 4 ports with 4 removable, self powered
pci0: <display, VGA> at device 12.0 (no driver attached)
atapci0: <Promise PDC40518 SATA150 controller> port 0xb400-0xb47f, 
0xb000-0xb0ff mem 0xdc000000-0xdc000fff,0xdb800000-0xdb81ffff irq 17  
at device 14.0 on pci0
ata2: <ATA channel 0> on atapci0
ata3: <ATA channel 1> on atapci0
ata4: <ATA channel 2> on atapci0
ata5: <ATA channel 3> on atapci0
xl0: <3Com 3c905C-TX Fast Etherlink XL> port 0xa800-0xa87f mem  
0xdb000000-0xdb00007f irq 19 at device 16.0 on pci0
miibus0: <MII bus> on xl0
xlphy0: <3c905C 10/100 internal PHY> on miibus0
xlphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
xl0: Ethernet address: 00:04:76:ef:c6:36
isab0: <PCI-ISA bridge> at device 17.0 on pci0
isa0: <ISA bus> on isab0
atapci1: <VIA 8233A UDMA133 controller> port  
0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xa400-0xa40f at device 17.1 on pci0
ata0: <ATA channel 0> on atapci1
ata1: <ATA channel 1> on atapci1
uhci2: <VIA 83C572 USB controller> port 0xa000-0xa01f at device 17.2  
on pci0
uhci2: [GIANT-LOCKED]
usb3: <VIA 83C572 USB controller> on uhci2
usb3: USB revision 1.0
uhub3: VIA UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub3: 2 ports with 2 removable, self powered
uhci3: <VIA 83C572 USB controller> port 0x9800-0x981f irq 21 at  
device 17.3 on pci0
uhci3: [GIANT-LOCKED]
usb4: <VIA 83C572 USB controller> on uhci3
usb4: USB revision 1.0
uhub4: VIA UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub4: 2 ports with 2 removable, self powered
ppc0: <ECP parallel printer port> port 0x378-0x37f,0x778-0x77b irq 7  
drq 3 on acpi0
ppc0: SMC-like chipset (ECP/EPP/PS2/NIBBLE) in COMPATIBLE mode
ppc0: FIFO with 16/16/9 bytes threshold
ppbus0: <Parallel port bus> on ppc0
lpt0: <Printer> on ppbus0
lpt0: Interrupt-driven port
ppi0: <Parallel I/O> on ppbus0
sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x10  
on acpi0
sio0: type 16550A
sio1: <16550A-compatible COM port> port 0x2f8-0x2ff irq 3 on acpi0
sio1: type 16550A
atkbdc0: <Keyboard controller (i8042)> port 0x60,0x64 irq 1 on acpi0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
pmtimer0 on isa0
orm0: <ISA Option ROMs> at iomem 0xc0000-0xc7fff,0xc8000-0xccfff, 
0xd0000-0xd07ff on isa0
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on  
isa0
Timecounter "TSC" frequency 1599556047 Hz quality 800
Timecounters tick every 1.000 msec
acd0: CDROM <CD-ROM CDU701-F/1.0q> at ata1-master PIO4
ad6: 286188MB <Maxtor 7L300S0 BANC1G10> at ata3-master SATA150
ad10: 286188MB <Maxtor 7L300S0 BANC1G10> at ata5-master SATA150
GEOM_MIRROR: Device gm0s1 created (id=4118114647).
GEOM_MIRROR: Device gm0s1: provider ad6s1 detected.
GEOM_MIRROR: Device gm0s1: provider ad10s1 detected.
GEOM_MIRROR: Device gm0s1: provider ad6s1 activated.
GEOM_MIRROR: Device gm0s1: provider mirror/gm0s1 launched.
GEOM_MIRROR: Device gm0s1: rebuilding provider ad10s1.
Trying to mount root from ufs:/dev/mirror/gm0s1a
WARNING: / was not properly dismounted
WARNING: /tmp was not properly dismounted
WARNING: /usr was not properly dismounted
/usr: mount pending error: blocks 8076 files 28
WARNING: /var was not properly dismounted
/var: mount pending error: blocks 4508 files 2
xl0: transmission error: 90
xl0: tx underrun, increasing tx start threshold to 120 bytes
xl0: transmission error: 90
xl0: tx underrun, increasing tx start threshold to 180 bytes


The network card is the exact same model as the one I used in the
"test" machine, and I didn't have any problems there.

So, any ideas what this can be? If there was a disk crash, which I
have a hard time believing since I ran powermax (Maxtor test program)
on both of these disks 3 weeks ago and they have been running fine w/o
a single problem since I started using them, why didn't GEOM just
kick in and run on the other disk? Page faulting is not a way to react
if a disk goes dead.

Hope someone can help me, or that this problem doesn't occur any more...
but I suppose that is too much to hope for...

Thanks
Johan


