Disappointed-new

Albert Shih shih at math.jussieu.fr
Fri Apr 7 12:15:10 UTC 2006


Hi all

Two days ago I sent a message with the subject "Disappointed".

Many of you replied that I had not described the bug.

Well, first, my English is very bad, and second, I am not blaming anyone,
least of all the developers. Personally I'm very impressed by the work you
are doing.

Here is a detailed description of my problem.

Hardware :

	HP Proliant ML 350 G4
	2 x Xeon 3.2 GHz (HT disabled)
	1 GB RAM
	1 bge network interface on the motherboard
	1 dual-port 1000 Mbit/s card (em chipset)
	1 dual-port 100 Mbit/s card (fxp chipset)
	1 internal Smart Array 641 with 2 hot-plug disks in RAID 1
	1 MSA1000 with 14 disks, attached over Fibre Channel

Situation :

	Each network interface is on a different IP subnet.

	Each network interface has its own IP address.

	All interfaces are connected to a Foundry 1000 Mbit/s layer-2 switch.


Purpose :

	It is a central NFS server (a NetApp for a «small» budget...). Apart
	from NFS, the server only runs a dhcpd server.

	There are no user accounts on this server.

	The server is not routed to the Internet.

	NFS is bound to 3 of the 5 IP addresses (the other 2 only run ssh,
	for scp).

	There are 13 NFS clients running Linux (various kernel versions).

	There are also 4 NFS clients running FreeBSD (various versions, all
	> 5.2 and < 5.5).


Timeline :

	6-STABLE was installed on the server at the beginning of February 2006.

Problems :
	

First time :

	Kernel : SMP+ipfw

	At first, the «main» nfsd was bound to the bge0 interface
	(main = 90% of the NFS traffic).

	After 10-15 days of flawless operation, bge0 stopped working, while
	the other interfaces kept working perfectly. The server was up, and
	the other NFS clients could still access the NFS partition without
	any problem. I could log in on the console. I tried many things:
		ifconfig bge0 down
		ifconfig bge0 up
		ifconfig bge0 delete
		etc...
	but nothing worked.

	On the console there were repeated messages like

		bge0 watchdog timeout problems
		bge0 watchdog timeout problems

	As far as I can tell, only a reboot makes the system work again, and
	after the reboot everything works fine.

	But after some days the problem came back, and this second time none
	of the interfaces worked. I could still log in on the console, but
	the reboot was not clean (I needed to run a big fsck).

Second time :

	Kernel : NO_SMP +ipfw

	Following some advice on this mailing list, I switched to a
	uniprocessor kernel.

	This time, after some days of working fine, bge0 stopped working
	again (same conditions as the first time).

Third time :

	Kernel : NO_SMP + ipfw

	I moved the main NFS interface (90% of the traffic) to em0 and put
	on bge0 an IP address that does not serve NFS (only scp).

	Again, after some days... the em0 interface stopped working. This
	time the message on the console was
		em0 watchdog timeout problems

	Sometimes I get fxpX watchdog timeout problems too.

Fourth time :

	Kernel : NO_SMP + polling + ipfw

	Now I'm running all interfaces in polling mode. And... I hope it
	works... (it has been running for 2 days).

Information :

	I can't tell whether it happens under heavy NFS load, but I really
	don't think so: there was one crash on a Saturday (and we don't have
	many users on that day).

	I cannot reproduce this bug. I tried to generate heavy NFS access
	(on 4 Linux clients I ran, at the same time, something like

		find . -type f -exec md5sum {} \;

	but it would not crash). This partition holds 30 GB.
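The concurrent load test described above can be approximated with a small shell function. This is a minimal sketch, not the exact procedure used: the function name stress_nfs, its arguments, and the reader.N.log file names are hypothetical, and it assumes GNU md5sum as found on the Linux clients (FreeBSD ships md5(1) instead).

```shell
#!/bin/sh
# Hypothetical helper approximating the load test from the report:
# several concurrent readers checksum every regular file under one directory.
# Assumes GNU md5sum (as on the Linux clients); FreeBSD ships md5(1) instead.
#
# Usage: stress_nfs DIR [READERS] [LOGDIR]
stress_nfs() {
    dir=$1                 # directory to read, e.g. an NFS mount point
    readers=${2:-4}        # concurrent readers (4 clients in the report)
    logdir=${3:-/tmp}      # where each reader writes its checksum list
    i=1
    while [ "$i" -le "$readers" ]; do
        # Start one background reader; each md5sum call appends one line.
        find "$dir" -type f -exec md5sum {} \; > "$logdir/reader.$i.log" 2>&1 &
        i=$((i + 1))
    done
    wait                   # return only after every reader has finished
}
```

For example, `stress_nfs /mnt/nfs 4` would start four readers over a hypothetical mount point /mnt/nfs. Running all readers on one host is a simplification: the original test used four separate Linux clients, which loads the server's network interfaces from several sources at once.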

I forgot to say that I ran a very similar configuration (an old ML350 G3 with
the same MSA1000, under the same conditions) with 4.x for 4 years without any
crash (with the same clients etc...).

Attached is the dmesg output from just after the server booted.

Next Monday I will switch to a debugging kernel, but right now I can't just
reboot the server (600 users).

I hope this can help you make FreeBSD even better than the best OS ;-).

Many thanks.

--
Albert SHIH
Universite de Paris 7 (Denis DIDEROT)
U.F.R. de Mathematiques.
Heure local/Local time:
Fri Apr 7 13:38:55 CEST 2006
-------------- next part --------------
Copyright (c) 1992-2006 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
	The Regents of the University of California. All rights reserved.
FreeBSD 6.1-PRERELEASE #1: Wed Apr  5 17:27:03 CEST 2006
    root at nfs3.math.jussieu.fr:/usr/obj/usr/src/sys/NFS3-mono
ACPI APIC Table: <HP     00000083>
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel(R) Xeon(TM) CPU 3.20GHz (3200.13-MHz 686-class CPU)
  Origin = "GenuineIntel"  Id = 0xf41  Stepping = 1
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0x641d<SSE3,RSVD2,MON,DS_CPL,CNTX-ID,CX16,<b14>>
  AMD Features=0x20000000<LM>
  Hyperthreading: 2 logical CPUs
real memory  = 1073688576 (1023 MB)
avail memory = 1041752064 (993 MB)
ioapic1: Changing APIC ID to 9
ioapic0 <Version 2.0> irqs 0-23 on motherboard
ioapic1 <Version 2.0> irqs 24-47 on motherboard
ioapic2 <Version 2.0> irqs 48-71 on motherboard
ioapic3 <Version 2.0> irqs 72-95 on motherboard
npx0: [FAST]
npx0: <math processor> on motherboard
npx0: INT 16 interface
acpi0: <HP D17> on motherboard
acpi0: Power Button (fixed)
Timecounter "ACPI-safe" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x908-0x90b on acpi0
cpu0: <ACPI CPU> on acpi0
pcib0: <ACPI Host-PCI bridge> on acpi0
pci0: <ACPI PCI bus> on pcib0
pcib1: <ACPI PCI-PCI bridge> at device 2.0 on pci0
pci5: <ACPI PCI bus> on pcib1
pcib2: <ACPI PCI-PCI bridge> at device 0.0 on pci5
pci6: <ACPI PCI bus> on pcib2
isp0: <Qlogic ISP 2312 PCI FC-AL Adapter> port 0x6000-0x60ff mem 0xfdef0000-0xfdef0fff irq 48 at device 1.0 on pci6
isp0: [GIANT-LOCKED]
pcib3: <ACPI PCI-PCI bridge> at device 0.2 on pci5
pci9: <ACPI PCI bus> on pcib3
em0: <Intel(R) PRO/1000 Network Connection Version - 3.2.18> port 0x7000-0x703f mem 0xfdfe0000-0xfdffffff,0xfdf80000-0xfdfbffff irq 76 at device 1.0 on pci9
em0: Ethernet address: 00:11:0a:56:57:9e
em1: <Intel(R) PRO/1000 Network Connection Version - 3.2.18> port 0x7040-0x707f mem 0xfdf60000-0xfdf7ffff irq 77 at device 1.1 on pci9
em1: Ethernet address: 00:11:0a:56:57:9f
ciss0: <HP Smart Array 641> port 0x7400-0x74ff mem 0xfdf50000-0xfdf51fff,0xfdf00000-0xfdf3ffff irq 72 at device 2.0 on pci9
ciss0: [GIANT-LOCKED]
pcib4: <ACPI PCI-PCI bridge> at device 4.0 on pci0
pci13: <ACPI PCI bus> on pcib4
pcib5: <ACPI PCI-PCI bridge> at device 6.0 on pci0
pci16: <ACPI PCI bus> on pcib5
pcib6: <ACPI PCI-PCI bridge> at device 28.0 on pci0
pci2: <ACPI PCI bus> on pcib6
pcib7: <PCI-PCI bridge> at device 2.0 on pci2
pci3: <PCI bus> on pcib7
fxp0: <Intel 82559 Pro/100 Ethernet> port 0x5000-0x503f mem 0xfddf0000-0xfddf0fff,0xfdc00000-0xfdcfffff irq 26 at device 4.0 on pci3
miibus0: <MII bus> on fxp0
inphy0: <i82555 10/100 media interface> on miibus0
inphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
fxp0: Ethernet address: 00:08:02:cd:d5:be
fxp1: <Intel 82559 Pro/100 Ethernet> port 0x5040-0x507f mem 0xfdbf0000-0xfdbf0fff,0xfda00000-0xfdafffff irq 26 at device 5.0 on pci3
miibus1: <MII bus> on fxp1
inphy1: <i82555 10/100 media interface> on miibus1
inphy1:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
fxp1: Ethernet address: 00:08:02:cd:d5:bf
mpt0: <LSILogic 1030 Ultra4 Adapter> port 0x4000-0x40ff mem 0xfd9e0000-0xfd9fffff,0xfd9c0000-0xfd9dffff irq 24 at device 3.0 on pci2
mpt0: [GIANT-LOCKED]
mpt0: MPI Version=1.2.14.0
mpt0: Unhandled Event Notify Frame. Event 0xa.
mpt1: <LSILogic 1030 Ultra4 Adapter> port 0x4400-0x44ff mem 0xfd9a0000-0xfd9bffff,0xfd980000-0xfd99ffff irq 25 at device 3.1 on pci2
mpt1: [GIANT-LOCKED]
mpt1: MPI Version=1.2.14.0
mpt1: Unhandled Event Notify Frame. Event 0xa.
uhci0: <UHCI (generic) USB controller> port 0x2000-0x201f irq 16 at device 29.0 on pci0
uhci0: [GIANT-LOCKED]
usb0: <UHCI (generic) USB controller> on uhci0
usb0: USB revision 1.0
uhub0: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 2 ports with 2 removable, self powered
uhci1: <UHCI (generic) USB controller> port 0x2020-0x203f irq 19 at device 29.1 on pci0
uhci1: [GIANT-LOCKED]
usb1: <UHCI (generic) USB controller> on uhci1
usb1: USB revision 1.0
uhub1: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub1: 2 ports with 2 removable, self powered
pci0: <base peripheral> at device 29.4 (no driver attached)
pci0: <base peripheral, interrupt controller> at device 29.5 (no driver attached)
ehci0: <Intel 6300ESB USB 2.0 controller> mem 0xfbee0000-0xfbee03ff irq 23 at device 29.7 on pci0
ehci0: [GIANT-LOCKED]
usb2: EHCI version 1.0
usb2: companion controllers, 2 ports each: usb0 usb1
usb2: <Intel 6300ESB USB 2.0 controller> on ehci0
usb2: USB revision 2.0
uhub2: Intel EHCI root hub, class 9/0, rev 2.00/1.00, addr 1
uhub2: 4 ports with 4 removable, self powered
pcib8: <ACPI PCI-PCI bridge> at device 30.0 on pci0
pci1: <ACPI PCI bus> on pcib8
bge0: <Broadcom BCM5705K Gigabit Ethernet, ASIC rev. 0x3003> mem 0xfd8f0000-0xfd8fffff irq 17 at device 2.0 on pci1
miibus2: <MII bus> on bge0
brgphy0: <BCM5705 10/100/1000baseTX PHY> on miibus2
brgphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseTX, 1000baseTX-FDX, auto
bge0: Ethernet address: 00:15:60:0b:09:b4
pci1: <display, VGA> at device 3.0 (no driver attached)
pci1: <base peripheral> at device 4.0 (no driver attached)
isab0: <PCI-ISA bridge> at device 31.0 on pci0
isa0: <ISA bus> on isab0
atapci0: <Intel 6300ESB UDMA100 controller> port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0x500-0x50f at device 31.1 on pci0
ata0: <ATA channel 0> on atapci0
ata1: <ATA channel 1> on atapci0
acpi_tz0: <Thermal Zone> on acpi0
atkbdc0: <Keyboard controller (i8042)> port 0x60,0x64 irq 1 on acpi0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
ppc0: <Standard parallel printer port> port 0x378-0x37f,0x778-0x77d irq 7 drq 0 on acpi0
ppc0: Generic chipset (NIBBLE-only) in COMPATIBLE mode
ppbus0: <Parallel port bus> on ppc0
plip0: <PLIP network interface> on ppbus0
lpt0: <Printer> on ppbus0
lpt0: Interrupt-driven port
ppi0: <Parallel I/O> on ppbus0
sio0: <Standard PC COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
sio0: type 16550A
fdc0: <floppy drive controller (FDE)> port 0x3f2-0x3f5 irq 6 drq 2 on acpi0
fdc0: [FAST]
fd0: <1440-KB 3.5" drive> on fdc0 drive 0
pmtimer0 on isa0
orm0: <ISA Option ROMs> at iomem 0xc0000-0xc7fff,0xc8000-0xc87ff,0xc8800-0xc9fff,0xca000-0xcdfff,0xee000-0xeffff on isa0
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
sio1: configured irq 3 not in bitmap of probed irqs 0
sio1: port may not be enabled
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
Timecounter "TSC" frequency 3200131784 Hz quality 800
Timecounters tick every 1.000 msec
ipfw2 (+ipv6) initialized, divert loadable, rule-based forwarding disabled, default to deny, logging unlimited
acd0: CDROM <HL-DT-ST CD-ROM GCR-8482B/2.09> at ata1-master UDMA33
Waiting 5 seconds for SCSI devices to settle
sa0 at mpt0 bus 0 target 0 lun 0
sa0: <QUANTUM SDLT600 1E1E> Removable Sequential Access SCSI-4 device 
sa0: 160.000MB/s transfers (80.000MHz, offset 126, 16bit)
pass0 at isp0 bus 0 target 125 lun 0
pass0: <COMPAQ MSA1000 2.38> Fixed Storage Array SCSI-4 device 
pass0: 200.000MB/s transfers, Tagged Queueing Enabled
da2 at ciss0 bus 0 target 0 lun 0
da2: <COMPAQ RAID 1  VOLUME OK> Fixed Direct Access SCSI-0 device 
da2: 135.168MB/s transfers
da2: 69459MB (142253280 512 byte sectors: 255H 32S/T 17433C)
da0 at isp0 bus 0 target 125 lun 1
da0: <COMPAQ MSA1000 VOLUME 2.38> Fixed Direct Access SCSI-4 device 
da0: 200.000MB/s transfers, Tagged Queueing Enabled
da0: 555714MB (1138103296 512 byte sectors: 255H 63S/T 70843C)
da1 at isp0 bus 0 target 125 lun 2
da1: <COMPAQ MSA1000 VOLUME 2.38> Fixed Direct Access SCSI-4 device 
da1: 200.000MB/s transfers, Tagged Queueing Enabled
da1: 277850MB (569038365 512 byte sectors: 255H 63S/T 35421C)
Trying to mount root from ufs:/dev/da2s1a
em0: link state changed to UP
em1: link state changed to UP


More information about the freebsd-stable mailing list