any hope for nfe/msk?
Sam Leffler
sam at errno.com
Wed Nov 21 11:39:18 PST 2007
Don Lewis wrote:
> On 21 Nov, Chris wrote:
>
>> On 07/11/2007, Pyun YongHyeon <pyunyh at gmail.com> wrote:
>>
>>> On Wed, Nov 07, 2007 at 02:28:00PM +0200, Oleg Lomaka wrote:
>>> > Hello,
>>> >
>>> > Pyun YongHyeon wrote:
>>> > >On Thu, Nov 01, 2007 at 10:59:48AM +0200, Oleg Lomaka wrote:
>>> > > > Hello,
>>> > > >
>>> > > > Pyun YongHyeon wrote:
>>> > > > >On Tue, Oct 30, 2007 at 04:01:04PM +0200, Oleg Lomaka wrote:
>>> > > > >
>>> > > > >[...]
>>> > > > >
>>> > > > > > I had RxFIFO overrun again :(
>>> > > > > > from dmesg:
>>> > > > > > msk0: Rx FIFO overrun!
>>> > > > >
>>> > > > >[...]
>>> > > > >
>>> > > > >Please try attached patch again. Sorry for the trouble.
>>> > > > >After applying the patch, please show me the verbose dmesg output related to
>>> > > > >the msk(4)/PHY driver.
>>> > > > >
>>> > > > >Thanks for testing.
>>> > > > >
>>> > > > pcib1: <MPTable PCI-PCI bridge> irq 16 at device 28.0 on pci0
>>> > > > pcib1: domain 0
>>> > > > pcib1: secondary bus 2
>>> > > > pcib1: subordinate bus 2
>>> > > > pcib1: I/O decode 0x2000-0x2fff
>>> > > > pcib1: memory decode 0xd0100000-0xd01fffff
>>> > > > pcib1: no prefetched decode
>>> > > > pci2: <PCI bus> on pcib1
>>> > > > pci2: domain=0, physical bus=2
>>> > > > found-> vendor=0x11ab, dev=0x4352, revid=0x14
>>> > > > domain=0, bus=2, slot=0, func=0
>>> > > > class=02-00-00, hdrtype=0x00, mfdev=0
>>> > > > cmdreg=0x0007, statreg=0x4010, cachelnsz=16 (dwords)
>>> > > > lattimer=0x00 (0 ns), mingnt=0x00 (0 ns), maxlat=0x00 (0 ns)
>>> > > > intpin=a, irq=11
>>> > > > powerspec 2 supports D0 D1 D2 D3 current D0
>>> > > > MSI supports 2 messages, 64 bit
>>> > > > map[10]: type Memory, range 64, base 0xd0100000, size 14, enabled
>>> > > > pcib1: requested memory range 0xd0100000-0xd0103fff: good
>>> > > > map[18]: type I/O Port, range 32, base 0x2000, size 8, enabled
>>> > > > pcib1: requested I/O range 0x2000-0x20ff: in range
>>> > > > pcib1: slot 0 INTA routed to irq 16
>>> > > > mskc0: <Marvell Yukon 88E8038 Gigabit Ethernet> port 0x2000-0x20ff mem
>>> > > > 0xd0100000-0xd0103fff irq 16 at device 0.0 on pci2
>>> > > > mskc0: Reserved 0x4000 bytes for rid 0x10 type 3 at 0xd0100000
>>> > > > mskc0: MSI count : 2
>>> > > > mskc0: RAM buffer size : 4KB
>>> > > > mskc0: Port 0 : Rx Queue 2KB(0x00000000:0x000007ff)
>>> > > > mskc0: Port 0 : Tx Queue 2KB(0x00000800:0x00000fff)
>>> > > > msk0: <Marvell Technology Group Ltd. Yukon FE Id 0xb7 Rev 0x01> on mskc0
>>> > > > msk0: bpf attached
>>> > > > msk0: Ethernet address: 00:1b:24:0e:bc:26
>>> > > > miibus0: <MII bus> on msk0
>>> > > > e1000phy0: <Marvell 88E3082 10/100 Fast Ethernet PHY> PHY 0 on miibus0
>>> > > > e1000phy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
>>> > > > ioapic0: routing intpin 16 (PCI IRQ 16) to vector 49
>>> > > > mskc0: [MPSAFE]
>>> > > > mskc0: [FILTER]
>>> > > >
>>> > >
>>> > >So far all looks good to me. If you encounter watchdog timeouts
>>> > >or Rx FIFO overruns let me know.
>>> > >
>>> > >
>>> >
>>> > Got it again:
>>> > msk0: Rx FIFO overrun!
>>> > I believe this happens under heavy CPU usage. This time I had Firefox
>>> > compiling while viewing pictures on a remote Windows box using rdesktop,
>>> > and after a few minutes the network froze.
>>>
>>> If it only happens under heavy system load it's probably normal. If the
>>> system is too busy to service other jobs, msk(4) may not receive
>>> more packets because its receive buffer is full. Probably msk(4)
>>> should just count the overrun errors without printing the message,
>>> which would save CPU cycles.
>>> Btw, did you also see watchdog timeout errors?
>>>
>>> > But it looks like I didn't lose any packets :). Take a look at the ping
>>> > statistics... funny...
>>>
>>> I guess something is wrong here. The latency is unacceptable. However,
>>> I have no idea why the ICMP echo response takes so long. Are you
>>> using any power-saving mechanism (powerd, cpufreq, etc.)?
>>>
>>> > tdevil% ping 10.1.1.254
>>> > PING 10.1.1.254 (10.1.1.254): 56 data bytes
>>> > 64 bytes from 10.1.1.254: icmp_seq=0 ttl=64 time=35926.404 ms
>>> > 64 bytes from 10.1.1.254: icmp_seq=1 ttl=64 time=34925.694 ms
>>> > 64 bytes from 10.1.1.254: icmp_seq=2 ttl=64 time=33924.729 ms
>>> > 64 bytes from 10.1.1.254: icmp_seq=3 ttl=64 time=32923.814 ms
>>> > 64 bytes from 10.1.1.254: icmp_seq=4 ttl=64 time=31922.833 ms
>>> > 64 bytes from 10.1.1.254: icmp_seq=5 ttl=64 time=30921.878 ms
>>> > 64 bytes from 10.1.1.254: icmp_seq=6 ttl=64 time=29920.923 ms
>>> > 64 bytes from 10.1.1.254: icmp_seq=7 ttl=64 time=28919.960 ms
>>> > 64 bytes from 10.1.1.254: icmp_seq=8 ttl=64 time=27919.009 ms
>>> > 64 bytes from 10.1.1.254: icmp_seq=9 ttl=64 time=26918.042 ms
>>> > 64 bytes from 10.1.1.254: icmp_seq=10 ttl=64 time=25917.078 ms
>>> > 64 bytes from 10.1.1.254: icmp_seq=11 ttl=64 time=24916.115 ms
>>> > 64 bytes from 10.1.1.254: icmp_seq=12 ttl=64 time=23915.144 ms
>>> > 64 bytes from 10.1.1.254: icmp_seq=13 ttl=64 time=22914.192 ms
>>> > 64 bytes from 10.1.1.254: icmp_seq=14 ttl=64 time=21913.214 ms
>>> > 64 bytes from 10.1.1.254: icmp_seq=15 ttl=64 time=20912.278 ms
>>> > 64 bytes from 10.1.1.254: icmp_seq=16 ttl=64 time=19911.330 ms
>>> > 64 bytes from 10.1.1.254: icmp_seq=17 ttl=64 time=18910.375 ms
>>> > 64 bytes from 10.1.1.254: icmp_seq=18 ttl=64 time=17909.419 ms
>>> > 64 bytes from 10.1.1.254: icmp_seq=19 ttl=64 time=16853.821 ms
>>> > 64 bytes from 10.1.1.254: icmp_seq=20 ttl=64 time=15854.710 ms
>>> > 64 bytes from 10.1.1.254: icmp_seq=21 ttl=64 time=14701.312 ms
>>> > 64 bytes from 10.1.1.254: icmp_seq=22 ttl=64 time=13701.003 ms
>>> > 64 bytes from 10.1.1.254: icmp_seq=23 ttl=64 time=12700.052 ms
>>> > 64 bytes from 10.1.1.254: icmp_seq=24 ttl=64 time=11699.098 ms
>>> > 64 bytes from 10.1.1.254: icmp_seq=25 ttl=64 time=10698.148 ms
>>> > 64 bytes from 10.1.1.254: icmp_seq=36 ttl=64 time=0.463 ms
>>> > 64 bytes from 10.1.1.254: icmp_seq=37 ttl=64 time=0.379 ms
>>> >
>>>
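The ping figures quoted above can be read another way. This is a rough sketch, assuming ping was sending at its default one-second interval (the output does not state the interval): adding each reply's round-trip time to its presumed send time gives an estimated arrival time. The sample pairs are copied from the output above; `SEND_INTERVAL_MS` and `arrival_times` are names made up for this illustration.

```python
# (icmp_seq, reported round-trip time in ms), copied from the ping output above.
samples = [
    (0, 35926.404),
    (1, 34925.694),
    (18, 17909.419),
    (19, 16853.821),
    (21, 14701.312),
    (25, 10698.148),
]

# Assumption: ping's default interval of one second between requests.
SEND_INTERVAL_MS = 1000.0


def arrival_times(samples, interval_ms=SEND_INTERVAL_MS):
    """Estimate when each reply arrived, in ms after the first request."""
    return [seq * interval_ms + rtt for seq, rtt in samples]


arrivals = arrival_times(samples)
spread_ms = max(arrivals) - min(arrivals)
print("earliest arrival: %.3f ms" % min(arrivals))
print("latest arrival:   %.3f ms" % max(arrivals))
print("spread:           %.3f ms" % spread_ms)
```

Under that assumption the estimated arrival times fall within a few hundred milliseconds of one another, roughly 36 seconds after the first request. That would be consistent with the traffic being held somewhere and then released in one burst, rather than each packet being slow on its own.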
>>> --
>>> Regards,
>>> Pyun YongHyeon
>>> _______________________________________________
>>> freebsd-stable at freebsd.org mailing list
>>> http://lists.freebsd.org/mailman/listinfo/freebsd-stable
>>> To unsubscribe, send any mail to "freebsd-stable-unsubscribe at freebsd.org"
>>>
>>>
>> I have started having problems with the nfe driver as well. I was running
>> 6.2 stable with polling enabled, and the entire system was lagging, even
>> when idle. I have now upgraded the box in question to 7.0 beta 3 and
>> am keeping the nfe driver on.
>>
>> irq22: nfe0 ehci0 1652548 20
>>
>> It hasn't had heavy load since the upgrade yet.
>>
>> ehci0: <EHCI (generic) USB 2.0 controller>
>>
>> I have no local access, so I cannot disable USB in the BIOS. If I build a
>> new kernel with ehci disabled in the kernel config, will this stop the
>> interrupt sharing and allow me to use nfe reasonably without polling?
>> I think polling itself has been causing me problems (I use NFS).
>>
>> Is nfe still being developed? These are known, existing problems,
>> but there has been no update to the page below for a while,
>> so I am curious whether it's dead in the water.
>>
>> http://www.f.csce.kyushu-u.ac.jp/~shigeaki/software/freebsd-nfe.html
>>
>> Chris
>>
>
> I've also seen weird problems on a machine that shares an interrupt
> between nfe and ehci. I'm hoping that this recent commit to -CURRENT
> fixes the problem. I'm planning on trying it on my 7.0-BETA machine in
> the next day or so.
>
> scottl 2007-11-21 04:03:51 UTC
>
> FreeBSD src repository
>
> Modified files:
> sys/amd64/amd64 intr_machdep.c
> sys/i386/i386 intr_machdep.c
> sys/ia64/ia64 interrupt.c
> sys/powerpc/powerpc intr_machdep.c
> sys/sparc64/sparc64 intr_machdep.c
> Log:
> Extend critical section coverage in the low-level interrupt handlers to
> include the ithread scheduling step. Without this, a preemption might
> occur in between the interrupt getting masked and the ithread getting
> scheduled. Since the interrupt handler runs in the context of curthread,
> the scheduler might see it as having such a low priority on a busy system
> that it doesn't get to run for a _long_ time, leaving the interrupt stranded
> in a disabled state. The only way that the preemption can happen is by
> a fast/filter handler triggering a scheduling event earlier in the handler,
> so this problem can only happen for cases where an interrupt is being
> shared by both a fast/filter handler and an ithread handler. Unfortunately,
> it seems to be common for this sharing to happen with network and USB
> devices, for example. This fixes many of the mysterious TCP session
> timeouts and NIC watchdogs that were being reported. Many thanks to Sam
> Leffler for getting to the bottom of this problem.
>
>
nfe+ohci was the combo I had that prompted me to fix this.
Sam
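
The ordering problem described in the quoted commit log can be pictured with a toy model. This is only a sketch of the sequence of events, not kernel code: `masked_ticks`, its parameters, and the tick figure are invented for illustration, and `busy_ticks` simply stands for however long a low-priority curthread waits to run again on a busy system.

```python
def masked_ticks(critical_section, filter_requests_preempt, busy_ticks):
    """Return (ticks the interrupt stays masked, ordered list of events).

    Toy model: all names and numbers here are hypothetical.
    """
    events = ["filter handler runs"]
    waited = 0

    # A fast/filter handler sharing the interrupt may trigger a scheduling event.
    preempt_pending = filter_requests_preempt

    events.append("interrupt source masked")

    if preempt_pending and not critical_section:
        # Without the critical section, the preemption lands between masking
        # the interrupt and scheduling the ithread.
        events.append("curthread preempted")
        waited += busy_ticks

    events.append("ithread scheduled")

    if preempt_pending and critical_section:
        # With the critical section, the preemption is deferred until the
        # ithread has already been scheduled.
        events.append("deferred preemption on leaving the critical section")

    return waited, events


old_wait, old_events = masked_ticks(False, True, 30000)
new_wait, new_events = masked_ticks(True, True, 30000)
print("without critical section:", old_wait, old_events)
print("with critical section:   ", new_wait, new_events)
```

In the model, the interrupt only sits masked for a long time when a filter handler has requested a preemption and the mask-then-schedule sequence is not protected, which matches the commit log's point that the problem needs a fast/filter handler and an ithread handler sharing one interrupt.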