POWER9 NICs failing at 100Gbps
Date: Tue, 02 May 2023 17:45:22 UTC
Hello Everyone,
We've been testing FreeBSD 13.2 (powerpc64le) on an IBM LC922 and a Raptor system
with 100Gbps Chelsio T6 and Mellanox ConnectX-6 NICs, but either NIC fails as
soon as we saturate it. We can trigger the failure instantly by running a few
iperf3 instances simultaneously.
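A minimal reproduction sketch, assuming one iperf3 server per port on the receiving host (an iperf3 server runs only one test at a time) and ports counting up from iperf3's default of 5201. The exact invocation isn't given here, so the peer address, instance count, duration, stream count and the script name `saturate.py` are all hypothetical placeholders.

```python
"""Run several iperf3 clients at once to saturate a 100Gbps link.

Sketch only: the peer address, port range, instance count, duration and
stream count are placeholders, not the invocation used in the report.
"""
import subprocess
import sys

BASE_PORT = 5201  # iperf3's default port; instance i uses BASE_PORT + i


def server_commands(instances: int, base_port: int = BASE_PORT) -> list[list[str]]:
    """One iperf3 server per port, since a server runs only one test at a time."""
    return [["iperf3", "-s", "-p", str(base_port + i)] for i in range(instances)]


def client_commands(host: str, instances: int, duration: int = 60,
                    streams: int = 4, base_port: int = BASE_PORT) -> list[list[str]]:
    """One client per server port, each with several parallel TCP streams (-P)."""
    return [
        ["iperf3", "-c", host, "-p", str(base_port + i),
         "-t", str(duration), "-P", str(streams)]
        for i in range(instances)
    ]


def run_clients(host: str, instances: int = 4) -> int:
    """Start all clients simultaneously and return how many exited non-zero."""
    procs = [subprocess.Popen(cmd) for cmd in client_commands(host, instances)]
    return sum(1 for p in procs if p.wait() != 0)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python3 saturate.py <peer-address> [instances]
    count = int(sys.argv[2]) if len(sys.argv) > 2 else 4
    sys.exit(run_clients(sys.argv[1], count))
```

On the receiving host, start the servers first (one `iperf3 -s -p <port>` per entry from `server_commands`), then run something like `python3 saturate.py 192.0.2.10 4` on the sender.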
I've included the log for the Chelsio NIC below. Is this a known issue?
cc0: link state changed to UP
t6nex0: command 0x16 in mbox 4 timed out (0x4014c010).
t6nex0: mbox 4 cmdsent 16a0094400000001 2328f70000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000
t6nex0: mbox 4 current 16a0094400000001 2328f70000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000
t6nex0: encountered fatal error, adapter stopped (1).
cc0: set_rxmode (1) failed: 60
t6nex0: CIM debug regs1 00000000 00000000 00000000 00000000 00000000
t6nex0: CIM debug regs2 00000000 00000000 00000000 00000000 00330000
t6nex0: CIM LA dump follows.
Status Inst Data PC LS0Stat LS0Addr LS0Data LS1Stat LS1Addr LS1Data
3c 00003003 1fffeedf 1fffeedf 00a00028 1fff0850 1fff3400 00b00020 1ffce2e8 00000000
3c 00003008 1fffeee2 1fffeee2 00a00028 1fff06a4 1ffce200 00b00020 1ffce2e8 00000000
3c 00003008 1fffeeea 1fffeeea 00a00028 1fff084c 1fff2f0c 00b00020 1ffce2e8 00000000
3c 00003008 1fffeef2 1fffeef2 00a00020 1fff084c 00000000 00b00020 1ffce2e8 00000000
3c 00003002 1fffeefa 1fffeefa 00a00020 1fff084c 00000000 00b00020 1ffce2e8 00000000
3c 00003002 1fffeefc 1fffeefc 00a00020 1fff084c 00000000 00b00020 1ffce2e8 00000000
3c 00003008 1fffeefe 1fffeefe 00a00005 1fff328b 0000000f 00b00025 1ffce2e8 00000000
....
t6nex0: device log follows.
....
46 2968294087 NOTICE PORT port[0:0x11:0x0b]: l1cfg, 1G/10G can't be advertised for this port type. mcaps 0x339f007e acaps 0x20970078 rcaps 0xb3007e
47 2968386457 INFO PORT port_link_state_handler[0] powering up
48 2968386460 INFO PORT port[0] update (flowcid 40236 rc 0)
49 2968685971 INFO PORT bean_fsm[0] : state START (count = 1)
50 2968695782 INFO PORT hw_mac_init_port[0], ptype 0x11, speed 0x4, lanes 0xf, fec 0x800000
51 2968696059 INFO PORT bean_fsm[0] : entering state BASEP_HANDLE
52 2969235973 INFO PORT bean_fsm[0] : entering state NXP_HANDLE
53 2969245973 INFO PORT bean_fsm[0] : entering state EXT_NXP_HANDLE
54 2969255973 INFO PORT consortium_fec[0]: local 0x7, remote 0x3, negotiated 0x800000
55 2969255973 INFO PORT bean_fsm[0] : entering state WAIT_FOR_NULL_PAGE
56 2969285973 INFO PORT bean_fsm[0] : entering state WAIT_COMPLETE
57 2969285974 INFO PORT bean_fsm[0] : tech ability local 0x710, remote 0x715 cr-s 0, local fec_ability 0x1
58 2969285974 INFO PORT bean_fsm[0] : IEEE speed 0x40, FEC remote 0x4, negotiated 0x800000
59 2969285975 INFO PORT bean_fsm[0] : state DONE
60 2969285976 INFO PORT bean_fsm[0] : FEC local 0x1, negotiated 0x800000
61 2969286976 INFO PORT hw_mac_init_port[0], ptype 0x11, speed 0x40, lanes 0xf, fec 0x800000
62 2969287972 INFO PORT port[0] negotiated speed 0x40, lanes 0xf:0xf, fec 0x800000
63 2969287974 INFO PORT aec_fsm[0] : state START (sigdet 0xf)
64 2969288111 INFO PORT aec_fsm[0] : transitioning to TRAINING
65 2969651045 INFO PORT aec_fsm[0] : TRAINING_COMPLETE
66 2969651046 INFO PORT aec_fsm[0] : COEFFICIENT TAP OVERRIDE 1:2:3 :: 0x7e:0x1b:0x75
67 2969651046 INFO PORT aec_fsm[0] : COEFFICIENT TAP OVERRIDE 1:2:3 :: 0x7e:0x1b:0x75
68 2969651047 INFO PORT aec_fsm[0] : COEFFICIENT TAP OVERRIDE 1:2:3 :: 0x7e:0x1b:0x75
69 2969651047 INFO PORT aec_fsm[0] : COEFFICIENT TAP OVERRIDE 1:2:3 :: 0x7e:0x1b:0x75
70 2969651905 INFO PORT aec_fsm[0] : Remote fault while waiting for link status 0x29
71 2975239314 INFO PORT aec_fsm[0]: aec training completed, link timed out lstatus 0x5
72 2975239314 INFO PORT aec_fsm[0] Link timed out after training complete, Link Status 0x5
73 2975335992 INFO PORT bean_fsm[0] : state START (count = 1)
74 2975345863 INFO PORT hw_mac_init_port[0], ptype 0x11, speed 0x4, lanes 0xf, fec 0x800000
75 2975346140 INFO PORT bean_fsm[0] : entering state BASEP_HANDLE
76 2975415994 INFO PORT bean_fsm[0] : entering state NXP_HANDLE
77 2975425994 INFO PORT bean_fsm[0] : entering state EXT_NXP_HANDLE
78 2975435994 INFO PORT consortium_fec[0]: local 0x7, remote 0x3, negotiated 0x800000
79 2975435994 INFO PORT bean_fsm[0] : entering state WAIT_FOR_NULL_PAGE
80 2975465994 INFO PORT bean_fsm[0] : entering state WAIT_COMPLETE
81 2975465995 INFO PORT bean_fsm[0] : tech ability local 0x710, remote 0x715 cr-s 0, local fec_ability 0x1
82 2975465995 INFO PORT bean_fsm[0] : IEEE speed 0x40, FEC remote 0x4, negotiated 0x800000
83 2975465996 INFO PORT bean_fsm[0] : state DONE
84 2975465996 INFO PORT bean_fsm[0] : FEC local 0x1, negotiated 0x800000
85 2975466997 INFO PORT hw_mac_init_port[0], ptype 0x11, speed 0x40, lanes 0xf, fec 0x800000
86 2975467993 INFO PORT port[0] negotiated speed 0x40, lanes 0xf:0xf, fec 0x800000
87 2975467994 INFO PORT aec_fsm[0] : state START (sigdet 0xf)
88 2975468131 INFO PORT aec_fsm[0] : transitioning to TRAINING
89 2975837289 INFO PORT aec_fsm[0] : TRAINING_COMPLETE
90 2975837289 INFO PORT aec_fsm[0] : COEFFICIENT TAP OVERRIDE 1:2:3 :: 0x7e:0x1b:0x75
91 2975837290 INFO PORT aec_fsm[0] : COEFFICIENT TAP OVERRIDE 1:2:3 :: 0x7e:0x1b:0x75
92 2975837290 INFO PORT aec_fsm[0] : COEFFICIENT TAP OVERRIDE 1:2:3 :: 0x7e:0x1b:0x75
93 2975837291 INFO PORT aec_fsm[0] : COEFFICIENT TAP OVERRIDE 1:2:3 :: 0x7e:0x1b:0x75
94 2975838184 INFO PORT aec_fsm[0] : Remote fault while waiting for link status 0x29
95 2981015970 INFO PORT hw_mac_link_status[0] int_cause 0x17011b4, link_status 0x22
96 2981015970 INFO PORT aec_fsm[0] : Remote fault cleared while waiting for link status 0x22
97 2981015973 INFO PORT aec_fsm[0] : DONE
98 2981015973 INFO PORT bean/aec complete (retry: 1)
99 2981015974 INFO PORT port_hss_sigdet[0]: hss_sigdet changed to 0xf
100 2981106013 INFO PORT port[0] link up (1) (speed 0x40 acaps 0x20970078 lpcaps 0x10007e)
101 2981106015 INFO PORT port[0] set PAUSE PARAMS: pppen 0 txpe 0 rxpe 0
102 2981106018 INFO PORT port[0] update (flowcid 40236 rc 0)
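The timed-out mailbox command in the `cmdsent` line can be decoded by hand. A minimal sketch, assuming the field layout of Chelsio's firmware interface header (`t4fw_interface.h`, carried in the cxgbe driver sources) and worth re-checking against that file: opcode in bits 31:24 of the first big-endian word with the REQUEST/READ/WRITE flags at bits 23/22/21, `len16` in the low byte of the next word, and the MTU in the upper 16 bits of the second flit. Under those assumptions the command would be `FW_VI_RXMODE_CMD` (0x16, matching "command 0x16" in the log) carrying an MTU of 9000. That is consistent with `cc0: set_rxmode (1) failed: 60`, since 60 is ETIMEDOUT on FreeBSD.

```python
"""Decode the cxgbe mailbox command dumped on a 'cmdsent' log line.

Assumed layout (check against t4fw_interface.h in the cxgbe sources):
  flit 0, high word: opcode[31:24] | REQUEST(23) | READ(22) | WRITE(21) | ...
  flit 0, low word:  retval[15:8] | len16[7:0]
  flit 1, high word: MTU[31:16] | rx-mode enable fields
"""

# Only the opcode seen in this report; the value is assumed from the header.
FW_OPCODES = {0x16: "FW_VI_RXMODE_CMD"}

F_FW_CMD_REQUEST = 1 << 23
F_FW_CMD_READ = 1 << 22
F_FW_CMD_WRITE = 1 << 21


def decode_cmdsent(line: str) -> dict:
    """Return header fields from the first two 64-bit flits after 'cmdsent'."""
    flits = [int(tok, 16) for tok in line.split("cmdsent", 1)[1].split()]
    hi0 = flits[0] >> 32          # opcode | flags | VI id
    lo0 = flits[0] & 0xFFFFFFFF   # retval | len16
    hi1 = flits[1] >> 32          # MTU | rx-mode fields
    opcode = hi0 >> 24
    return {
        "opcode": opcode,
        "name": FW_OPCODES.get(opcode, "unknown"),
        "request": bool(hi0 & F_FW_CMD_REQUEST),
        "read": bool(hi0 & F_FW_CMD_READ),
        "write": bool(hi0 & F_FW_CMD_WRITE),
        "len16": lo0 & 0xFF,
        "mtu": hi1 >> 16,
    }


line = ("t6nex0: mbox 4 cmdsent 16a0094400000001 2328f70000000000 "
        "0000000000000000 0000000000000000 0000000000000000 "
        "0000000000000000 0000000000000000 0000000000000000")
print(decode_cmdsent(line))
```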
Best,
Ali