problem: TCPIP process loops and priority reduced to 1.

Hemanth hemanth.thummala at wipro.com
Fri Nov 24 00:01:07 PST 2006


Hi All,
 
As mentioned in the subject the TCPIP is looping and its priority is
reducing to 1 by the kernal (BSD kind)
 
>From the dump, we were able to identify the inpcbs that are causing the
problem.

 

inpcb:0x222f1a0c, lport:10143, laddr:0x7f000001 (127.0.0.1)

                  fport: 1378  faddr:0x7f000001 (127.0.0.1)

tcpcb:0x2270760c  inpcb:0x222f1a0c          flags:0x01e0  

 

and

 

inpcb:0x224a850c, lport: 1378, laddr:0x7f000001 (127.0.0.1)

                  fport:10143  faddr:0x7f000001 (127.0.0.1)

tcpcb:0x2268d10c  inpcb:0x224a850c          flags:0x01e0  flags:0x0000

 

If we see the tcpcb structure for both structures

 

$5 = {

  seg_next_md = 0xfffc0000,

  t_state = 0x5,====>>> CLOSE_WAIT state 

    t_rxtshift = 0x0,

  t_rxtcur = 0x3fc,

  t_dupacks = 0x0,

  t_maxseg = 0x7d00,

  t_force = 0x0,

  t_flags = 0x1e0,

  t_template = 0x22776bd4,

  t_inpcb = 0x224a850c,

  snd_una = 0xbfaeb6d9,

  snd_nxt = 0xbfaeb6d9,

  snd_up = 0xbfaeb6d9,

  snd_wl1 = 0xbfb19e13,

  snd_wl2 = 0xbfaeb6d9,

  iss = 0xbfaeb678,

  snd_wnd = 0x7d00,

  rcv_wnd = 0xf96,

  rcv_nxt = 0xbfb19e15,

  rcv_up = 0xbfb19e13,

  irs = 0xbfb0a0a2,

  rcv_adv = 0xbfb19daa,

  snd_max = 0xbfaeb6d9,

  inv_SYN_seqbase = 0xffaeb678,

  snd_cwnd = 0x17f10,

  snd_ssthresh = 0x3fffc000,

  t_idle = 0x0003934ba0b407c6,

  t_rtt = 0x0,

  t_rtseq = 0xbfaeb6c8,

  t_srtt = 0xba2,

  t_rttvar = 0x288,

  max_rcvd = 0x7d6a,

  max_sndwnd = 0x7d00,

  t_oobflags = 0x0,

  t_iobc = 0x0,

  tm_running = 0x0,  No timer running

  tm_q = 0x8003378,

  tm_wait = 0x3fc,

  tm_exp_time = 0x0003934b7cb490dc,

  saved_rtt_clock = 0x0003934b7ca5007c,

  conn_exp_time = 0x0003934b811903db,

  tm_prev = 0x0,

  tm_next = 0x0,

  on_delay_ack_q = 0x0,

  dack_next = 0x0,

  dack_prev = 0x0,

  snd_scale = 0x0,

  rcv_scale = 0x0,

  request_r_scale = 0x0,

  requested_s_scale = 0x0,

  ts_recent = 0x6e49637,

  ts_recent_age = 0x6e49638,

  last_ack_sent = 0xbfb19e15

}

 

And 

 

$8 = {

  seg_next_md = 0xfffc0000,

  t_state = 0x6, ==========>>> FIN_WAIT_1

  t_rxtshift = 0x2,

  t_rxtcur = 0x3eb,

  t_dupacks = 0x0,

  t_maxseg = 0x7d00,

  t_force = 0x0,

  t_flags = 0x1e0,

  t_template = 0x22a1e1d4,

  t_inpcb = 0x222f1a0c,

  snd_una = 0xbfb19e13,

  snd_nxt = 0xbfb19e13,

  snd_up = 0xbfb19e13,

  snd_wl1 = 0xbfaeb6d9,

  snd_wl2 = 0xbfb19e13,

  iss = 0xbfb0a0a2,

  snd_wnd = 0x0,

  rcv_wnd = 0x7d00,

  rcv_nxt = 0xbfaeb6d9,

  rcv_up = 0xbfaeb6d9,

  irs = 0xbfaeb678,

  rcv_adv = 0xbfaf33d9,

  snd_max = 0xbfb19e14,

  inv_SYN_seqbase = 0xffaeb678,

  snd_cwnd = 0xffff,

  snd_ssthresh = 0x3fffc000,

  t_idle = 0x0003934ba0b407c6,

  t_rtt = 0x0,

  t_rtseq = 0xbfb17daa,

  t_srtt = 0x1f40,

  t_rttvar = 0x3,

  max_rcvd = 0x0,

  max_sndwnd = 0x7d00,

  t_oobflags = 0x0,

  t_iobc = 0x0,

  tm_running = 0x2,==== >> PERSIST TIMER is running on this end.

  tm_q = 0x8004460,

  tm_wait = 0x1388,

  tm_exp_time = 0x0003934ba1005255,

  saved_rtt_clock = 0x0003934b7df4342c,

  conn_exp_time = 0x0003934b811903de,

  tm_prev = 0x0,

  tm_next = 0x0,

  on_delay_ack_q = 0x0,

  dack_next = 0x0,

  dack_prev = 0x0,

  snd_scale = 0x0,

  rcv_scale = 0x0,

  request_r_scale = 0x0,

  requested_s_scale = 0x0,

  ts_recent = 0x6e4917e,

  ts_recent_age = 0x6e4917e,

  last_ack_sent = 0xbfaeb6d9

}

 

 

If we closely look into this structure, the snd_max of one end (which is in
FIN_WAIT_1 state) is less than the rcv_nxt of another end (which is in
CLOSE_WAIT). When the end (present in CLOSE_WAIT) sends any frame it will
put rcv_nxt field in the ti_ack field. 

 

Because of the below code, a check SEQ_GT(ti->ti_ack, tp->snd_max) at the
end which is in FIN_WAIT_1 will fail resulting in it to drop the packed and
send an ack with ack sequence number it is expecting.

 

Snippet from tcp_input()

 

   1523                 tp->t_dupacks = 0;

   1524                 if (SEQ_GT(ti->ti_ack, tp->snd_max)) {

   1525                         tcpstat.tcps_rcvacktoomuch++;

   1526                         goto dropafterack;

   1527                 }

 

 

If we dump the tcpstat information from the online cpu dump:

 

$2 = {

  ...

  . . .

  . . .

  

  tcps_connattempt = 382657891, 

  tcps_rcvtotal = 272941912, 

  tcps_sndtotal = 272848321, 

  tcps_noport = 13535, 

  tcps_rcvurp = 0, 

  . . . 

  tcps_sndacks = -394474409,

  tcps_rcvacktoomuch = 34255849, 

  tcps_rcvackpack = 425941141, 

  tcps_rcvackbyte = 603776015, 

  tcps_rcvwinupd = 651289, 

  tcps_predack = 2280435060, 

  ..

  tcps_sc_aborted = 0, 

  tcps_sc_dupesyn = 0, 

  tcps_sc_dropped = 0

 

and the tcpstat captured in the saveabend

 

tcpstat =

  .. .

  . . .

 

  tcps_connattempt = 382657891

  tcps_rcvtotal = 421811861

  tcps_sndtotal = 421718270

  tcps_noport = 13535

  tcps_rcvurp = 0

  ...

  ...

  tcps_sndacks = -245604460

  tcps_rcvacktoomuch = 108690823

  tcps_rcvackpack = 425941141

  tcps_rcvackbyte = 603776015

  tcps_rcvwinupd = 651289

  tcps_predack = 2280435060

  ... 

  tcps_sc_aborted = 0

  tcps_sc_dupesyn = 0

  tcps_sc_dropped = 0

 

We see lot of difference in three values, tcps_rcvtotal,tcps_sndtotal and
tcps_rcvacktoomuch and rest of the values remains the same. These values
have increased ie almost doubled between time when cpu dump was taken (
Priority was 9) and the save abend (priority is 1). We presume that this
observation is important clue for the problem and will work further by
looking at code in what scenarios these values will increase. 

This will cause another end to send ack again with incorrect sequence
number. These events ie of repeated acks continue which led the tcpip proc
to loop and hence priority was reduced to 1. This explains the reason behind
why only tcps_rcvtotal, tcps_sndtotal, tcps_rcvduppack and
tcps_rcvacktoomuch are getting incremented in tcpstats.

 

We simulated the problem by making one end to send ACK frame with ack seq
number higher than the expected. This resulted in loop and priority was
reduced to one.

 

We are now looking at the scenarios when an ACK packet can come with ack
sequence number higher than expected.

 

 

We are wondering in which scenarios this type of problem can occur in BSD. 

 

I am looking for urgent feedbacks.

 

Regards,

Hemanth.



More information about the freebsd-net mailing list