SYNCOOKIE authentication problems

Steve Kargl sgk at troutmask.apl.washington.edu
Thu Jun 28 16:50:36 UTC 2007


On Thu, Jun 28, 2007 at 08:14:43AM -0700, Julian Elischer wrote:
> Steve Kargl wrote:
> >On Thu, Jun 28, 2007 at 02:50:40PM +0400, Eygene Ryabinkin wrote:
> >>Steve, good day.
> >>
> >>Wed, Jun 27, 2007 at 06:43:11PM -0700, Steve Kargl wrote:
> >>>Any advice on how to isolate or avoid?
> >>>
> >>>Jun 27 18:31:19 node11 kernel: TCP: [192.168.0.11]:59661 to 
> >>>[192.168.0.11]:63266 tcpflags 0x10<ACK>; syncache_expand: Segment failed
> >>>SYNCOOKIE authentication, segment rejected (probably spoofed)
> >>According to Andre Oppermann, these are harmless:
> >>    http://lists.freebsd.org/pipermail/freebsd-net/2007-June/014401.html
> >>
> >>But I am experiencing some problems related to other messages,
> >>such as 'tcp_input: Listen socket: Spurious RST, segment rejected'.
> >>That does not seem to be your case, but my problems are documented
> >>in the aforementioned thread.  Just in case you're curious...
> >>-- 
> >
> >Andre certainly knows more about TCP/IP than I, but no, these
> >are not harmless.  Every time one of these messages appears
> >on the console, my MPI application hangs and must be restarted.
> >My large numerical simulations randomly die anywhere from
> >15 minutes to 25 hours after launching the job.
> 
> Is the app on that machine or on another machine?
> 

It's a Message Passing Interface (MPI) application.  I have 6 nodes
in a cluster.  Each node has 4 CPUs.  Each node gets 4 processes.
There are a total of 24 processes, and communication between nodes
is over a GigE network.  This is top(1) output on node11:

last pid:  2919;  load averages:  4.76,  4.56,  4.55    up 0+12:48:23  09:45:04
34 processes:  5 running, 29 sleeping
CPU states: 23.6% user,  0.0% nice, 66.4% system, 10.0% interrupt,  0.0% idle
Mem: 4587M Active, 588M Inact, 263M Wired, 596K Cache, 214M Buf, 10G Free
Swap: 17G Total, 17G Free

  PID USERNAME    THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
  896 kargl         1 130    0  1567M  1428M CPU3   2 659:31 86.77% AVL_PS_mpi
  897 kargl         1 130    0  1201M  1061M CPU2   3 655:01 86.33% AVL_PS_mpi
  898 kargl         1 130    0  1201M  1061M RUN    1 653:25 86.18% AVL_PS_mpi
  899 kargl         1 139    0  1201M  1061M RUN    2 655:00 85.89% AVL_PS_mpi

When I get the SYNCOOKIE authentication error message, the CPU state
shows 0% user and 99.9% system.  All 4 processes show WCPU 99.99%.
This occurs on all the nodes.  AFAICT, the processes are spinning
while waiting for info from other processes.  This info never arrives.
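
A minimal sketch for lining these messages up with the hangs, assuming
each kernel message lands on a single line in the system log and has
the same layout as the one quoted above.  The function names are
hypothetical, and the pattern is modeled on that one message only, so
it may need adjusting for other kernel versions.

```python
import re
from collections import Counter

# Pattern modeled on the single log message quoted in this thread.
# Assumption: the whole message is on one line in the log file.
SYNCOOKIE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+[\d:]{8})\s+(?P<host>\S+)\s+kernel:\s+TCP:\s+"
    r"\[(?P<src>[^\]]+)\]:(?P<sport>\d+)\s+to\s+"
    r"\[(?P<dst>[^\]]+)\]:(?P<dport>\d+)\s+.*"
    r"syncache_expand: Segment failed\s+SYNCOOKIE authentication"
)


def find_syncookie_failures(lines):
    """Return one dict per log line that reports a SYNCOOKIE failure.

    Each dict has the keys stamp, host, src, sport, dst and dport,
    all as strings taken verbatim from the log line.
    """
    hits = []
    for line in lines:
        match = SYNCOOKIE_RE.search(line)
        if match:
            hits.append(match.groupdict())
    return hits


def count_by_host(hits):
    """Count failures per reporting host (hypothetical helper)."""
    return Counter(hit["host"] for hit in hits)
```

Feeding it the log lines from every node and comparing the "stamp"
values against the time the job stopped making progress would show
whether the message always precedes a hang, and the port pairs show
which connection was affected.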


-- 
Steve


More information about the freebsd-current mailing list