SYNCOOKIE authentication problems
Steve Kargl
sgk at troutmask.apl.washington.edu
Thu Jun 28 16:50:36 UTC 2007
On Thu, Jun 28, 2007 at 08:14:43AM -0700, Julian Elischer wrote:
> Steve Kargl wrote:
> >On Thu, Jun 28, 2007 at 02:50:40PM +0400, Eygene Ryabinkin wrote:
> >>Steve, good day.
> >>
> >>Wed, Jun 27, 2007 at 06:43:11PM -0700, Steve Kargl wrote:
> >>>Any advice on how to isolate or avoid?
> >>>
> >>>Jun 27 18:31:19 node11 kernel: TCP: [192.168.0.11]:59661 to
> >>>[192.168.0.11]:63266 tcpflags 0x10<ACK>; syncache_expand: Segment failed
> >>>SYNCOOKIE authentication, segment rejected (probably spoofed)
> >>According to Andre Oppermann, these are harmless:
> >> http://lists.freebsd.org/pipermail/freebsd-net/2007-June/014401.html
> >>
> >>But I am experiencing some problems related to other messages,
> >>such as 'tcp_input: Listen socket: Spurious RST, segment rejected'.
> >>That does not seem to be your case, but my problems are documented
> >>in the aforementioned thread. Just in case you're curious...
> >>--
> >
> >Andre certainly knows more about TCP/IP than I, but no, these
> >are not harmless. Every time one of these messages appears
> >on the console, my MPI application hangs and must be restarted.
> >My large numerical simulations randomly die anywhere from
> >15 minutes to 25 hours after launching the job.
>
> Is the app on that machine or another machine?
>
It's a Message Passing Interface (MPI) application. I have 6 nodes
in a cluster. Each node has 4 CPUs, and each node gets 4 processes,
for a total of 24 processes. Communication between nodes is over
a GigE network. This is the top(1) output on node11:

last pid: 2919; load averages: 4.76, 4.56, 4.55 up 0+12:48:23 09:45:04
34 processes: 5 running, 29 sleeping
CPU states: 23.6% user, 0.0% nice, 66.4% system, 10.0% interrupt, 0.0% idle
Mem: 4587M Active, 588M Inact, 263M Wired, 596K Cache, 214M Buf, 10G Free
Swap: 17G Total, 17G Free

  PID USERNAME  THR PRI NICE   SIZE    RES STATE  C    TIME   WCPU COMMAND
  896 kargl       1 130    0  1567M  1428M CPU3   2  659:31 86.77% AVL_PS_mpi
  897 kargl       1 130    0  1201M  1061M CPU2   3  655:01 86.33% AVL_PS_mpi
  898 kargl       1 130    0  1201M  1061M RUN    1  653:25 86.18% AVL_PS_mpi
  899 kargl       1 139    0  1201M  1061M RUN    2  655:00 85.89% AVL_PS_mpi

When I get the SYNCOOKIE authentication error message, the CPU
states line shows 0% user and 99.9% system, and all 4 processes show
a WCPU of 99.99%. This occurs on all the nodes. AFAICT, the processes
are spinning while waiting for info from other processes, and that
info never arrives.
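
One way to start isolating this, offered as a sketch and not as a
tested fix, is to pull the rejected-segment messages out of the system
log so their timestamps and ports can be lined up against the moment
each MPI job stalls. The Python below is a minimal example. The regular
expression is built only from the single kernel message quoted earlier
in this thread, joined onto one line, so other message layouts are an
assumption it does not cover. The names SYNCOOKIE_RE, SAMPLE, and
parse_syncookie_line are hypothetical. It also records whether the
source and destination addresses are identical, as they are in that
message (192.168.0.11 on both sides).

```python
import re

# Pattern derived from the one log line quoted in this thread; other
# kernel message layouts are not handled (assumption).
SYNCOOKIE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s+"
    r"(?P<host>\S+)\s+kernel:\s+TCP:\s+"
    r"\[(?P<src>[^\]]+)\]:(?P<sport>\d+)\s+to\s+"
    r"\[(?P<dst>[^\]]+)\]:(?P<dport>\d+)\s+"
    r"tcpflags\s+(?P<flags>\S+);\s+"
    r"syncache_expand: Segment failed\s+SYNCOOKIE authentication"
)

# The message from the original report, unwrapped onto a single line.
SAMPLE = (
    "Jun 27 18:31:19 node11 kernel: TCP: [192.168.0.11]:59661 to "
    "[192.168.0.11]:63266 tcpflags 0x10<ACK>; syncache_expand: Segment failed "
    "SYNCOOKIE authentication, segment rejected (probably spoofed)"
)


def parse_syncookie_line(line):
    """Return a dict describing a SYNCOOKIE rejection, or None if no match."""
    match = SYNCOOKIE_RE.search(line)
    if match is None:
        return None
    record = match.groupdict()
    record["sport"] = int(record["sport"])
    record["dport"] = int(record["dport"])
    # True when both endpoints are the same address, i.e. traffic that
    # never left the node.
    record["same_host"] = record["src"] == record["dst"]
    return record


if __name__ == "__main__":
    print(parse_syncookie_line(SAMPLE))
```

Running each log line on every node through parse_syncookie_line and
sorting the results by timestamp would show whether the rejections
cluster around the hangs and whether they always involve two ports on
the same node.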
--
Steve
More information about the freebsd-current mailing list