Server overloaded? Or is it a bug?
Daniela
dgw at liwest.at
Tue Jun 10 06:25:29 PDT 2003
On Thursday 05 June 2003 20:19, Robert Watson wrote:
> So this tells us that interrupt delivery appears to be working fine for
> your NIC, that the network stack isn't completely hosed, and can allocate
> packet buffers (mbufs), so isn't memory-starved at that level of the
> system.
> Sockets are used only for locally terminated connections, and come out of
> a separate memory pool from packet buffers (well, it's a little more
> complicated than that, but that's enough to get the picture). The reason
> I wondered about this was that one of the classes of possible memory
> starvation is to reach the allocation limit on sockets. We allocate the
> socket (and TCP state) a couple of packets into the TCP setup, so if the
> TCP setup got partway completed and then there was no further response,
> we'd have a possible explanation.
>
> Since the connection completes, it's probably safe to assume the TCP state
> and socket were fully allocated, and the socket was returned by the kernel
> to the application, or at least, the kernel got pretty much to the point
> of returning it to the application.
> Try using "slogin -v" or "ssh -v" on the client, and paste the results
> into an e-mail in response to this one. The SSH daemon does a lot of work
> to set up a new connection -- it forks a process or two, does name
> lookups, allocates pseudo-terminals, invokes PAM, and all kinds of other
> things. There are failure modes for each of these, and a bit more detail
> might let us track it down. Particularly useful might be the results of
> "slogin -v" both when the machine is operating normally, and when it's
> hosed. This will let us figure out about when during the process
> something failed, and what it might have been doing.
>
> > > If you can get partway through the banner but hang later, that
> > > might be the result of a file system deadlock of some sort.
> >
> > This is also possible, but what could have caused it? My file I/O is not
> > really heavy.
>
> Deadlock is a bit of a misnomer for what I have in mind. There are two
> classes of things that look like deadlocks: lock order problems, and lock
> leaks.
...
> So the VFS deadlock is somewhat of a shot in the dark, but it has pretty
> easy to identify symptoms, especially if you can get to a debugger.
> They're also fairly easy to analyze.
...
> I think we'll find that it's either a kernel problem, or an X problem
> triggering a kernel problem, so we're unlikely to find useful core dumps
> from applications. A system core might be useful, but hard to get without
> a serial console.
>
> Ok, so at the end of this all, here were my pieces of advice on debugging
> it, if you can reproduce it:
>
> (1) Compare "slogin -v" to the system in the before and after scenarios,
> that may tell us a lot about what's broken.
>
> (2) Despite the fact that you can't set up a serial console, set up a
> serial console.
...
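For item (1), one way to see where the hung login stops is to save the "slogin -v" output from a normal run and from a hung run, then find the first line where they differ. The following is only a minimal sketch: the function name and the idea of passing each log as a list of lines are my assumptions, not something from Robert's mail.

```python
def first_divergence(normal_lines, hung_lines):
    """Return the index of the first line where the two logs differ.

    If one log is a pure prefix of the other, the index equals the
    length of the shorter log. If both logs are identical, the index
    equals their common length.
    """
    for index, (normal, hung) in enumerate(zip(normal_lines, hung_lines)):
        if normal != hung:
            return index
    return min(len(normal_lines), len(hung_lines))
```

If the returned index is smaller than the length of the normal log, the line at that index in the normal log is the first step the hung session never reached or reported differently. Lines that legitimately vary between runs (port numbers, for example) would need to be filtered out first.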
Some strange things have happened over the past few days, all related to processes:
(1) I have some zombies I cannot kill:
# ps ax
...
53410 pn Z 0:00.00 (kate)
...
# kill -9 53410
53410: No such process
The same thing happens with make.
(2) When I invoke the KDE System Guard, the process list won't show up.
(3) My processes receive a lot of signals (10 and 11), about 30 times a day.
(4) Kate crashed when I wanted to save a document, and has crashed every time
I have opened it since. So I tried gdb kate:
(gdb) run
Starting program: /usr/local/bin/kate
Deprecated bfd_read called at
/usr/src/gnu/usr.bin/binutils/gdb/../../../../contrib/gdb/gdb/dbxread.c line
2627 in elfstab_build_psymtabs
Deprecated bfd_read called at
/usr/src/gnu/usr.bin/binutils/gdb/../../../../contrib/gdb/gdb/dbxread.c line
933 in fill_symbuf
ERROR: Communication problem with kate, it probably crashed.
Program exited with code 0377.
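On item (1): a zombie has already exited, and only its process-table entry remains until the parent collects the exit status with wait, so there is nothing left for kill -9 to act on. The sketch below shows that lifecycle in a small, self-contained form. The function name and the exit code 7 are arbitrary choices for illustration, and it says nothing about why the parents of kate or make are not reaping their children here.

```python
import os
import time


def reap_child_demo():
    """Fork a child that exits at once, then reap it from the parent."""
    pid = os.fork()
    if pid == 0:
        # Child: exit immediately with a recognisable status.
        os._exit(7)

    # Parent: the child has exited (or will within this pause) and stays
    # a zombie until we wait for it.
    time.sleep(0.2)
    reaped_pid, status = os.waitpid(pid, 0)
    exit_code = os.WEXITSTATUS(status)

    # After reaping, the process-table entry is gone, so signalling the
    # old PID fails with "No such process".
    try:
        os.kill(pid, 0)
        still_present = True
    except ProcessLookupError:
        still_present = False

    return reaped_pid == pid, exit_code, still_present
```

The practical consequence is that a zombie goes away only when its parent waits for it, or when the parent itself exits and init reaps the orphan.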
As I never had any problems like these before, I guess they are a side effect
of the crash.
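On item (3): signal numbers are platform-specific, so it helps to translate them into names on the machine that reported them. On FreeBSD, 10 is SIGBUS and 11 is SIGSEGV. A minimal sketch of the lookup, with a helper name of my own choosing:

```python
import signal


def signal_name(number):
    """Return the symbolic name for a signal number on this platform."""
    try:
        return signal.Signals(number).name
    except ValueError:
        # The number is not a defined signal on this platform.
        return f"unknown({number})"
```

The result depends on where the code runs: signal 11 is SIGSEGV on the common Unix systems, but signal 10 maps to different names on different kernels.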
Do we have a chance to debug this or should I rebuild my system?
And, most important, could this be a new kernel bug? If so, I would really
like to debug it.
Daniela