Process in T state does not want to die.....

Willem Jan Withagen wjw at digiware.nl
Wed Nov 27 15:11:47 UTC 2019


Hi,

Probably a "dumb" question, but I am still wondering what is going on...

I have this Ceph server running several OSDs (ceph-osd). When they do
not get certain responses within a time limit, they commit suicide.

That is a rather convoluted process where they
  - call abort()
  - which is then trapped by the SIGABRT signal handler
    Try to dump the logging state
    Try to dump stacktrace
  - either call _exit()
    or call reraise_fatal
  - reraise_fatal does some logging
    and calls exit(1)

And then the process ends up as:
root 3433 0.0  4.2  699944 353716 - TsJ 11Nov19   38:10.17 ceph-osd -i 2

Where the T state marks it as Terminated and no more CPU time is consumed.
But one way or another the process does not go away, and it keeps resources
locked, which prevents starting a new daemon.

It stays in that state either
  1) for a few minutes, after which it is gone from the process table, or
  2) forever (>24h).

But why doesn't the process die (right away)?
Killing it with kill -9 does not help.
Trying to attach gdb yields nothing.

If it disappears from the process table, sometimes there is a core.

So how do I debug this?

--WjW




More information about the freebsd-hackers mailing list