Bash lockups
Carl Johnson
carlj at peak.org
Fri May 28 18:28:30 UTC 2010
Giorgos Keramidas <keramida at ceid.upatras.gr> writes:
> On Fri, 21 May 2010 09:30:05 -0700, Carl Johnson <carlj at peak.org> wrote:
>> Giorgos Keramidas <keramida at ceid.upatras.gr> writes:
>>> Does this lock-up happen if you leave the shell 'idle' for too long
>>> over an ssh session? There may be problems with stateful connection
>>> tracking between your terminal and the remote shell :-/
>>
>> No, I don't think that could be the problem. I am just using ssh
>> between local machines and there is no firewall between them. It also
>> often seems to happen to a shell as I switch away from it to another
>> one. One suspicion is that something is sending a signal to the shell
>> as it switches, and bash sometimes doesn't handle that signal
>> properly.
>>
>> I also should have mentioned that I have been running bash as my
>> default shell for years under Linux and have never seen this problem
>> there.
>>
>> Thanks for the suggestion.
>
> That's ok. If you can attach to the bash process with ktrace please try
> to grab a ktrace file from a deadlocked shell. We may be able to see
> why it gets deadlocked by running kdump(8) on the shell trace file.
>
> You can run a second shell under ktrace (and hope that the parent
> doesn't deadlock before the traced child shell), by running:
>
> bash$ ktrace -f bash.trace bash --login
>
> When you exit from the child shell you can dump ktrace(8) events from
> the bash.trace file with:
>
> bash$ kdump -f bash.trace > logfile 2>&1
>
> Looking near the last records dumped in 'logfile' should be quite
> informative if the process is dead-locked or spinning around the same
> code over and over again.
I finally got one after starting ktrace a few days ago. It is
informative, but it raises as many questions as it answers. It
basically just wrote out the prompt, *started* to set up for reading
the input, and then stopped. I ran gdb on it and it is stuck looping
somewhere in getenv. I don't have the system compiled with debugging,
so I have limited information on what it is doing there. I checked
multiple times, and I also saw getenv running routines such as memset,
strlen, mbrtowc, and wcsnrtombs.

The following is the tail end of the 'kdump -Ef' output:
67263 bash 61412.013860 GIO fd 2 wrote 28 bytes
0x0000 0d0f 1b5b 316d 5b63 6172 6c6a 4063 6a62 7364 3874 207e 5d24 1b5b |...[1m[carlj@cjbsd8t ~]$.[|
0x001a 6d20 |m |
67263 bash 61412.013867 RET write 28/0x1c
67263 bash 61412.013874 CALL sigprocmask(SIG_SETMASK,0x80e133c,0)
67263 bash 61412.013880 RET sigprocmask 0
and the following is the similar section of a normal prompt:
67263 bash 61403.461469 GIO fd 2 wrote 27 bytes
0x0000 0f1b 5b31 6d5b 6361 726c 6a40 636a 6273 6438 7420 7e5d 241b 5b6d |..[1m[carlj@cjbsd8t ~]$.[m|
0x001a 20 | |
67263 bash 61403.461476 RET write 27/0x1b
67263 bash 61403.461483 CALL sigprocmask(SIG_SETMASK,0x80e133c,0)
67263 bash 61403.461489 RET sigprocmask 0
67263 bash 61403.461497 CALL sigprocmask(SIG_BLOCK,0,0x80e1e3c)
67263 bash 61403.461504 RET sigprocmask 0
67263 bash 61403.461513 CALL read(0,0xbfbfd95f,0x1)
I just realized there is an extra CR at the beginning of the hung
prompt (28 bytes instead of 27) that I don't see elsewhere, but nothing
else before that looks different. This one is an i386 8.0 release, but
I also have another hung shell on an amd64 7.3 release system in
VirtualBox. I just checked my other ktrace logs and found one other
place where that extra CR occurs, but there was no lockup there, and
that was on my other system.
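
As a sanity check on those byte counts, here is a minimal Python sketch
(not part of the original traces; the variable names are only for
illustration) that decodes the hex columns of the two GIO records above
and compares them. It assumes the hex was copied correctly from the
kdump output, with the offsets and the ASCII column dropped.

```python
# Hex bytes copied from the two GIO records (offset and ASCII columns removed).
hung_hex = (
    "0d0f 1b5b 316d 5b63 6172 6c6a 4063 6a62 7364 3874 207e 5d24 1b5b "
    "6d20"
)
normal_hex = (
    "0f1b 5b31 6d5b 6361 726c 6a40 636a 6273 6438 7420 7e5d 241b 5b6d "
    "20"
)

# bytes.fromhex() skips the spaces between the hex groups.
hung = bytes.fromhex(hung_hex)
normal = bytes.fromhex(normal_hex)

# Report the lengths and whether the hung prompt is just CR + normal prompt.
print("hung prompt length:  ", len(hung))
print("normal prompt length:", len(normal))
print("first byte of hung prompt is CR:", hung[:1] == b"\r")
print("rest of hung prompt matches normal prompt:", hung[1:] == normal)
```

If the two dumps really differ only in that first byte, the last two
lines print True, which would mean the prompt string itself is the same
in both cases and only the leading carriage return is new.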
The following is a section of a backtrace from gdb:
#0 0x28308540 in mbrtowc () from /lib/libc.so.7
#1 0x080c7ce6 in getenv ()
#2 0x080c1335 in getenv ()
#3 0x080ae1d4 in getenv ()
#4 0x080ac4b0 in getenv ()
#5 0x080ac815 in getenv ()
#6 0x080c3955 in getenv ()
#7 0x080c3ac9 in getenv ()
#8 0x080ac4b0 in getenv ()
#9 0x080ac815 in getenv ()
#10 0x080acb6c in getenv ()
#11 0x080acf55 in getenv ()
#12 0x08054611 in ?? ()
#13 0x284a9a80 in ?? ()
...
#67 0x2832cbfd in time () from /lib/libc.so.7
The first few entries change when I let it run for a while, but the
last 8-9 getenv addresses and everything before them remain the same.
There are a total of about 65 backtrace entries this time, some of
which are 0x00000000 addresses, which seems suspicious. The backtrace
from the other hung shell is also in getenv, but I didn't have ktrace
running on that one.
I am at the limit of my experience, so does anybody else have any
ideas about what could cause this, or how I could trace it further? I
am keeping the processes attached to gdb, so I can do further checking
on them if anyone has any other ideas. Thanks in advance for any
help, and thanks for the help that allowed me to get this far.
--
Carl Johnson carlj at peak.org