sshd crash
Diane Bruce
db at db.net
Sat Nov 2 15:40:27 UTC 2013
On Sat, Nov 02, 2013 at 07:33:40AM -0600, Ian Lepore wrote:
> On Fri, 2013-11-01 at 22:35 -0700, Tim Kientzle wrote:
> > On Nov 1, 2013, at 3:02 PM, Ian Lepore <ian at freebsd.org> wrote:
> >
> > > On Sat, 2013-11-02 at 02:40 +0800, Jia-Shiun Li wrote:
> > >> On Sat, Nov 2, 2013 at 1:53 AM, Ian Lepore <ian at freebsd.org> wrote:
> > >>> On Sat, 2013-11-02 at 01:44 +0800, Jia-Shiun Li wrote:
> > >>>> may I add: putty causes this to happen. mine 0.62. But ssh from another
> > >>>> FreeBSD host has no problem.
> > >>>>
> > >>>> I suspect it to be some issues related to memory or malloc issues
> > >>>> specific to bbb. 'tmux a -d' without existing detached sessions
> > >>>> causes tmux client to core dump. But sshd and it are both fine on rpi.
> > >>>>
> > >>>> -Jia-Shiun.
> > >>>
> > >>> This is the first I've heard of being able to ssh to an arm platform
> > >>> that doesn't have PrivSep disabled, since about July or so. I've never
> > >>> heard a report yet that anything on the client side could make a
> > >>> difference.
> > >>>
> > >>> It's definitely not a beaglebone thing, it happens on every arm board
> > >>> I've got... dreamplug, rpi, bbw, imx53, wandboard.
> > >>
> > >>
> > >> Ok let me make sure I did not mix things up. ;)
> > >>
> > >> IIRC I once saw similar issue on rpi shortly. But after another
> > >> weekly update it was gone. I did not pay too much attention on rpi,
> > >> and thought it was bbb specific.
> > >>
> > >> I did not change sshd_config, UsePrivilegeSeparation supposed
> > >> remaining on as default is.
> >
> > I started looking into it a couple of months ago but didn't get
> > very far; Diane Bruce got a lot further than I did.
> >
> > If I recall correctly, it started up when the malloc libc symbols
> > were changed. That may have altered what malloc implementation
> > sshd used.
> >
> > So it could be a long-standing stray write that jemalloc just
> > happens to detect.
> >
> > It could also be related to locking (there's some multi-threaded
> > crypto code in sshd that may be involved).
>
> There's lots of stuff with lock in the name, but I don't think there are
> actually any threads involved in sshd, just forking. ldd says sshd
> doesn't link to libthr.
>
> I'm not sure it's a mundane stray-write either. The routine that's
> asserting is checking to see if the contents of a page are all-zero
> because a jemalloc internal flag is set that says it should be. I had
> the routine print the non-zero data it found, and it looks like this:
>
> not-zero at 0 0x20c99000 = 0x20800a00
> not-zero at 1 0x20c99004 = 0x00000001
> not-zero at 2 0x20c99008 = 0x0000002f
> not-zero at 3 0x20c9900c = 0xffffffff
> not-zero at 4 0x20c99010 = 0x00007fff
> not-zero at 5 0x20c99014 = 0x00000003
> not-zero at 96 0x20c99180 = 0x5a5a5a5a
> not-zero at 97 0x20c99184 = 0x5a5a5a5a
> not-zero at 98 0x20c99188 = 0x5a5a5a5a
>
> The 0x5a continues to the end of the page. So jemalloc has metadata
> that says it thinks the page is all-zeroes, and the page is a mix of
> data and some zeroes and the 5a junk-fill byte. It seems more like the
> metadata is in error somehow. (Maybe a stray write hit the metadata.)
>
> -- Ian
>
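For anyone who wants to reproduce the kind of dump Ian shows above, here is a minimal sketch of such a check. It is not jemalloc's actual routine: the name report_nonzero and the PAGE_WORDS constant are made up for illustration, and a 4 KiB page of 32-bit words is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define PAGE_WORDS 1024	/* assumption: 4 KiB page viewed as 32-bit words */

/*
 * Hypothetical helper, not jemalloc code: scan one page that is supposed
 * to be all-zero and print every 32-bit word that is not.  Returns the
 * number of non-zero words found.
 */
static size_t
report_nonzero(const uint32_t *page, FILE *out)
{
	size_t i, count = 0;

	assert(page != NULL);
	for (i = 0; i < PAGE_WORDS; i++) {
		if (page[i] != 0) {
			/* index, address of the word, and its contents */
			fprintf(out, "not-zero at %zu %p = 0x%08x\n",
			    i, (const void *)&page[i], (unsigned)page[i]);
			count++;
		}
	}
	return (count);
}
```

Called on a page whose metadata claims it is zeroed, a non-zero return value means the metadata and the page contents disagree, which is the situation the assertion trips on.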
I did a ln -s "quarantine:16000000" /etc/malloc.conf
which also works. This led me down the garden path of thinking
it might be a use after free. jasone came to the same conclusion,
which led to me reporting this possibility to secteam and des.
http://docs.freebsd.org/cgi/getmsg.cgi?fetch=199241+0+archive/2013/freebsd-arm/20130728.freebsd-arm
Nevertheless, running efence from ports failed to come up with
any use after free.
I put together some notes for des at
http://www.freebsd.org/~db/fordes
The rev in question is
http://svnweb.freebsd.org/base?view=revision&revision=250991
when jemalloc was turned on for userland. There was an older malloc
(also by jasone):
/usr/src/lib/libc/stdlib/malloc.c
I agree with Ian, it is not thread locking. I have a thread test
program which does not show any faults in our thread locking.
Yes, it is purely associated with the fork.
zbb@ also reported a similar problem with another platform.
===
Hello.
I'm sending you the logs. Please see below.
Best regards
Zbyszek Bodek
1.
=======
--- ExprConstant.o ---
<jemalloc>:
/home/zbb/projects/armsp/freebsd-arm-superpages/lib/libc/../../contrib/jemalloc/include/jemalloc/internal/arena.h:757:
Failed assertion: "binind < NBINS"
./StmtNodes.inc.h: In member function 'RetTy clang::StmtVisitorBase<Ptr,
ImplClass, RetTy>::Visit(typename Ptr<clang::Stmt>::type) [with Ptr =
clang::make_const_ptr, ImplClass = <unnamed>::LValueExprEvaluator, RetTy =
bool]':
./StmtNodes.inc.h:873: internal compiler error: Abort trap
Please submit a full bug report,
with preprocessed source if appropriate.
See <URL:http://gcc.gnu.org/bugs.html> for instructions.
*** [ExprConstant.o] Error code 1
make[6]: stopped in /usr/src/lib/clang/libclangast
make[6]: stopped in /usr/src/lib/clang/libclangast
*** [all] Error code 2
make[5]: stopped in /usr/src/lib/clang
1 error
make[5]: stopped in /usr/src/lib/clang
*** [all] Error code 2
make[4]: stopped in /usr/src/lib
1 error
make[4]: stopped in /usr/src/lib
A failure has been detected in another branch of the parallel make
make[3]: stopped in /usr/src
*** [libraries] Error code 2
make[2]: stopped in /usr/src
1 error
make[2]: stopped in /usr/src
*** [_libraries] Error code 2
make[1]: stopped in /usr/src
1 error
make[1]: stopped in /usr/src
*** [buildworld] Error code 2
make: stopped in /usr/src
1 error
2.
=======
--- ExprConstant.o ---
<jemalloc>:
/home/zbb/projects/armsp/freebsd-arm-superpages/lib/libc/../../contrib/jemalloc/include/jemalloc/internal/arena.h:757:
Failed assertion: "binind < NBINS"
/usr/src/lib/clang/libclangast/../../../contrib/llvm/tools/clang/lib/AST/ExprConstant.cpp:
In member function 'RetTy<unnamed>::ExprEvaluatorBase<Derived,
RetTy>::VisitCallExpr(const clang::CallExpr*) [with Derived =
<unnamed>::IntExprEvaluator, RetTy = bool]':
/usr/src/lib/clang/libclangast/../../../contrib/llvm/tools/clang/lib/AST/ExprConstant.cpp:3190:
internal compiler error: Abort trap
Please submit a full bug report,
with preprocessed source if appropriate.
See <URL:http://gcc.gnu.org/bugs.html> for instructions.
*** [ExprConstant.o] Error code 1
----- End forwarded message -----
There is also an open bug report for that one, from both zbb and
Matthias Meyser; see PR 182060.
It's time to bring in jasone again, I think, and I have included him
on the cc. jemalloc has a number of fill places that all use the same
pattern. I modified the pattern to be different at each place in order
to track what we are seeing. Where I have left it now is that I think
it might be associated with the thread cache code, because the pattern
I see comes from that branch of his code.
I have copious notes here but will have to dig them up.
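The pattern trick is roughly the following. This is only a sketch of the idea, not the actual patch: the site names and byte values are invented for illustration (stock jemalloc junk-fills freed memory with 0x5a everywhere, which is why the sites cannot be told apart).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical per-site fill bytes.  Giving each fill site its own byte
 * lets a later dump show which code path last touched the memory.
 */
enum fill_site {
	FILL_SITE_SMALL  = 0x51,	/* small-object free path (assumed) */
	FILL_SITE_LARGE  = 0x52,	/* large-object free path (assumed) */
	FILL_SITE_TCACHE = 0x53		/* thread cache path (assumed) */
};

/* Fill a region with the byte that identifies the calling site. */
static void
junk_fill(void *p, size_t len, enum fill_site site)
{
	assert(p != NULL);
	memset(p, (int)site, len);
}

/* Map a byte seen in a dump back to the site that wrote it. */
static const char *
fill_site_name(uint8_t byte)
{
	switch (byte) {
	case FILL_SITE_SMALL:
		return ("small");
	case FILL_SITE_LARGE:
		return ("large");
	case FILL_SITE_TCACHE:
		return ("tcache");
	default:
		return ("unknown");
	}
}
```

With something like this in place, the run of repeated bytes at the end of a bad page names the branch that filled it instead of showing the same 0x5a from every site.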
Both Ian and I were rather hoping zbb@ had fixed this one when
he fixed a stupid in the arm vm, but Ian tells me it is still there.
- Diane
--
- db at FreeBSD.org db at db.net http://www.db.net/~db