Re: lsof crashes in Arm Optimized Routines

From: John F Carr <jfc_at_mit.edu>
Date: Sun, 01 Jan 2023 17:30:25 UTC
> On Jan 1, 2023, at 07:49, Ronald Klop <ronald-lists@klop.ws> wrote:
> 
> On 11/18/22 01:57, Mark Millard wrote:
>>> On Nov 15, 2022, at 03:33, Ronald Klop <ronald-lists@klop.ws> wrote:
>>> 
>>> Sorry for the noise.
>>> 
>>> But I cannot reproduce this today. I can scroll back in my terminal and see the command and error from yesterday, but running the same again just works.
>> FYI:
>> I do not have specifics any more, but I'll note that I've seen
>> such lsof behavior of failing at one time and later working
>> without any installed updates to it or the system between. I
>> rarely use lsof and, so, this was not recently.
>> I've no clue how to cause the failure(s) to show up. I've no
>> clue how common the issue is. But, over time, it is not just
>> you.
>>> 
>>> From: Ronald Klop <ronald-lists@klop.ws>
>>> Date: Monday, 14 November 2022 21:53
>>> To: freebsd-arm@FreeBSD.org, Andrew Turner <andrew@FreeBSD.org>
>>> Subject: lsof crashes in Arm Optimized Routines
>>> Hi,
>>> 
>>> See https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=267760 : Segmentation fault in lsof. Program received signal SIGSEGV, Segmentation fault.
>>> Invalid permissions for mapped object.
>>> memcpy () at /home/ronald/dev/freebsd/src/contrib/arm-optimized-routines/string/aarch64/memcpy.S:175
>>> 175 stp D_l, D_h, [dst, 64]!
>>> 
>>> I also remembered this change: https://cgit.freebsd.org/src/log/contrib/arm-optimized-routines?showmsg=1 about Arm Optimized Routines.
>>> 
>>> Could this be related? What can I do to help debug this?
>>> 
>>   ===
>> Mark Millard
>> marklmi at yahoo.com
> 
> 
> I'm having this issue again.
> 
> (No debugging symbols found in lsof)
> (gdb) run
> Starting program: /usr/local/sbin/lsof
> 
> Program received signal SIGSEGV, Segmentation fault.
> Invalid permissions for mapped object.
> memcpy () at /home/ronald/dev/freebsd/src/contrib/arm-optimized-routines/string/aarch64/memcpy.S:171
> bt
> 171             stp     B_l, B_h, [dst, 32]
> (gdb) bt
> #0  memcpy () at /home/ronald/dev/freebsd/src/contrib/arm-optimized-routines/string/aarch64/memcpy.S:171
> #1  0x0000000000218be4 in ?? ()
> #2  0x0000000400000000 in ?? ()
> Backtrace stopped: previous frame identical to this frame (corrupt stack?)
> (gdb)
> 
> 
> Some output of "truss -o /tmp/lsof.txt lsof":
> 
> __sysctl("kern.proc.filedesc.1",4,0x0,0x80ba06f0,0x0,0) = 0 (0x0)
> __sysctl("kern.proc.filedesc.1",4,0x851d6000,0x80ba06f0,0x0,0) = 0 (0x0)
> __sysctl("kern.proc.filedesc.385",4,0x0,0x80ba06f0,0x0,0) = 0 (0x0)
> __sysctl("kern.proc.filedesc.385",4,0x8516ec00,0x80ba06f0,0x0,0) = 0 (0x0)
> __sysctl("kern.proc.filedesc.97537",4,0x0,0x80ba06f0,0x0,0) = 0 (0x0)
> __sysctl("kern.proc.filedesc.97537",4,0x8516ec00,0x80ba06f0,0x0,0) = 0 (0x0)
> statfs("/data/jails/jail13/_root/home/root/dev/workspace/FreeBSD-Ports-13/_root/usr/local/poudriere/data/.m/freebsd13-custom/04/bin/sh",{ fstypename=nullfs,mntonname=/data/jails/jail13/_root/home,mntfromname=/data/jails/_home,fsid=3cff022929000000 }) = 0 (0x0)
> statfs("/data/jails/jail13/_root",{ fstypename=nullfs,mntonname=/data/jails/jail13/_root,mntfromname=/data/jails/freebsd13,fsid=37ff022929000000 }) = 0 (0x0)
> statfs("/data/jails/_home3root/dev/workspace/FreeBSD-Ports-13/_root/usr/local/poudriere/data/.m/freebsd13-custom/04/bin/sh",0x80b9ef40) ERR#2 'No such file or directory'
> statfs("/data/jails/_home3root/dev/workspace/FreeBSD-Ports-13/_root/usr/local/poudriere/data/.m/freebsd13-custom/04/wrkdirs/usr/ports/devel/cmake-core/work/cmake-3.24.3/Source",0x80b9ef40) ERR#2 'No such file or directory'
> statfs("/data/jails/_home3root/dev/workspace/FreeBSD-Ports-13/_root/usr/local/poudriere/data/.m/freebsd13-custom/04",0x80b9ef40) ERR#2 'No such file or directory'
> statfs("/data/jails/freebsd13ovt",0x80b9ef40)    ERR#2 'No such file or directory'
> SIGNAL 11 (SIGSEGV) code=SEGV_MAPERR trapno=36 addr=0x80ba1000
> process killed, signal = 11 (core dumped)
> 
> 
> I'm surprised that the path names in the truss output are corrupted: _home3root should be _home/root.
> 
> NB: I'm using lsof while running poudriere in a jail in a Jenkins agent.
> 
> Regards,
> Ronald.
> 
> 

I think this is a bug in lsof, and the optimized memcpy routine is doing what it was asked to do: copy into a block of memory that the caller does not have write access to.  The faulting data address 0x80ba1000 is at the start of a page.  The faulting instruction is in the middle of a block of code that writes to successively increasing addresses.  The destination pointer passed to memcpy must have been valid, or the function would have crashed earlier.  But the end address is out of bounds, meaning the size is wrong.  If you can get the program into a debugger again, or can find a core file, check the value of register x2 ("count" in the assembly code).  If that is huge, then you have an uninitialized or otherwise invalid third argument to memcpy.

In a jail, system calls that determine the current filesystem behave differently.  The odd path names may be symptoms of jail-induced confusion.