Re: Panics in recent NFS server

From: Rick Macklem <rmacklem_at_uoguelph.ca>
Date: Mon, 31 May 2021 15:52:38 +0000
Mateusz Guzik wrote:
>I reproduced the panic, things work for me with the patch below.
>However, there may be more to it so I'm going to ask Rick to weigh in.
>but short version is that length returned by nfsrv_parsename is off by
>one compared to copyinstr.
Yes, it appears that, now, ni_pathlen includes the nul termination character.
I don't think that was always the case, but I can't be bothered to search
back through the commits to try and find when it changed.

As such, this patch looks fine and you can consider it reviewed by me.

rick

diff --git a/sys/fs/nfsserver/nfs_nfsdsubs.c b/sys/fs/nfsserver/nfs_nfsdsubs.c
index 2b6e17752544..8c7db36bbd05 100644
--- a/sys/fs/nfsserver/nfs_nfsdsubs.c
+++ b/sys/fs/nfsserver/nfs_nfsdsubs.c
_at__at_ -2065,7 +2065,7 _at__at_ nfsrv_parsename(struct nfsrv_descript *nd, char
*bufp, u_long *hashp,
            }
        }
        *tocp = '\0';
-       *outlenp = (size_t)outlen;
+       *outlenp = (size_t)outlen + 1;
        if (hashp != NULL)
                *hashp = hash;
 nfsmout:


On 5/31/21, Mateusz Guzik <mjguzik_at_gmail.com> wrote:
> On 5/31/21, Mateusz Guzik <mjguzik_at_gmail.com> wrote:
>> It's probably my commit d81aefa8b7dd8cbeffeda541fca9962802404983 ,
>> I'll look at this later.
>
> Well let me rephrase. While the panic was added in said commit, I
> suspect the bug is on nfs side -- it has its own namei variant which I
> suspect is managing ni_pathlen in a manner different than the
> original, it just happens to not panic on kernels prior to the above
> change.
>
>>
>> On 5/31/21, Dimitry Andric <dim_at_freebsd.org> wrote:
>>> Hi,
>>>
>>> I recently upgraded a -CURRENT NFS server from 2021-05-12 to today
>>> (2021-05-31), and when the first NFS client attempted to connect, I got
>>> this
>>> panic:
>>>
>>> panic: lookup: expected nul at 0xfffff800104b3002; string [dim]
>>>
>>> cpuid = 0
>>> time = 1622463863
>>> KDB: stack backtrace:
>>> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame
>>> 0xfffffe00747e89b0
>>> vpanic() at vpanic+0x187/frame 0xfffffe00747e8a10
>>> panic() at panic+0x43/frame 0xfffffe00747e8a70
>>> lookup() at lookup+0xad2/frame 0xfffffe00747e8b10
>>> nfsvno_namei() at nfsvno_namei+0x1a4/frame 0xfffffe00747e8bc0
>>> nfsrvd_lookup() at nfsrvd_lookup+0x191/frame 0xfffffe00747e8eb0
>>> nfsrvd_dorpc() at nfsrvd_dorpc+0xfab/frame 0xfffffe00747e90c0
>>> nfssvc_program() at nfssvc_program+0x604/frame 0xfffffe00747e92a0
>>> svc_run_internal() at svc_run_internal+0xa72/frame 0xfffffe00747e93d0
>>> svc_run() at svc_run+0x250/frame 0xfffffe00747e9430
>>> nfsrvd_nfsd() at nfsrvd_nfsd+0x33c/frame 0xfffffe00747e9590
>>> nfssvc_nfsd() at nfssvc_nfsd+0x473/frame 0xfffffe00747e9aa0
>>> sys_nfssvc() at sys_nfssvc+0xc7/frame 0xfffffe00747e9ac0
>>> amd64_syscall() at amd64_syscall+0x12e/frame 0xfffffe00747e9bf0
>>> fast_syscall_common() at fast_syscall_common+0xf8/frame
>>> 0xfffffe00747e9bf0
>>> --- syscall (155, FreeBSD ELF64, sys_nfssvc), rip = 0x8011aa59a, rsp =
>>> 0x7fffffffe4e8, rbp = 0x7fffffffe780 ---
>>> KDB: enter: panic
>>>
>>> __curthread ()
>>>     at /share/dim/src/freebsd/src-dim/sys/amd64/include/pcpu_aux.h:55
>>> 55          __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu,
>>> (kgdb) #0  __curthread ()
>>>     at /share/dim/src/freebsd/src-dim/sys/amd64/include/pcpu_aux.h:55
>>> #1  doadump (textdump=textdump_at_entry=0)
>>>     at /share/dim/src/freebsd/src-dim/sys/kern/kern_shutdown.c:399
>>> #2  0xffffffff804cca5a in db_dump (dummy=<optimized out>,
>>>     dummy2=<unavailable>, dummy3=<unavailable>, dummy4=<unavailable>)
>>>     at /share/dim/src/freebsd/src-dim/sys/ddb/db_command.c:575
>>> #3  0xffffffff804cc912 in db_command (last_cmdp=<optimized out>,
>>>     cmd_table=<optimized out>, dopager=dopager_at_entry=1)
>>>     at /share/dim/src/freebsd/src-dim/sys/ddb/db_command.c:482
>>> #4  0xffffffff804cc58d in db_command_loop ()
>>>     at /share/dim/src/freebsd/src-dim/sys/ddb/db_command.c:535
>>> #5  0xffffffff804cfd06 in db_trap (type=<optimized out>, code=<optimized
>>> out>)
>>>     at /share/dim/src/freebsd/src-dim/sys/ddb/db_main.c:270
>>> #6  0xffffffff80c69f17 in kdb_trap (type=type_at_entry=3,
>>> code=code_at_entry=0,
>>>     tf=tf_at_entry=0xfffffe00747e88e0)
>>>     at /share/dim/src/freebsd/src-dim/sys/kern/subr_kdb.c:727
>>> #7  0xffffffff810d7aee in trap (frame=0xfffffe00747e88e0)
>>>     at /share/dim/src/freebsd/src-dim/sys/amd64/amd64/trap.c:576
>>> #8  <signal handler called>
>>> #9  kdb_enter (why=0xffffffff812d3d27 "panic", msg=<optimized out>)
>>>     at /share/dim/src/freebsd/src-dim/sys/kern/subr_kdb.c:506
>>> #10 0xffffffff80c1d248 in vpanic (
>>>     fmt=0xffffffff8129dfef "%s: expected nul at %p; string [%s]\n",
>>>     ap=<optimized out>, ap_at_entry=0xfffffe00747e8a50)
>>>     at /share/dim/src/freebsd/src-dim/sys/kern/kern_shutdown.c:907
>>> #11 0xffffffff80c1cfd3 in panic (
>>>     fmt=0xffffffff81e9b9c8 <cnputs_mtx> "=\t)\201\377\377\377\377")
>>>     at /share/dim/src/freebsd/src-dim/sys/kern/kern_shutdown.c:843
>>> #12 0xffffffff80cfa992 in lookup (ndp=ndp_at_entry=0xfffffe00747e8d90)
>>>     at /share/dim/src/freebsd/src-dim/sys/kern/vfs_lookup.c:919
>>> #13 0xffffffff80b33f84 in nfsvno_namei (nd=nd_at_entry=0xfffffe00747e9100,
>>>     ndp=ndp_at_entry=0xfffffe00747e8d90, dp=<optimized out>,
>>>     dp_at_entry=0xfffff80010544380, islocked=<optimized out>,
>>> islocked_at_entry=0,
>>>     exp=exp_at_entry=0xfffffe00747e8fd8, p=p_at_entry=0xfffffe00bbfa3000,
>>>     retdirp=0xfffffe00747e8e78)
>>>     at
>>> /share/dim/src/freebsd/src-dim/sys/fs/nfsserver/nfs_nfsdport.c:597
>>> #14 0xffffffff80b266a1 in nfsrvd_lookup (nd=0xfffffe00747e9100,
>>>     isdgram=<optimized out>, dp=0xfffff80010544380,
>>> vpp=0xfffffe00747e9010,
>>>     fhp=0xfffffe00747e9074, exp=0xfffffe00747e8fd8)
>>>     at
>>> /share/dim/src/freebsd/src-dim/sys/fs/nfsserver/nfs_nfsdserv.c:607
>>> #15 0xffffffff80b1073b in nfsrvd_compound (nd=0xfffffe00747e9100,
>>> isdgram=0,
>>>     tag=0xf <error: Cannot access memory at address 0xf>, taglen=6,
>>>     minorvers=4294967294)
>>>     at
>>> /share/dim/src/freebsd/src-dim/sys/fs/nfsserver/nfs_nfsdsocket.c:1098
>>> #16 nfsrvd_dorpc (nd=nd_at_entry=0xfffffe00747e9100,
>>> isdgram=isdgram_at_entry=0,
>>>     tag=0xf <error: Cannot access memory at address 0xf>, taglen=6,
>>>     minorvers=4294967294)
>>>     at
>>> /share/dim/src/freebsd/src-dim/sys/fs/nfsserver/nfs_nfsdsocket.c:626
>>> #17 0xffffffff80b24c44 in nfs_proc (nd=0xfffffe00747e9100,
>>>     xid=<optimized out>, xprt=0xfffff80003a14e00, rpp=<optimized out>)
>>>     at
>>> /share/dim/src/freebsd/src-dim/sys/fs/nfsserver/nfs_nfsdkrpc.c:402
>>> #18 nfssvc_program (rqst=0xfffff80010455800, xprt=0xfffff80003a14e00)
>>>     at
>>> /share/dim/src/freebsd/src-dim/sys/fs/nfsserver/nfs_nfsdkrpc.c:288
>>> #19 0xffffffff80edead2 in svc_executereq (rqstp=0xfffff80010455800)
>>>     at /share/dim/src/freebsd/src-dim/sys/rpc/svc.c:1037
>>> #20 svc_run_internal (grp=<optimized out>, grp_at_entry=0xfffff800100e6100,
>>>     ismaster=ismaster_at_entry=1)
>>>     at /share/dim/src/freebsd/src-dim/sys/rpc/svc.c:1313
>>> #21 0xffffffff80eddf80 in svc_run (pool=<optimized out>)
>>>     at /share/dim/src/freebsd/src-dim/sys/rpc/svc.c:1392
>>> #22 0xffffffff80b251ec in nfsrvd_nfsd (td=<optimized out>,
>>>     td_at_entry=0xfffffe00bbfa3000, args=args_at_entry=0xfffffe00747e9660)
>>>     at
>>> /share/dim/src/freebsd/src-dim/sys/fs/nfsserver/nfs_nfsdkrpc.c:561
>>> #23 0xffffffff80b3ec93 in nfssvc_nfsd (td=0xfffffe00bbfa3000,
>>>     uap=<optimized out>)
>>>     at
>>> /share/dim/src/freebsd/src-dim/sys/fs/nfsserver/nfs_nfsdport.c:3714
>>> #24 0xffffffff80e6f647 in sys_nfssvc (td=0xfffffe00bbfa3000,
>>>     uap=0xfffffe00bbfa33e8)
>>>     at /share/dim/src/freebsd/src-dim/sys/nfs/nfs_nfssvc.c:111
>>> #25 0xffffffff810d891e in syscallenter (td=<optimized out>)
>>>     at
>>> /share/dim/src/freebsd/src-dim/sys/amd64/amd64/../../kern/subr_syscall.c:189
>>> #26 amd64_syscall (td=0xfffffe00bbfa3000, traced=0)
>>>     at /share/dim/src/freebsd/src-dim/sys/amd64/amd64/trap.c:1156
>>> #27 <signal handler called>
>>> #28 0x00000008011aa59a in ?? ()
>>>
>>> Is anybody seeing this too? :)
>>>
>>> I can probably bisect, but it'll take quite a while.
>>>
>>> -Dimitry
>>>
>>>
>>
>>
>> --
>> Mateusz Guzik <mjguzik gmail.com>
>>
>
>
> --
> Mateusz Guzik <mjguzik gmail.com>
>


--
Mateusz Guzik <mjguzik gmail.com>

Received on Mon May 31 2021 - 15:52:38 UTC

Original text of this message