[Bug 275597] Samba: smbd sometimes aborts by PANIC when 'vfs objects = cap'

From: <bugzilla-noreply_at_freebsd.org>
Date: Mon, 18 Dec 2023 01:45:12 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=275597

--- Comment #2 from uratan <uratan@miomio.jp> ---
Created attachment 247109
  --> https://bugs.freebsd.org/bugzilla/attachment.cgi?id=247109&action=edit
a script to make a set of test files that is used by the debugging

I noticed that I could do printf-debugging by setting the
debug level in /usr/local/etc/smb4.conf, ...

  (modification to) /usr/local/etc/smb4.conf
  +-----------------------------------------
  |;   max log size = 50
  |   max log size = 9999
  |   log level = 6
  +-----------------------------------------

... so I decided to start tracking down the cause by myself.

I made samba-4.16.11 (same version as packages) from ports
and gathered information with the DEBUG() macro.
which are output to /var/log/samba4/log.<client-hostname>.

 - * - * -

I confirmed at first the place where the 'Segmentation fault'
were caused at, it was in function cap_readdir() shown below.

  (/usr/ports/net/samba416/work/samba-4.16.11/source3/modules/vfs_cap.c)
      +------------------------------------------------------------------
   87 |static struct dirent *cap_readdir(vfs_handle_struct *handle,
   88 |                                  struct files_struct *dirfsp,
   89 |                                  DIR *dirp,
   90 |                                  SMB_STRUCT_STAT *sbuf)
   91 |{
   92 |        struct dirent *result;
   93 |        struct dirent *newdirent;
   94 |        char *newname;
   95 |        size_t newnamelen;
   96 |        DEBUG(3,("cap: cap_readdir\n"));
   97 |
   98 |        result = SMB_VFS_NEXT_READDIR(handle, dirfsp, dirp, NULL);
   99 |        if (!result) {
  100 |                return NULL;
  101 |        }
  102 |
  103 |        newname = capdecode(talloc_tos(), result->d_name);
  104 |        if (!newname) {
  105 |                return NULL;
  106 |        }
  107 |        DEBUG(3,("cap: cap_readdir: %s\n", newname));
  108 |        newnamelen = strlen(newname)+1;
  109 |        newdirent = talloc_size(
  110 |                talloc_tos(), sizeof(struct dirent) + newnamelen);
  111 |        if (!newdirent) {
  112 |                return NULL;
  113 |        }
  114 |        talloc_set_name_const(newdirent, "struct dirent");
**115 |        memcpy(newdirent, result, sizeof(struct dirent));
**116 |        memcpy(&newdirent->d_name, newname, newnamelen);
  117 |        return newdirent;
  118 |}
      +------------------------------------------------------------------

In the function cap_readdir(), there are 2 memcpy()s used
just before return statement.  I confirmed that the
'Segmentation fault' were caused only at first memcpy(),
the second memcpy() has no problem, with DEBUG() like below.
      +------------------------------------------------------------------
   87 |static struct dirent *cap_readdir(vfs_handle_struct *handle,
      |
 -- omitted --
      |
  114 |        talloc_set_name_const(newdirent, "struct dirent");
      |DEBUG(3,("@@u@@ 456\n"));
      |memcpy(&newdirent->d_name, newname, newnamelen);       //// dup of #116
      |DEBUG(3,("@@u@@ 888\n"));
**115 |        memcpy(newdirent, result, sizeof(struct dirent));
      |DEBUG(3,("@@u@@ 789\n"));
**116 |        memcpy(&newdirent->d_name, newname, newnamelen);
      |DEBUG(3,("@@u@@ 999\n"));
  117 |        return newdirent;
  118 |}
      +------------------------------------------------------------------
       |    |
       V    V
  (excerpt of /var/log/samba4/log.argon near PANIC abort with above codes)
  +------------------------------------------------------------------
  |    :
  |  cap: cap_readdir
  |  cap: cap_readdir: igfxCPL.cpl
  |  @@u@@ 456
  |  @@u@@ 888
  |  @@u@@ 789
  |  @@u@@ 999
  |  smbd_dirptr_get_entry: dirptr 0x2af766c0 now at offset 7585
  |  fsp_new: allocated files structure (508 used)
  |  fget_ea_dos_attribute: Cannot get attribute from EA on file
win/tst/teHW2/N3x50B-ITX/v15.38.2.64.4189/Graphics/igfxCPL.cpl: Error =
Attribute not found
  |  dos_mode_debug_print: fdos_mode returning (0x80): ""
  |  smbd_dirptr_get_entry mask=[*] found
win/tst/teHW2/N3x50B-ITX/v15.38.2.64.4189/Graphics/igfxCPL.cpl
fname=igfxCPL.cpl (igfxCPL.cpl)
  |  file_free: freed files structure 0 (507 used)
  |  cap: cap_readdir
  |  cap: cap_readdir: igfxCUIService.exe
  |  @@u@@ 456
  |  @@u@@ 888
  |  @@u@@ 789
  |  @@u@@ 999
  |  smbd_dirptr_get_entry: dirptr 0x2af766c0 now at offset 7681
  |  fsp_new: allocated files structure (508 used)
  |  fget_ea_dos_attribute: Cannot get attribute from EA on file
win/tst/teHW2/N3x50B-ITX/v15.38.2.64.4189/Graphics/igfxCUIService.exe: Error =
Attribute not found
  |  dos_mode_debug_print: fdos_mode returning (0x80): ""
  |  smbd_dirptr_get_entry mask=[*] found
win/tst/teHW2/N3x50B-ITX/v15.38.2.64.4189/Graphics/igfxCUIService.exe
fname=igfxCUIService.exe (igfxCUIService.exe)
  |  file_free: freed files structure 0 (507 used)
  |  cap: cap_readdir
  |  cap: cap_readdir: igfxCUIServicePS.dll
  |  @@u@@ 456
  |  @@u@@ 888
  |  ===============================================================
  |  INTERNAL ERROR: Signal 11: Segmentation fault in pid 16090 (4.16.11)
  |  If you are running a recent Samba version, and if you think ....
  |  ===============================================================
  |  PANIC (pid 16090): Signal 11: Segmentation fault in 4.16.11
  |  BACKTRACE:
  |   #0 log_stack_trace + 0x43 [ip=0x203e8643] [sp=0xffbfd35c]
  |    :
  +------------------------------------------------------------------

 - * - * -

I tried various things hoping something would change
the situation for about a week (only within the cap_readdir()).
Finally, after over 20 builds & tests, I reached some solution that,
writing down the first memcpy() by replacing with simple
assignment statements (see below) made the problem disappeared.
      +------------------------------------------------------------------
   87 |static struct dirent *cap_readdir(vfs_handle_struct *handle,
      |
 -- omitted --
      |
  114 |        talloc_set_name_const(newdirent, "struct dirent");
      |#define Q_HACK_U  1
      |#if Q_HACK_U
      |        newdirent->d_pad1   = result->d_pad1   ;
      |        newdirent->d_namlen = result->d_namlen ;
      |        newdirent->d_pad0   = result->d_pad0   ;
      |        newdirent->d_type   = result->d_type   ;
      |        newdirent->d_reclen = result->d_reclen ;
      |        newdirent->d_off    = result->d_off    ;
      |        newdirent->d_fileno = result->d_fileno ;
      |#else /* Q_HACK_U */
**115 |        memcpy(newdirent, result, sizeof(struct dirent));
      |#endif /* Q_HACK_U */
**116 |        memcpy(&newdirent->d_name, newname, newnamelen);
  117 |        return newdirent;
  118 |}
      +------------------------------------------------------------------
  note:
    I considered that the bottom side of the memory area is more safe
    because the second memcpy() did not cause any problems,
    so it is the reason why the structure member assignments are
    in reverse order (to insert further DEBUG()s if needed).
    Also member d_name[] is not copied manually because
    it is done by second memcpy().

 - * - * -

Since there was nothing unnatural about the pointer values, etc.
by the result of another research with DEBUG() like below, ...
      +------------------------------------------------------------------
   87 |static struct dirent *cap_readdir(vfs_handle_struct *handle,
      |
 -- omitted --
      |
      |DEBUG(3,("@u@ %lx %lx %x\n", newdirent, result, sizeof(struct dirent)));
**115 |        memcpy(newdirent, result, sizeof(struct dirent));
      +------------------------------------------------------------------
       |    |
       V    V
  (excerpt of /var/log/samba4/log.argon near PANIC abort with above codes)
  +------------------------------------------------------------------
  |    :
  |  cap: cap_readdir: HDXRtGi.inf
  |  @u@ 2a7b6130 22e82e90 118
  |  @@u@@ 999
  |  cap: cap_readdir: HDXSBCH.inf
  |  @u@ 2a7b6430 22e82eb8 118
  |  @@u@@ 999
  |  cap: cap_readdir: HDXSEDS.inf
  |  @u@ 2a7b65b0 22e82ee0 118
  |  @@u@@ 999
  |  cap: cap_readdir: HDXSF.inf
  |  @u@ 2a7b6730 22e82f08 118
  |  ===============================================================
  |  INTERNAL ERROR: Signal 11: Segmentation fault in pid 94386 (4.16.11)
  |    :
  +------------------------------------------------------------------

... I can only assume that there is something in the implementation
of memcpy() that is incompatible with smbd's behavior... (maybe)
 (multi-core ? or threading ?)

Well, however, I feel very very very happy right now
because the diff.exe works well now. (and quiet /var/log/messages)

 - * - * -

ADDITIONAL INFO

[a] I also modified the version message to distinguish my smbd.

  (/usr/ports/net/samba416/work/samba-4.16.11/source3/include/smb.h)
      +------------------------------------------------------------------
   31 |
   32 |/* logged when starting the various Samba daemons */
      |#define Q_HACK_U  1
      |#if Q_HACK_U
      |#define COPYRIGHT_STARTUP_MESSAGE       "Copyright Andrew Tridgell and
the Samba Team 1992-2023 +u"
      |#else /* Q_HACK_U */
   33 |#define COPYRIGHT_STARTUP_MESSAGE       "Copyright Andrew Tridgell and
the Samba Team 1992-2022"
      |#endif /* Q_HACK_U */
   34 |
      +------------------------------------------------------------------

[b] The subjected 14.0R uses UFS file system. (/home is shared by Samba)
  +-----------------------------------------
  | % mount
  |/dev/ada0p3 on / (ufs, local, noatime)
  |devfs on /dev (devfs)
  |/dev/ada0p4 on /usr (ufs, local, noatime, soft-updates)
  |/dev/ada0p5 on /home (ufs, NFS exported, local, noatime, soft-updates)
  |/dev/md1 on /mnt (ufs, local)
  |fdescfs on /var/run/samba4/fd (fdescfs)
  +-----------------------------------------

[c] The depth of the directory hierarchy might have no relations
    to this problem. Attached sh-scripts: z2-make-test-files.sh
     makes a set of test files that was used by above debugging.
  +-----------------------------------------
  |% sh z2-make-test-files.sh
  |.......
            :
  |          ...... done
  |
  |% du -d2 -m teHWs/
  |620     teHWs/N3x50B-ITX/7464
  |621     teHWs/N3x50B-ITX/7492
  |322     teHWs/N3x50B-ITX/v15.38.2.64.4189
  |261     teHWs/N3x50B-ITX/v15.40.0.4177_BSW
  |1822    teHWs/N3x50B-ITX
  |1822    teHWs/
  |
  |% find teHWs/ -type f | wc -l
  |    1591
  |% find teHWs/ -type d | wc -l
  |     135
  +-----------------------------------------

    Since this problem may or may not appear once per comparison
    by diff.exe below (about 1min), so the frequency of occurrence
    is about less less less than 1/3182 (1591 files * 2 trials) in
    my environment...
  +-----------------------------------------
  |U:\> diff.exe -rq j:\win\tst\teHWs u:\tst\teHWs     (u: is win7 local disk)
  +-----------------------------------------

-- 
You are receiving this mail because:
You are the assignee for the bug.