Re: git: c33509d49a6f - main - gssd: Fix handling of the gssname=<name> NFS mount option

From: Rick Macklem <rick.macklem_at_gmail.com>
Date: Wed, 11 Jan 2023 04:26:23 UTC
On Sat, Jan 7, 2023 at 6:04 PM Benjamin Kaduk <bjkfbsd@gmail.com> wrote:
>
> On Sat, Jan 7, 2023 at 1:50 PM Rick Macklem <rmacklem@freebsd.org> wrote:
>>
>> The branch main has been updated by rmacklem:
>>
>> URL: https://cgit.FreeBSD.org/src/commit/?id=c33509d49a6fdcf86ef280a78f428d3cb7012c4a
>>
>> commit c33509d49a6fdcf86ef280a78f428d3cb7012c4a
>> Author:     Rick Macklem <rmacklem@FreeBSD.org>
>> AuthorDate: 2023-01-07 21:49:25 +0000
>> Commit:     Rick Macklem <rmacklem@FreeBSD.org>
>> CommitDate: 2023-01-07 21:49:25 +0000
>>
>>     gssd: Fix handling of the gssname=<name> NFS mount option
>>
>>     If an NFS mount using "sec=krb5[ip],gssname=<name>" is
>>     done, the gssd daemon fails.  There is a long delay
>>     (several seconds) in the gss_acquire_cred() call and then
>>     it returns success, but the credentials returned are
>>     junk.
>>
>>     I have no idea how long this has been broken, due to some
>>     change in the Heimdal gssapi library call, but I suspect
>>     it has been quite some time.
>>
>>     Anyhow, it turns out that replacing the "desired_name"
>>     argument with GSS_C_NO_NAME fixes the problem.
>>     Replacing the argument should not be a problem, since the
>>     TGT for the host-based initiator credential in the default
>>     keytab file should be the only TGT in the gssd's credential
>>     cache (which is not the one for uid 0).
>>
>>     I will try and determine if FreeBSD 13 and/or FreeBSD 12
>>     need this same fix and will MFC if they need the fix.
>>
>>     This problem only affected Kerberized NFS mounts when the
>>     "gssname" mount option was used.  Other Kerberized NFS
>>     mount cases already used GSS_C_NO_NAME and work ok.
>>     A workaround if you do not have this patch is to do a
>>     "kinit -k host/FQDN" as root on the machine, followed by
>>     the Kerberized NFS mount without the "gssname" mount
>>     option.
>>
>
>
> Hi Rick,
>
> This doesn't seem like a good long-term fix.
> If we're going to have a gssname argument, we should actually make
> it take effect, rather than silently ignoring it, which is what using GSS_C_NO_NAME
> does (it indicates the use of "any credential", which ends up meaning the
> default credential when used on a GSS initiator).
>
> It should be possible to inspect the "junk" credential from gss_acquire_cred()
> and learn more about what happened (perhaps a non-kerberos mechanism was
> picked, or the name was in the wrong format) using various gss_inquire_*() calls,
> as a diagnostic measure.  Unfortunately I don't anticipate having a huge amount of time
> to put into it anytime soon...
>
I found the underlying problem. The upcall RPC from the kernel was timing out
at 25sec, and the gss_acquire_cred() call had not finished by then.
(It was close: gss_acquire_cred() took about 27sec.) The kernel code would then
assume that the gssd(8) daemon had gone away and close the upcall socket. When
the gssd(8) daemon finally tried to send its reply on that closed socket, it was
terminated by a SIGPIPE signal.

Increasing the timeout makes it work.
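The failure mode above can be reproduced in miniature. The sketch below is a hypothetical Python model, not gssd(8) or kernel code. It makes several assumptions:

- An AF_UNIX socket pair stands in for the upcall connection.
- A plain byte string stands in for the RPC reply.
- `time.sleep()` stands in for the slow gss_acquire_cred() call.
- The `upcall()` helper and its timings are made up, scaled down from the 25sec timeout and ~27sec call reported here.

If the "daemon" replies after the "kernel" side has timed out and closed its end, the reply write fails with EPIPE. If the timeout is longer than the work, the reply is delivered.

```python
import socket
import threading
import time


def upcall(work_seconds: float, timeout_seconds: float) -> tuple[str, str]:
    """Model one kernel->daemon upcall over a local socket pair.

    Hypothetical helper, not gssd code. Returns (kernel_result, daemon_result).
    """
    # AF_UNIX stream socket pair standing in for the upcall socket (assumption).
    kernel_end, daemon_end = socket.socketpair()
    daemon_result: list[str] = []

    def daemon() -> None:
        time.sleep(work_seconds)  # stand-in for a slow gss_acquire_cred()
        try:
            daemon_end.sendall(b"reply")
            daemon_result.append("sent")
        except BrokenPipeError:
            # Python ignores SIGPIPE by default, so writing to the closed
            # socket surfaces EPIPE as an exception; a C daemon with the
            # default SIGPIPE disposition would be killed at this point.
            daemon_result.append("EPIPE")
        finally:
            daemon_end.close()

    worker = threading.Thread(target=daemon)
    worker.start()

    kernel_end.settimeout(timeout_seconds)
    try:
        data = kernel_end.recv(16)
        kernel_result = "ok" if data == b"reply" else "short read"
    except socket.timeout:
        # Give up on the daemon and close the socket, as the kernel did.
        kernel_result = "timeout"
    finally:
        kernel_end.close()

    worker.join()
    return kernel_result, daemon_result[0]
```

On Linux or FreeBSD, `upcall(work_seconds=0.3, timeout_seconds=0.1)` should return `("timeout", "EPIPE")`. `upcall(work_seconds=0.05, timeout_seconds=1.0)` should return `("ok", "sent")`. This only models the mechanism. The change mentioned in this thread raises the kernel's upcall timeout; it does not alter how the daemon handles SIGPIPE.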

I am now "on the fence" w.r.t. leaving this patch in. As I noted, I think it is
safe to do, since the credential cache used by the gssd(8) daemon should only
have a TGT for the host-based client credential.
Without the patch, the mount takes almost 30sec, instead of a fraction of a
second with the patch (assuming the timeout has been increased, which turns out
to be needed for the case where a user's TGT has expired and they attempt to
access the mount).

If you really think it should be reverted, I can do that.

Thanks for your comments, rick
ps: I will be committing a change to increase the timeout.

> Thanks,
>
> Ben