nfsv4 kerberized and gssname=root and allgssname

Thu Oct 4 05:53:08 UTC 2012

On 4 Oct 2012, at 00:35, Rick Macklem <rmacklem at uoguelph.ca> wrote:

> Ulysse 31 wrote:
>> 2012/9/29 Rick Macklem <rmacklem at uoguelph.ca>:
>>> Ulysse 31 wrote:
>>>> Hi all,
>>>> 
>>>> I am currently working on a FreeBSD 9 backup server.
>>>> This server will back up the production server via kerberized NFSv4
>>>> (as the old backup server, a Linux one, did).
>>>> On the old backup server we used a root/<fqdn> Kerberos identity,
>>>> which allows the backup server to access all the data.
>>>> I have followed the documentation found at :
>>>> 
>>>> http://code.google.com/p/macnfsv4/wiki/FreeBSD8KerberizedNFSSetup
>>>> 
>>>> done :
>>>> - added to kernel :
>>>> 
>>>> options KGSSAPI
>>>> device crypto
>>>> 
>>>> - added to rc.conf :
>>>> 
>>>> nfs_client_enable="YES"
>>>> rpc_lockd_enable="YES"
>>>> rpc_statd_enable="YES"
>>>> rpcbind_enable="YES"
>>>> devfs_enable="YES"
>>>> gssd_enable="YES"
>>>> 
>>>> - set sysctl vfs.rpcsec.keytab_enctype=1 and added it to
>>>> /etc/sysctl.conf
>>>> 
>>>> We use the MIT Kerberos implementation, since it is the one used on
>>>> all our servers (mostly Linux), and we have created an
>>>> /etc/krb5.keytab containing the following keys:
>>>> host/<fqdn>
>>>> nfs/<fqdn>
>>>> root/<fqdn>
>>>> 
>>>> and, of course, I have applied the patch available at:
>>>> http://people.freebsd.org/~rmacklem/rpcsec_gss-9.patch
>>>> 
>>>> When I try to mount with method (B) (from the Google wiki), it
>>>> works as expected: with a correct user credential, I can access
>>>> the user's data.
>>>> But when I try method (C) (the one I need in order to do a full
>>>> backup of the production storage server), I get a kernel panic
>>>> every time I run the mount command.
>>>> The mount command looks something like:
>>>> mount -t nfs -o nfsv4,sec=krb5i,gssname=root,allgssname <production server fqdn>:<export_path> <local_path_where_to_mount>
> Just to confirm it, you are saying that exactly the same mount command,
> except without the "allgssname" option, doesn't crash?

No, in fact it is the same command with gssname=nfs instead of gssname=root that does not crash. When I specify gssname=root, it panics.
The same command with gssname=nfs and allgssname together "works", or rather it mounts and does not crash, but it does not allow root access to the NFS share, since the NetApp expects a root/fqdn key to be used for that.
I don't know whether this gives you a hint. I am going to test this patch; tell me if you have other ideas.
For now we have decided to disable kerberized NFS on the new FreeBSD backup server so that it can go into production without delay.
Thanks for the help.

> 
> That is weird, since when I look at the code, there shouldn't be any
> difference between the two mounts, up to the point where it crashes.
> 
> The crash seems to indicate that nr_auth is bogus, but I can't see
> how/why that would happen.
> 
> I have attached a patch which changes the way nr_auth is set and "might"
> help, although I doubt it. (It is untested, but if you want to try it,
> good luck with it.)
> 
> I'll email again if I get something more solid figured out, rick
> 
>>>> I have enabled kernel debugging to get some information; here is
>>>> the message:
>>>> 
>>>> 
>>>> Fatal trap 12: page fault while in kernel mode
>>>> cpuid = 0; apic id = 00
>>>> fault virtual address = 0x368
>>>> fault code = supervisor read data, page not present
>>>> instruction pointer = 0x20:0xffffffff80866ab7
>>>> stack pointer = 0x28:0xffffff804aa39ce0
>>>> frame pointer = 0x28:0xffffff804aa39d30
>>>> code segment = base 0x0, limit 0xfffff, type 0x1b
>>>> = DPL 0, pres 1, long 1, def32 0, gran 1
>>>> processor eflags = interrupt enabled, resume, IOPL = 0
>>>> current process = 701 (mount_nfs)
>>>> trap number = 12
>>>> panic: page fault
>>>> cpuid = 0
>>>> KDB: stack backtrace:
>>>> #0 0xffffffff808ae486 at kdb_backtrace+0x66
>>>> #1 0xffffffff8087885e at panic+0x1ce
>>>> #2 0xffffffff80b82380 at trap_fatal+0x290
>>>> #3 0xffffffff80b826b8 at trap_pfault+0x1e8
>>>> #4 0xffffffff80b82cbe at trap+0x3be
>>>> #5 0xffffffff80b6c57f at calltrap+0x8
>>>> #6 0xffffffff80a78eda at rpc_gss_init+0x72a
>>>> #7 0xffffffff80a79cd6 at rpc_gss_refresh_auth+0x46
>>>> #8 0xffffffff807a5a53 at newnfs_request+0x163
>>>> #9 0xffffffff807bf0f7 at nfsrpc_getattrnovp+0xd7
>>>> #10 0xffffffff807d9b29 at mountnfs+0x4e9
>>>> #11 0xffffffff807db60a at nfs_mount+0x13ba
>>>> #12 0xffffffff809068fb at vfs_donmount+0x100b
>>>> #13 0xffffffff80907086 at sys_nmount+0x66
>>>> #14 0xffffffff80b81c60 at amd64_syscall+0x540
>>>> #15 0xffffffff80b6c867 at Xfast_syscall+0xf7
>>>> Uptime: 2m31s
>>>> Dumping 97 out of 1002 MB:..17%..33%..50%..66%..83%..99%
>>>> 
>>>> ------------------------------------------------------------------------
>>>> 
>>>> Has anyone experienced something similar? Is there a way to
>>>> correct that?
>>>> Thanks for the help.
>>>> 
>>> Well, you're probably the first person to try doing this in years. I
>>> did have it working about 4-5 years ago. Welcome to the bleeding edge ;-)
>>> 
>>> Could you do the following w.r.t. above kernel:
>>> # cd /boot/nkernel (or wherever the kernel lives)
>>> # nm kernel | grep rpc_gss_init
>>> - add the offset 0x72a to the address for rpc_gss_init
>>> # addr2line -e kernel.symbols
>>> 0xXXX - the hex number above (address of rpc_gss_init+0x72a)
>>> - email me what it prints out, so I know where the crash is occurring
>>> 
>>> You could also run the following command on the Linux server to
>>> capture packets during the mount attempt, then email me the xxx.pcap
>>> file so I can look at it in wireshark, to see what is happening
>>> before the crash.
>>> (I'm guessing nr_auth is somehow bogus, but that's just a guess. :-)
>>> # tcpdump -s 0 -w xxx.pcap host <freebsd-client>
>> 
>> Hi,
>> 
>> Sorry for the delay; I was travelling and had no working network connection.
>> Back online for the rest of the week ^^.
>> Thanks for your help, here is what it prints out :
>> 
>> root@bsdenc:/boot/kernel # nm kernel | grep rpc_gss_init
>> ffffffff80df07b0 r __set_sysinit_set_sym_svc_rpc_gss_init_sys_init
>> ffffffff80a787b0 t rpc_gss_init
>> ffffffff80a7a580 t svc_rpc_gss_init
>> ffffffff81127530 d svc_rpc_gss_init_sys_init
>> ffffffff80a7a3b0 T xdr_rpc_gss_init_res
>> root@bsdenc:/boot/kernel # addr2line -e kernel.symbols
>> 0xffffffff80a78eda
>> /usr/src/sys/rpc/rpcsec_gss/rpcsec_gss.c:772
>> 
>> 
>> As for the tcpdump from the Linux server, I think you are referring
>> to the production NFS server?
>> If so, unfortunately it is not Linux; it is a NetApp filer, so there is no
>> "real" root access on it (and so no tcpdump available :s ).
>> If you meant the old backup server (which is Linux, but an NFS
>> client), I cannot unmount/mount on it since it is in production
>> (mountpoint always busy), but I can set up a quick VM/test machine that
>> acts like the Linux backup server and run tcpdump from it.
>> Just let me know. Thanks again.
>> 
>> --
>> Ulysse31
>> 
>>> 
>>> rick
>>> 
>>>> --
>>>> Ulysse31
>>>> _______________________________________________
>>>> freebsd-fs at freebsd.org mailing list
>>>> http://lists.freebsd.org/mailman/listinfo/freebsd-fs
>>>> To unsubscribe, send any mail to
>>>> "freebsd-fs-unsubscribe at freebsd.org"
> <rpcsec-crash.patch>
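
For anyone repeating the symbol lookup quoted above, here is a minimal sketch of the "add the offset to the address of rpc_gss_init" step. The helper name addr_plus_offset is hypothetical and not part of any tool mentioned in this thread. It adds the offset to the low 32 bits only, on the assumption that some /bin/sh implementations mishandle hex constants above the signed 64-bit range, and it assumes the address is exactly 16 hex digits, as nm prints it on amd64.

```shell
#!/bin/sh
# addr_plus_offset (hypothetical helper): add a small offset to a 64-bit
# kernel symbol address, adding only in the low 32 bits so the shell never
# has to parse a constant larger than a signed 64-bit integer.
addr_plus_offset() {
    sym=${1#0x}             # 16 hex digits as printed by nm; optional 0x stripped
    off=$2                  # offset from the backtrace, e.g. 0x72a
    high=${sym%????????}    # upper 8 hex digits
    low=${sym#????????}     # lower 8 hex digits
    sum=$(( 0x$low + $off ))
    # Refuse to continue if the addition would carry into the upper half.
    [ "$sum" -le $(( 0xffffffff )) ] || return 1
    printf '0x%s%08x\n' "$high" "$sum"
}

# Values from this thread: nm reports rpc_gss_init at ffffffff80a787b0,
# and backtrace frame #6 is rpc_gss_init+0x72a.
addr_plus_offset ffffffff80a787b0 0x72a
```

With the values from this thread, the result is the same address that was fed to addr2line -e kernel.symbols above, which resolved to rpcsec_gss.c:772.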