nfsv4 kerberized and gssname=root and allgssname

Tue Oct 2 07:47:23 UTC 2012

2012/9/29 Rick Macklem <rmacklem at uoguelph.ca>:
> Ulysse 31 wrote:
>> Hi all,
>>
>> I am currently working on a FreeBSD 9 backup server.
>> This server will back up the production server via kerberized NFSv4
>> (as the old backup server, a Linux one, did).
>> On the old backup server we used a root/<fqdn> Kerberos identity,
>> which allows the backup server to access all the data.
>> I have followed the documentation found at:
>>
>> http://code.google.com/p/macnfsv4/wiki/FreeBSD8KerberizedNFSSetup
>>
>> What I have done:
>> - added to the kernel config:
>>
>> options KGSSAPI
>> device crypto
>>
>> - added to rc.conf:
>>
>> nfs_client_enable="YES"
>> rpc_lockd_enable="YES"
>> rpc_statd_enable="YES"
>> rpcbind_enable="YES"
>> devfs_enable="YES"
>> gssd_enable="YES"
>>
>> - ran sysctl vfs.rpcsec.keytab_enctype=1 and added it to
>> /etc/sysctl.conf
>>
>> We use the MIT Kerberos implementation, since it is the one used on all
>> our servers (mostly Linux), and we have created an /etc/krb5.keytab
>> containing the following keys:
>> host/<fqdn>
>> nfs/<fqdn>
>> root/<fqdn>
>>
>> and, of course, I have applied the patch available at:
>> http://people.freebsd.org/~rmacklem/rpcsec_gss-9.patch
>>
>> When I mount with method (B) from the Google wiki, it works as
>> expected: with correct user credentials, I can access that user's
>> data.
>> But when I mount with method (C), the one I need in order to do a
>> full backup of the production storage server, I get a kernel panic
>> every time I launch the mount command.
>> The mount command looks something like: mount -t nfs -o
>> nfsv4,sec=krb5i,gssname=root,allgssname <production server
>> fqdn>:<export_path> <local_path_where_to_mount>
>> I have enabled kernel debugging to get some information; here is
>> the message:
>>
>>
>> Fatal trap 12: page fault while in kernel mode
>> cpuid = 0; apic id = 00
>> fault virtual address = 0x368
>> fault code = supervisor read data, page not present
>> instruction pointer = 0x20:0xffffffff80866ab7
>> stack pointer = 0x28:0xffffff804aa39ce0
>> frame pointer = 0x28:0xffffff804aa39d30
>> code segment = base 0x0, limit 0xfffff, type 0x1b
>> = DPL 0, pres 1, long 1, def32 0, gran 1
>> processor eflags = interrupt enabled, resume, IOPL = 0
>> current process = 701 (mount_nfs)
>> trap number = 12
>> panic: page fault
>> cpuid = 0
>> KDB: stack backtrace:
>> #0 0xffffffff808ae486 at kdb_backtrace+0x66
>> #1 0xffffffff8087885e at panic+0x1ce
>> #2 0xffffffff80b82380 at trap_fatal+0x290
>> #3 0xffffffff80b826b8 at trap_pfault+0x1e8
>> #4 0xffffffff80b82cbe at trap+0x3be
>> #5 0xffffffff80b6c57f at calltrap+0x8
>> #6 0xffffffff80a78eda at rpc_gss_init+0x72a
>> #7 0xffffffff80a79cd6 at rpc_gss_refresh_auth+0x46
>> #8 0xffffffff807a5a53 at newnfs_request+0x163
>> #9 0xffffffff807bf0f7 at nfsrpc_getattrnovp+0xd7
>> #10 0xffffffff807d9b29 at mountnfs+0x4e9
>> #11 0xffffffff807db60a at nfs_mount+0x13ba
>> #12 0xffffffff809068fb at vfs_donmount+0x100b
>> #13 0xffffffff80907086 at sys_nmount+0x66
>> #14 0xffffffff80b81c60 at amd64_syscall+0x540
>> #15 0xffffffff80b6c867 at Xfast_syscall+0xf7
>> Uptime: 2m31s
>> Dumping 97 out of 1002 MB:..17%..33%..50%..66%..83%..99%
>>
>> ------------------------------------------------------------------------
>>
>> Has anyone experienced something similar? Is there a way to
>> fix it?
>> Thanks for the help.
>>
> Well, you're probably the first person to try doing this in years. I did
> have it working about 4-5 years ago. Welcome to the bleeding edge;-)
>
> Could you do the following w.r.t. the above kernel:
> # cd /boot/nkernel (or wherever the kernel lives)
> # nm kernel | grep rpc_gss_init
> - add the offset 0x72a to the address for rpc_gss_init
> # addr2line -e kernel.symbols
> 0xXXX - the hex number above (address of rpc_gss_init+0x72a)
> - email me what it prints out, so I know where the crash is occurring
>
> You could also run the following command on the Linux server to capture
> packets during the mount attempt, then email me the xxx.pcap file so I
> can look at it in wireshark, to see what is happening before the crash.
> (I'm guessing nr_auth is somehow bogus, but that's just a guess.:-)
> # tcpdump -s 0 -w xxx.pcap host <freebsd-client>

Hi,

Sorry for the delay; I was travelling and had no working network connection.
Back online for the rest of the week ^^.
Thanks for your help. Here is what it prints out:

root@bsdenc:/boot/kernel # nm kernel | grep rpc_gss_init
ffffffff80df07b0 r __set_sysinit_set_sym_svc_rpc_gss_init_sys_init
ffffffff80a787b0 t rpc_gss_init
ffffffff80a7a580 t svc_rpc_gss_init
ffffffff81127530 d svc_rpc_gss_init_sys_init
ffffffff80a7a3b0 T xdr_rpc_gss_init_res
root@bsdenc:/boot/kernel # addr2line -e kernel.symbols
0xffffffff80a78eda
/usr/src/sys/rpc/rpcsec_gss/rpcsec_gss.c:772
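
As a cross-check, the address given to addr2line is just the rpc_gss_init
symbol address from nm plus the 0x72a offset from frame #6 of the backtrace.
A minimal sketch of that arithmetic in portable sh follows. The helper name
sym_plus_offset is made up for illustration, and it splits the address into
two 32-bit halves on the assumption that the shell's signed 64-bit arithmetic
may not hold a kernel address directly.

```shell
#!/bin/sh
# Hypothetical helper (not part of any FreeBSD tool): add a hex offset to a
# 64-bit hex address. The address is split into two 32-bit halves so the
# sum never exceeds what signed 64-bit shell arithmetic can represent.
sym_plus_offset() {
    # Left-pad the address (without any 0x prefix) to 16 hex digits.
    base=$(printf '%16s' "${1#0x}" | tr ' ' '0')
    hi=${base%????????}    # upper 8 hex digits
    lo=${base#????????}    # lower 8 hex digits
    sum=$(( 0x$lo + 0x${2#0x} ))
    carry=$(( sum >> 32 ))
    printf '0x%08x%08x\n' $(( (0x$hi + carry) & 0xffffffff )) $(( sum & 0xffffffff ))
}

# Symbol address from the nm output and offset from frame #6 of the backtrace.
addr=$(sym_plus_offset ffffffff80a787b0 0x72a)
echo "$addr"    # prints 0xffffffff80a78eda

# The result can then be passed to addr2line as an argument:
#   addr2line -e kernel.symbols "$addr"
```

The printed value matches the address in frame #6, so the file and line
reported above should correspond to the faulting instruction.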

As for the tcpdump from the Linux server, I think you may be
referring to the production NFS server?
If so, unfortunately it is not Linux; it is a NetApp filer, so there is no
"real" root access on it (and no tcpdump available :s ).
If you meant the old backup server (which is Linux, but an NFS
client), I cannot unmount and remount on it since it is in production
(the mountpoint is always busy), but I can quickly set up a test VM that
acts like the Linux backup server and run tcpdump from it.
Just let me know. Thanks again.

--
Ulysse31

>
> rick
>
>> --
>> Ulysse31