kern/131342: [nfs] mounting/unmounting of disks causes NFS to fail

Martin Birgmeier Martin.Birgmeier at aon.at
Sat Jul 9 19:42:50 UTC 2011


Thank you for looking into this - answers below.

On 07/08/11 21:58, Rick Macklem wrote:
> Martin Birgmeier wrote:
>> The following reply was made to PR kern/131342; it has been noted by
>> GNATS.
>>
>> From: Martin Birgmeier <Martin.Birgmeier at aon.at>
>> To: bug-followup at FreeBSD.org
>> Cc:
>> Subject: Re: kern/131342: [nfs] mounting/unmounting of disks causes
>> NFS to
>> fail
>> Date: Fri, 08 Jul 2011 15:00:03 +0200
>>
>> This is a friendly reminder asking that some kind soul with knowledge
>> of the relevant kernel parts look into this... the error can easily be
>> reproduced. I just had it on a 7.4 system which did heavy reading from
>> an 8.2 server. When I mounted something on the server, the client got
>> a "Permission denied" reply.
>>
>> So, to recap the scenario:
>>
>> 7.4 NFS client
>> 8.2 NFS server
>> client mounts a fs from the server (via IPv4; it might be interesting
>> to look at http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/151681,
>> too, but that is unrelated)
>> client does heavy i/o on the mounted fs
>> server does a mount (on its side, in this case from an md device)
>>
>> -->  error: client gets back some NFS error (in this case "permission
>> denied")
>>
> I just made a quick attempt and wasn't able to reproduce this. I mounted/unmounted
> a UFS volume on the server (both to a subdir of the exported volume and to a
> directory outside of the exported volume) while a client was accessing an exported fs
> and didn't get an error.
You'll need to be doing heavy NFS i/o from the client to the server 
while mounting/unmounting something on the server in order to reproduce 
the problem.
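
To be concrete about "heavy i/o": a sustained sequential read loop over
a large file on the NFS mount is enough to trigger it. Here is a sketch
of the kind of load I mean (any bulk reader, e.g. dd or cp over a big
file, produces the same effect):

/*
 * Sketch of the client-side load: sequential reads over a large file
 * on the NFS mount, keeping read RPCs in flight while the server
 * mounts/unmounts another volume.
 */
#include <err.h>
#include <fcntl.h>
#include <unistd.h>

int
main(int argc, char **argv)
{
        char buf[65536];
        ssize_t n;
        int fd;

        if (argc != 2)
                errx(1, "usage: %s file-on-nfs-mount", argv[0]);
        if ((fd = open(argv[1], O_RDONLY)) == -1)
                err(1, "open");
        while ((n = read(fd, buf, sizeof(buf))) > 0)
                ;               /* discard; only the RPC traffic matters */
        if (n == -1)
                err(1, "read"); /* this is where EACCES shows up */
        close(fd);
        return (0);
}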
>
> Could the way you mount the volume on the server somehow end up renumbering the
> exported volumes? If so, the fsid in the file handle will no longer match when the
> server does
>    vfs_busyfs(fsid);
> - and then the mount point will be broken until remounted by the client.
I am sorry, but I don't know how to check this (the numbering of the
exported volume). On the other hand, I do not believe it does get
renumbered - see below.
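
(For my own understanding of your explanation: I take it the server-side
lookup goes roughly like the following sketch - modeled on the fhandle_t
and vfs_busyfs() declarations in sys/mount.h, not the actual nfsserver
code; fh_to_mount is just a made-up name.)

/*
 * Sketch only: the fsid embedded in the client's file handle is the
 * only key the server has for locating the exported fs.  If a later
 * mount renumbered that fs, the lookup fails and the RPC errors out.
 */
#include <sys/param.h>
#include <sys/errno.h>
#include <sys/mount.h>

static int
fh_to_mount(fhandle_t *fhp, struct mount **mpp)
{
        *mpp = vfs_busyfs(&fhp->fh_fsid);       /* busy the fs by fsid */
        if (*mpp == NULL)
                return (ESTALE);        /* handle maps to no mount */
        return (0);
}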
>
> I don't use anything like geom and don't use ZFS.
I had this problem earlier when I wasn't using ZFS, so it does not seem 
to be specific to ZFS. However, now the server is (also) running ZFS 
(root is on UFS).
>
> Since you can reproduce this easily, I'd suggest that you:
> 1 - look to make sure drives (the st_dev value returned by stat(2)) aren't being
>      renumbered by the mount. (If they are, then that has to be avoided if an NFS
>      export is to still work.)
> 2 - Try mounting/unmounting something else, to see if it is md specific.
I seem to remember that it is not confined to adding an md-backed mount
on the server; I also had this with CDROM mounts (mounting a CD on the
server would result in a client error). I'd need to check that, but it
might take a while.
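
Regarding your point 1: I guess the quickest check is to stat(2) a file
on the exported fs before and after the server-side mount and compare
st_dev. A throwaway sketch (the shell equivalent would be
"stat -f %d <some file on the exported fs>"):

/*
 * Print st_dev for a file on the exported fs; run before and after
 * the server-side mount and compare the two numbers.
 */
#include <sys/stat.h>
#include <err.h>
#include <stdint.h>
#include <stdio.h>

int
main(int argc, char **argv)
{
        struct stat sb;

        if (argc != 2)
                errx(1, "usage: %s file-on-exported-fs", argv[0]);
        if (stat(argv[1], &sb) == -1)
                err(1, "stat %s", argv[1]);
        printf("st_dev = %ju\n", (uintmax_t)sb.st_dev);
        return (0);
}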
>
> Also, does it only happen when there is a heavy load generated by the client or
> all the time? (If only under heavy load, it may be a mount list locking bug, since
> that's the only place where a mount of a non-exported volume on the server will
> affect the exported mounts, as far as I can see.)
I am quite sure it is mostly under heavy load; see also below.
>
> I don't mind looking at a packet trace (you can email me the file generated by
> "tcpdump -s 0 -w <file> host <nfs-client>" when run on the server), but only if
> you can reproduce it without the heavy client load. (If it is only reproduced when
> there is a heavy client load, a packet trace would be too big and probably not
> useful, since the bug is likely some race related to the mount list.)
Maybe I can manage to reproduce it and cut it down sufficiently - this
might take a while, though.
>
> rick
> ps: I assume you are referring to mounts that worked before the server mount
>      and not a case where the new mount was supposed to be exported.
Yes, that's the case here.
>
> Oh, and one more question...
> Is the error persistent (ie. is the client mount unusable until remounted)
> or does the mount point work after the mount/unmount of the other volume
> has completed?
This seems to be a crucial question: in fact, after the single error
event (which typically halts the heavy NFS i/o and therefore changes the
situation - cf. the question about load above), the mount continues to
work perfectly. So referring to your question about renumbering above,
I'd guess no, it does not get renumbered.
>
> If it just happens when the other volume is unmounted/mounted, make sure
> that you aren't using the "soft" option for your client mounts. ("soft"
> implies that an RPC fails after a timeout, and an unmount/mount of
> another volume could delay the RPC for a while, until the mount list
> is unlocked.)
I am not using soft mounts.
>
> rick
Regards,

Martin

p.s. Sending this also to freebsd-fs, but since I'm currently not 
subscribed, this might not make it through.

