Debugging newnfs

Rick Macklem rmacklem at uoguelph.ca
Fri Jun 20 21:11:03 UTC 2014


Daniel Mayfield wrote:
> 
> 
> The server side is a set of vlans on a lagg of 4 igbs.
I think igb net interfaces have a limit of 64 transmit segments
(IGB_MAX_SCATTER), so they should be ok with TSO enabled.

> The Xen side
> is the same setup, with the VMs in question attached to two
> different vlans.
> 
Well, from what I know, using lagg on top of a Xen/netfront net
device will definitely be a problem, unless you have r265290 and
r265412. (Without these patches, the setting of if_hw_tsomax done
by Xen's netfront is not propagated up to tcp_output(). The same
statements apply to if_vlan.c, with the patch r265291.)

I know nothing about Xen, so I have no idea if you are using the
Xen/netfront virtual net driver, but using lagg and/or vlan on
top of it is definitely broken without the recent patches.
If you can disable TSO, that will be a workaround for this.
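
As a rough sketch of that workaround (not something I have run on your
setup), the helper below only builds the commands and prints them, so they
can be reviewed before anything is changed. The interface names (igb0..igb3
for the lagg members, xn0 for a netfront device) are assumptions; substitute
whatever "ifconfig" actually reports on each machine.

```python
# Sketch only: build (but do not run) the commands that turn TSO off.
# All interface names used below are assumptions, not taken from the
# original report.

def tso_disable_commands(interfaces, global_sysctl=True):
    """Return argv lists that disable TSO per interface and, optionally,
    for the whole TCP stack."""
    # "ifconfig <if> -tso" clears the TSO capability on one interface.
    cmds = [["ifconfig", ifname, "-tso"] for ifname in interfaces]
    if global_sysctl:
        # net.inet.tcp.tso=0 stops the TCP stack from using TSO at all.
        cmds.append(["sysctl", "net.inet.tcp.tso=0"])
    return cmds


if __name__ == "__main__":
    # Hypothetical server-side lagg members; a client might pass ["xn0"].
    for argv in tso_disable_commands(["igb0", "igb1", "igb2", "igb3"]):
        print(" ".join(argv))
```

The printed lines would need to be run as root on both the server and the
clients. Settings changed this way do not survive a reboot unless they are
also added to the usual startup configuration.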
 
> 
> Many different mounts, but the mount options all look like this:
> 
> 
> 
> nfsv3,tcp,resvport,hard,cto,lockd,sec=sys,acdirmin=3,acdirmax=60,acregmin=5,acregmax=60,nametimeo=60,negnametimeo=60,rsize=65536,wsize=65536,readdirsize=65536,readahead=1,wcommitsize=4048762,timeout=120,retrans=2
> 
> 
> The permissions do not change, but repeat operations succeed and fail
> randomly.
> 
> 
> 
> There aren't any clients concurrently accessing the same mount.
> 
> 
> 
> 
> 
> 
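
With many mounts, the option string above can be checked mechanically
against the suggestions in the quoted message below ("soft" not set, and
rsize/wsize no larger than 32K while testing the TSO theory). This is a
minimal sketch; the function names and the 32768 threshold parameter are
mine, and it looks only at the option text, not at what "nfsstat -m"
reports as actually negotiated.

```python
# Sketch only: flag NFS mount options that the thread suggests changing.
# Function names and the max_io default are illustrative assumptions.

def parse_mount_options(optstr):
    """Turn "a,b=1,c=2" into {"a": True, "b": "1", "c": "2"}."""
    opts = {}
    for item in optstr.split(","):
        item = item.strip()
        if not item:
            continue
        key, sep, value = item.partition("=")
        opts[key] = value if sep else True
    return opts


def check_options(opts, max_io=32768):
    """Return a list of human-readable notes about risky settings."""
    notes = []
    if opts.get("soft"):
        notes.append("soft: can fail with I/O errors when the server is slow")
    for key in ("rsize", "wsize"):
        value = opts.get(key)
        # Only compare options that were given with a numeric value.
        if isinstance(value, str) and value.isdigit() and int(value) > max_io:
            notes.append(f"{key}={value}: larger than {max_io}")
    return notes


if __name__ == "__main__":
    reported = ("nfsv3,tcp,resvport,hard,cto,lockd,sec=sys,"
                "rsize=65536,wsize=65536,readahead=1,timeout=120,retrans=2")
    for note in check_options(parse_mount_options(reported)):
        print(note)
```

For the options reported here, only the rsize and wsize values are flagged,
since the mounts already use "hard" and "tcp".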
> On Fri, Jun 20, 2014 at 9:16 AM, Rick Macklem <rmacklem at uoguelph.ca> wrote:
> 
> 
> 
> 
> Daniel Mayfield wrote:
> > I have a very strange problem between an NFS server running FreeBSD
> > 10 w/ ZFS and a number of FreeBSD 10 VMs running on a XenServer 6.2
> > SP1 host. The problem manifests as seemingly random permissions
> > issues and/or IO errors on the clients when the ZFS pool is busy.
> > There are no entries in dmesg on either side, and no errors logged
> > in nfsstat either. If I keep the traffic down, the errors subside,
> > but not completely. Other than tcpdump, how can I go about
> > debugging this?
> > 
> Well, you didn't mention what mount options you are using or which
> network interfaces you are using, but here are a few things that
> might be worth looking at...
> 
> The TSO max transmit segments issue:
> - Without going into all the details (there have been some recent
>   commits like r264630 to try and alleviate this), if a net device
>   driver cannot handle 35 mbufs in a transmit TSO segment, things
>   will break.
> - Xen/netfront is a weird exception, which I think is ok so long
>   as lagg or a vlan isn't layered on top of it.
> --> If you can disable TSO on both server and clients, or reduce
>   rsize,wsize to 32K on all client mounts, and see if the problem
>   persists, that is probably the best way to check this. (Since
>   Xen/netfront is such a weird case, I am not 100% sure that doing
>   the above will fix this problem, if it is being used.)
> 
> I also don't know if it is possible to have corrupted packets due to
> a hardware problem (bad memory or...) where the Xen/netfront world
> doesn't catch it.
> 
> If you use the "soft" mount option, you could easily get this when
> the server is slow to respond. I'd strongly recommend using "tcp"
> and not "soft" for your mounts. ("nfsstat -m" on the client will
> show you what the actual mount options in use are. These can be
> somewhat different from what is specified on the command line, since
> servers limit rsize/wsize, as an example.)
> 
> When you get a "permissions failure" case, check on the server to
> see if the permissions for the file appear correct on ZFS. If they
> are (or the problem disappears when you retry a command without
> changing permissions), you could have a caching issue. Other than
> capturing the packets and looking at them in wireshark (which knows
> NFS, unlike tcpdump) all you can do is try fiddling with the mount
> options related to caching and see if that helps. (Note that NFS
> does not have a cache coherency protocol, so if files are
> concurrently shared among multiple clients, all bets are off w.r.t.
> what the behaviour is. jhb@ is much better at this than I, since he
> seems to find lots of these weird cases at his workplace.)
> 
> Good luck with it, rick
> 
> > Dan
> 
> 

