FreeBSD NFS client/Linux NFS server issue

Mikolaj Golub to.my.trociny at gmail.com
Tue Jan 19 07:58:37 UTC 2010


On Wed, 13 Jan 2010 11:13:14 +0200 Mikolaj Golub wrote:

> On Sun, 10 Jan 2010 11:03:56 +0200 Mikolaj Golub wrote:

> So because it was appending to the file, every php write call caused the
> following sequence of RPCs: ACCESS - READ - WRITE - COMMIT. And when trying
> to flush the next line of the log it got stuck after the READ call (the next
> should be a WRITE call, but the client never issued it).
>
> The same is true for the other log file written by another php process. The
> last RPCs for this file:
>
> 30990 18:02:05.050063 172.30.10.54 172.30.10.83 NFS V3 READ Call (Reply In 31068), FH:0x532fa29d Offset:131072 Len:2686
> 31068 18:02:05.062801 172.30.10.83 172.30.10.54 NFS V3 READ Reply (Call In 30990) Len:2685
>
> A bit later there were several successful COMMIT calls (when php processes
> were closing other files, I think). Other NFS activity was observed too: our
> nagios checks and other applications, which were just checking the presence
> and status of certain files, were running successfully, and in tcpdump there
> are successful readdir/access/lookup/fstat calls. The df utility did not hang
> then either.
>
> Later when our engineer tried to access the mounted folder with mc the
> process locked acquiring nfs vn_lock held by php script (td=0xc6bf4690):

Analyzing the logs of our php scripts, we have found cases where a process
(or two simultaneously) got stuck writing to NFS and was later "unfrozen"
when another, newly started php process wrote to the same NFS share (to some
other log file). We have a tcpdump for such a case, and it looks like the
following:

1) ACCESS - READ - WRITE - COMMIT sequences when the php process is writing to
log file.

2) Then at some moment this stops after a READ rpc call and its successful
reply.

3) After this, successful readdir/access/lookup/fstat calls are observed from
our other utilities, which just check for the presence of some files.

4) A new php process starts and writes to some other log file (successful
ACCESS - READ - WRITE - COMMIT sequences). After this, writing to the first
file continues too (starting from the WRITE rpc, so there are no retransmits).
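
The pattern in steps 1-4 could be searched for in a decoded capture with
something like the sketch below. It is only an illustration, not the tool we
used: the event tuples, the function name find_stalled_handles and the
5-second threshold are all assumptions, and the events would have to be
extracted from the tcpdump output by some other means. It reports file handles
whose most recent rpc is a READ that was not followed by a WRITE within the
threshold.

```python
def find_stalled_handles(events, now, threshold=5.0):
    """Return file handles whose last observed rpc is a stale READ.

    events    -- iterable of (timestamp_seconds, file_handle, op) tuples in
                 capture order; op is an rpc name such as "READ" or "WRITE"
                 (hypothetical format, not produced by tcpdump directly)
    now       -- timestamp at which the capture is being inspected
    threshold -- seconds without a following WRITE before we call it a stall
    """
    last_seen = {}
    for ts, fh, op in events:
        # Later events overwrite earlier ones, so only the final rpc
        # per file handle is kept.
        last_seen[fh] = (ts, op)

    return sorted(
        fh
        for fh, (ts, op) in last_seen.items()
        if op == "READ" and now - ts > threshold
    )
```

For an append-only log file a handle reported here matches step 2: the cycle
stopped after READ and the expected WRITE never appeared on the wire.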

As a workaround we installed cron scripts that just write to some file every 2
minutes. We have been running this for 3 days and there have been no incidents
since then, but we will only be able to say whether this has really helped
after running it for a week or more.
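
A minimal sketch of what such a periodic writer might look like is below. It
is not the script we actually installed; the function name write_heartbeat and
the idea of appending a timestamp line are assumptions made for illustration.
It appends one line to the given file and forces it out with fsync, on the
assumption that this makes the client issue the write to the server rather
than keep it in cache.

```python
import os
import time


def write_heartbeat(path):
    """Append one timestamped line to path and flush it to the server.

    path is a hypothetical file on the NFS mount; the caller chooses it.
    Returns the line that was written.
    """
    line = "heartbeat %d\n" % int(time.time())
    with open(path, "a") as f:
        f.write(line)
        f.flush()
        # Ask the OS to push the data out instead of leaving it buffered.
        os.fsync(f.fileno())
    return line
```

Run from cron every 2 minutes with a path on the affected mount, this would
generate the regular write traffic described above.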

We are also upgrading one of our servers, the one where the problem has been
observed most frequently, to 7.2. We actually have many FreeBSD 7.1 hosts with
NFS mounts, but the problem has been observed on only 3 of them, and currently
we don't know a way to reproduce it.

-- 
Mikolaj Golub

