svn commit: r247116 - in head/sys: fs/nfs fs/nfsclient kern nfsclient sys tools

Mon Feb 25 21:00:17 UTC 2013

On Feb 25, 2013, at 4:36 AM, Andrew Turner wrote:

> On Mon, 25 Feb 2013 10:50:19 +0200
> Konstantin Belousov <kostikbel at gmail.com> wrote:
> 
>> On Mon, Feb 25, 2013 at 08:13:13PM +1300, Andrew Turner wrote:
>>> On Thu, 21 Feb 2013 19:02:50 +0000 (UTC)
>>> John Baldwin <jhb at FreeBSD.org> wrote:
>>> 
>>>> Author: jhb
>>>> Date: Thu Feb 21 19:02:50 2013
>>>> New Revision: 247116
>>>> URL: http://svnweb.freebsd.org/changeset/base/247116
>>>> 
>>>> Log:
>>>>  Further refine the handling of stop signals in the NFS client.
>>>> The changes in r246417 were incomplete as they did not add
>>>> explicit calls to sigdeferstop() around all the places that
>>>> previously passed SBDRY to _sleep().  In addition,
>>>> nfs_getcacheblk() could trigger a write RPC from getblk()
>>>> resulting in sigdeferstop() recursing. Rather than manually
>>>> deferring stop signals in specific places, change the VFS_*() and
>>>> VOP_*() methods to defer stop signals for filesystems which
>>>> request this behavior via a new VFCF_SBDRY flag. Note that this
>>>> has to be a VFC flag rather than a MNTK flag so that it works
>>>> properly with VFS_MOUNT() when the mount is not yet fully
>>>> constructed.  For now, only the NFS clients set this new flag
>>>> in VFS_SET(). A few other related changes:
>>>>  - Add an assertion to ensure that TDF_SBDRY doesn't leak to
>>>> userland.
>>>>  - When a lookup request uses VOP_READLINK() to follow a symlink,
>>>> mark the request as being on behalf of the thread performing the
>>>> lookup (cnp_thread) rather than using a NULL thread pointer.  This
>>>> causes NFS to properly handle signals during this VOP on an
>>>> interruptible mount.
>>>> 
>>>>  PR:		kern/176179
>>>>  Reported by:	Russell Cattelan (sigdeferstop() recursion)
>>>>  Reviewed by:	kib
>>>>  MFC after:	1 month
>>> 
>>> This change is causing init to crash for me on armv6. I'm
>>> netbooting a PandaBoard and it appears init is receiving a SIGABRT
>>> before it gets into main().
>>> 
>>> Do you have any idea where I could look to track down why it is
>>> doing this?
>> 
>> It is weird. SIGABRT sent by the kernel usually means that execve(2)
>> already destroyed the previous address space of the process, but the
>> new image cannot be activated, most likely due to image format error
>> discovered too late, or resource shortage.
>> 
>> It could be that some NFS RPC fails after the patch, but I cannot
>> imagine why. You would need to track this down. Also, verify that the init
>> binary is correct.
>> 
>> I tried amd64 netboot, and it worked fine.
> 
> It looks like this change is not the issue; it just changed the
> symptom enough for me to not realise I was seeing an issue where
> it would crash the kernel before. I reinstated this change but only
> allowed the kernel to access half the memory and it booted correctly.
> 
> The real issue appears to be related to something in the vm layer not
> working on ARM boards with too much memory (somewhere between 512MiB
> and 1GiB).


The recently introduced auto-sizing and cap may be too optimistic.  In fact, they are greater
than what we allow on 32-bit x86 and 32-bit MIPS.  Try the following.

Index: arm/include/vmparam.h
===================================================================
--- arm/include/vmparam.h	(revision 247249)
+++ arm/include/vmparam.h	(working copy)
@@ -142,15 +142,15 @@
 #define VM_KMEM_SIZE		(12*1024*1024)
 #endif
 #ifndef VM_KMEM_SIZE_SCALE
-#define VM_KMEM_SIZE_SCALE	(2)
+#define VM_KMEM_SIZE_SCALE	(3)
 #endif
 
 /*
- * Ceiling on the size of the kmem submap: 60% of the kernel map.
+ * Ceiling on the size of the kmem submap: 40% of the kernel map.
  */
 #ifndef VM_KMEM_SIZE_MAX
 #define	VM_KMEM_SIZE_MAX	((vm_max_kernel_address - \
-    VM_MIN_KERNEL_ADDRESS + 1) * 3 / 5)
+    VM_MIN_KERNEL_ADDRESS + 1) * 2 / 5)
 #endif
 
 #define MAXTSIZ 	(16*1024*1024)
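
To see roughly what the two knobs in the patch above do, here is a simplified sketch of the sizing arithmetic. It is not the kernel's kmeminit() code: the helper names kmem_ceiling() and kmem_size_estimate() are hypothetical, and the model only assumes that the kmem submap is sized as physical memory divided by VM_KMEM_SIZE_SCALE, raised to the VM_KMEM_SIZE floor, and then capped at VM_KMEM_SIZE_MAX (a fraction of the kernel map).

```c
#include <assert.h>
#include <stdint.h>

#define MIB (1024ULL * 1024ULL)

/*
 * Hypothetical helper: ceiling on the kmem submap, expressed as the
 * fraction num/den of the kernel map span (3/5 before the patch,
 * 2/5 after it).
 */
static uint64_t
kmem_ceiling(uint64_t kernel_map_span, uint64_t num, uint64_t den)
{
	assert(den > 0);
	return (kernel_map_span * num / den);
}

/*
 * Hypothetical helper: simplified auto-sizing model. The estimate is
 * physmem / scale, raised to at least floor_bytes (VM_KMEM_SIZE) and
 * then capped at ceiling (VM_KMEM_SIZE_MAX). The real kernel applies
 * further adjustments that are left out here.
 */
static uint64_t
kmem_size_estimate(uint64_t physmem, uint64_t scale, uint64_t floor_bytes,
    uint64_t ceiling)
{
	uint64_t size;

	assert(scale > 0);
	size = physmem / scale;
	if (size < floor_bytes)
		size = floor_bytes;
	if (size > ceiling)
		size = ceiling;
	return (size);
}
```

As an illustration with made-up numbers (a 500 MiB kernel map span and 600 MiB of physical memory, neither taken from a real board), the old settings give kmem_size_estimate(600 * MIB, 2, 12 * MIB, kmem_ceiling(500 * MIB, 3, 5)), which is 300 MiB, while the patched settings give kmem_size_estimate(600 * MIB, 3, 12 * MIB, kmem_ceiling(500 * MIB, 2, 5)), which is 200 MiB. Under this model, both the larger divisor and the lower ceiling shrink the kmem submap.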