svn commit: r238907 - projects/calloutng/sys/kern

Mon Oct 29 01:22:57 UTC 2012

On Wed, Oct 10, 2012 at 5:49 PM, Attilio Rao <attilio at freebsd.org> wrote:
> On Tue, Sep 18, 2012 at 1:13 AM, Attilio Rao <attilio at freebsd.org> wrote:
>> On Thu, Aug 2, 2012 at 9:56 PM, Attilio Rao <attilio at freebsd.org> wrote:
>>> On 7/30/12, John Baldwin <jhb at freebsd.org> wrote:
>>
>> [ trimm ]
>>
>>>> --- //depot/projects/smpng/sys/kern/kern_rmlock.c     2012-03-25
>>>> 18:45:29.000000000 0000
>>>> +++ //depot/user/jhb/lock/kern/kern_rmlock.c  2012-06-18 21:20:58.000000000
>>>> 0000
>>>> @@ -70,6 +70,9 @@
>>>>  }
>>>>
>>>>  static void  assert_rm(const struct lock_object *lock, int what);
>>>> +#ifdef DDB
>>>> +static void  db_show_rm(const struct lock_object *lock);
>>>> +#endif
>>>>  static void  lock_rm(struct lock_object *lock, int how);
>>>>  #ifdef KDTRACE_HOOKS
>>>>  static int   owner_rm(const struct lock_object *lock, struct thread
>>>> **owner);
>>>
>>> While here, did you consider also:
>>> - Abstracting compiler_memory_barrier() into a MI, compiler dependent function?
>>
>> So what do you think about this patch? (Please double-check the GIT log).
>
> The real reason why I wanted an abstract __compiler_membar() was to
> cleanly fix a bug that I think affects sched_pin()/sched_unpin() right
> now.
> Infact, I think the compiler can reorder operations around them making
> their usage completely bogus. In addition, td_pinned field is not even
> marked as volatile, thus I think currently the compiler can decide to
> do stupid things like caching the value of td_pinned in CPU registers
> or other optimizations which invalidate scheduler pinning.
>
> I reviewed all the possible races involved here with Jeff and John and
> after some discussion I think that in order to prevent these side
> effects the usage of compiler memory barriers in the operations is
> both due and effective (see the attached patch). However, please note
> that the presence of the compiler membar should not really change the
> code, assuming current compilers don't screw something up. I verified
> this is not the case with this very simple environment:
> - amd64 CURRENT configuration, without the kernel debugging options
> (in order to increase chances of optimizations)
> - kernel compared are pre- and post- patch
> - gcc is default FreeBSD: gcc version 4.2.1 20070831 patched [FreeBSD]
>
> [root at netboot-amd64 ~]# ls -al kernel.nopatch.md5 kernel.patch-novolatile.md5
> -rwxr-xr-x  1 root  wheel  19807568 Oct 10 16:04 kernel.nopatch.md5
> -rwxr-xr-x  1 root  wheel  19807568 Oct 10 16:41 kernel.patch-novolatile.md5
> [root at netboot-amd64 ~]# md5 kernel.nopatch.md5
> MD5 (kernel.nopatch.md5) = 7c5c5e33f2547a5e5bc701a3b124f0d9
> [root at netboot-amd64 ~]# md5 kernel.patch-novolatile.md5
> MD5 (kernel.patch-novolatile.md5) = 91c51afb4cc4ad229cc28da2024d8f54
>
> So as you can see the size of the kernel is not changed but the md5
> CRC actually is. This should point in the direction that code actually
> might have changed (unless someone has a good explanation for this
> already) and then there could be some code reshuffling due to the use
> of compiler membars.
>
> I also tried a version using volatile for td_pinned. I don't see why
> this would be needed, but maybe someone would use this for
> consistency. The following kernel.patch.md5 contains basically the
> membars and the volatile marker for td_pinned:
>
> [root at netboot-amd64 ~]# ls -al kernel.nopatch.md5 kernel.patch.md5
> -rwxr-xr-x  1 root  wheel  19807568 Oct 10 16:04 kernel.nopatch.md5
> -rwxr-xr-x  1 root  wheel  19807824 Oct 10 16:09 kernel.patch.md5
>
> As you can see the size of the kernel actually increased. I think that
> gcc should probabilly produce very unoptimized code for volatile
> operands, at least that is also what I understood by reading the
> arguments behind atomic_t type in Linux (not using volatile by
> default).
> Considering this, I think I should prevent to be using volatile for
> td_pinned and possibly we should think carefully when introducing new
> volatile members in the future.
>
> Thanks,
> Attilio
>
> Index: sys/sys/sched.h
> ===================================================================
> --- sys/sys/sched.h     (revision 241395)
> +++ sys/sys/sched.h     (working copy)
> @@ -146,11 +146,13 @@ static __inline void
>  sched_pin(void)
>  {
>         curthread->td_pinned++;
> +       __compiler_membar();
>  }
>
>  static __inline void
>  sched_unpin(void)
>  {
> +       __compiler_membar();
>         curthread->td_pinned--;
>  }

(Sorry for quoting it all, but I think the context is important for this).
I've made more tests with this patch. I've compiled a GENERIC but
debugging options (in order to eventually increase the likelyhood of
optimizations by the compiler) using standard shipped gcc:
[root at netboot-amd64 /usr/obj/root/CURRENT/sys]# gcc --version
gcc (GCC) 4.2.1 20070831 patched [FreeBSD]

I've then compiled produced code of splitted modules, for consumers of
a base amd64 kernel where sched_pin() is used. More specifically I
compared the produced code for: pmap.o, sched_ule.o, netisr.o,
kern_rmlock.o, vm_glue.o, clock.o. For almost of all of them the
produced code is exactly the same but for pmap.o, where is really
minor and not decisive difference is found (in pmap_update_pde()):
@@ -590,9 +590,9 @@ Disassembly of section .text:
      758:      00
      759:      48 8b 35 00 00 00 00    mov    0x0(%rip),%rsi        # 760 <pmap
_update_pde+0x30>
      760:      41 b8 01 00 00 00       mov    $0x1,%r8d
-     766:      89 4d fc                mov    %ecx,-0x4(%rbp)
-     769:      49 d3 e0                shl    %cl,%r8
-     76c:      49 81 f9 00 00 00 00    cmp    $0x0,%r9
+     766:      49 d3 e0                shl    %cl,%r8
+     769:      49 81 f9 00 00 00 00    cmp    $0x0,%r9
+     770:      89 4d fc                mov    %ecx,-0x4(%rbp)
      773:      48 89 f7                mov    %rsi,%rdi
      776:      74 04                   je     77c <pmap_update_pde+0x4c>
      778:      49 8b 79 38             mov    0x38(%r9),%rdi

I think that this basically proves 2 things: FreeBSD should be
generally free from reordering bug by incident right now (at least for
amd64) and that sched_pin()/unpin() are used so seldomly that they
don't make a real huge impact even in conjuction with a compiler
memory barrier.

I do not really epect that newer version of gcc or clang are going to
pessimize such case,  hence, I'm going to commit this patch to enforce
correctness.

Thanks,
Attilio


-- 
Peace can only be achieved by understanding - A. Einstein