'make -j16 universe' gives SIReset

Marius Strobl marius at alchemy.franken.de
Tue Oct 11 20:55:49 UTC 2011


On Tue, Oct 11, 2011 at 02:05:29PM +1100, Peter Jeremy wrote:
> On 2011-Oct-06 14:04:11 +0200, Marius Strobl <marius at alchemy.franken.de> wrote:
> >FYI, as of r226057 I made sparc64 compatible with SCHED_ULE and vice versa
> >as requested. It would be great if you could test SCHED_ULE with
> >buildworlds on your V890.
> 
> Thank you for that.  I've updated my V890 to r226167 and done some
> -j20 buildworlds with both SCHED_4BSD and SCHED_ULE immediately after
> rebooting.  Raw time(1) results are:
> 
> SCHED_4BSD:
> 16090.866u 9405.784s 51:18.57 828.1% 4514+1841k 31780+5809io 13927pf+0w
> 16061.213u 9530.148s 50:02.60 852.3% 4493+1837k 2970+5844io 12101pf+0w
> 16044.606u 9527.613s 50:08.29 850.0% 4493+1837k 3016+5840io 12082pf+0w
> 16073.767u 9514.049s 50:05.28 851.4% 4496+1838k 2955+5822io 12081pf+0w
> 
> SCHED_ULE:
> 15160.032u 8739.799s 50:03.55 795.7%  4540+1848k 30162+4604io 13920pf+0w
> 15196.869u 8753.567s 48:49.01 817.6%  4537+1849k 2988+5891io 12094pf+0w
> 15193.226u 8726.949s 49:02.73 812.8%  4541+1850k 3031+5990io 12083pf+0w
> 15194.957u 8753.260s 49:03.19 813.6%  4537+1849k 2965+6028io 12082pf+0w
> 15167.735u 8744.979s 48:55.28 814.6%  4538+1849k 2964+6011io 12081pf+0w
> 15170.366u 8737.994s 49:01.01 812.9%  4537+1849k 2977+5930io 12081pf+0w
> 
> Ignoring the first buildworld in both cases (when the caches were cold),
> ministat shows that ULE is faster (which I would expect):

Well, in my tests 4BSD performed better than ULE, but I only have 2-way
machines and one 4-way machine, and the contention on the global 4BSD
sched_lock alone makes it clear that there must be some point beyond
which ULE scales better. So the real question is whether the average
sparc64 user has more than that number of cores in a machine (probably
not). On the other hand, in my tests ULE plus the atomic operation
improvements performed roughly the same as 4BSD without those
optimizations, so at least there would be no net performance regression
from switching to ULE now.
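
To illustrate the contention argument, here's a toy userland sketch of
global versus per-CPU locking (it has nothing to do with the actual
scheduler code; the thread and iteration counts are made up). On a
machine with enough cores the global-lock variant falls further and
further behind:

/*
 * Toy illustration only, not FreeBSD scheduler code: NTHREADS threads
 * doing trivial work under one global mutex serialize on that lock,
 * while the same work under per-thread ("per-CPU") mutexes stays
 * essentially uncontended.  Build with: cc -O2 -pthread contention.c
 */
#include <sys/time.h>

#include <pthread.h>
#include <stdint.h>
#include <stdio.h>

#define NTHREADS        16
#define NITERS          1000000

static pthread_mutex_t global_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t percpu_lock[NTHREADS];
static long global_count;
static long percpu_count[NTHREADS];

static void *
global_worker(void *arg)
{
        long i;

        (void)arg;
        for (i = 0; i < NITERS; i++) {
                pthread_mutex_lock(&global_lock);   /* everyone fights here */
                global_count++;
                pthread_mutex_unlock(&global_lock);
        }
        return (NULL);
}

static void *
percpu_worker(void *arg)
{
        long i, id = (long)(intptr_t)arg;

        for (i = 0; i < NITERS; i++) {
                pthread_mutex_lock(&percpu_lock[id]);   /* mostly uncontended */
                percpu_count[id]++;
                pthread_mutex_unlock(&percpu_lock[id]);
        }
        return (NULL);
}

static double
run(void *(*fn)(void *))
{
        struct timeval tv0, tv1;
        pthread_t td[NTHREADS];
        long i;

        gettimeofday(&tv0, NULL);
        for (i = 0; i < NTHREADS; i++)
                pthread_create(&td[i], NULL, fn, (void *)(intptr_t)i);
        for (i = 0; i < NTHREADS; i++)
                pthread_join(td[i], NULL);
        gettimeofday(&tv1, NULL);
        return (tv1.tv_sec - tv0.tv_sec + (tv1.tv_usec - tv0.tv_usec) / 1e6);
}

int
main(void)
{
        long i;

        for (i = 0; i < NTHREADS; i++)
                pthread_mutex_init(&percpu_lock[i], NULL);
        printf("global lock:   %.2f s\n", run(global_worker));
        printf("per-CPU locks: %.2f s\n", run(percpu_worker));
        return (0);
}

Obviously the real schedulers differ in far more than locking
granularity, but it shows the basic scaling cliff of a single lock.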

> User time:
> x buildtimes.4BSD
> + buildtimes.ULE
> +----------------------------------------------------------------------+
> |  +                                                                   |
> |+ +                                                                   |
> |+ +                                                                xxx|
> ||AM                                                                |A||
> +----------------------------------------------------------------------+
>     N           Min           Max        Median           Avg        Stddev
> x   3     16044.606     16073.767     16061.213     16059.862     14.627368
> +   5     15167.735     15196.869     15193.226     15184.631     14.311132
> Difference at 95.0% confidence
>         -875.231 +/- 25.7643
>         -5.44981% +/- 0.160426%
>         (Student's t, pooled s = 14.4173)
> 
> System time:
> x buildtimes.4BSD
> + buildtimes.ULE
> +----------------------------------------------------------------------+
> |  +                                                                   |
> |  +                                                                  x|
> |+++                                                                x x|
> ||AM                                                                 AM|
> +----------------------------------------------------------------------+
>     N           Min           Max        Median           Avg        Stddev
> x   3      9514.049      9530.148      9527.613     9523.9367     8.6562706
> +   5      8726.949      8753.567      8744.979     8743.3498     11.213032
> Difference at 95.0% confidence
>         -780.587 +/- 18.6399
>         -8.19605% +/- 0.195717%
>         (Student's t, pooled s = 10.4306)
> 
> Wallclock time:
> x buildtimes.4BSD
> + buildtimes.ULE
> +----------------------------------------------------------------------+
> |            +                                                         |
> |+    +    + +                                                   x x  x|
> |   |____A_M__|                                                  |_A__||
> +----------------------------------------------------------------------+
>     N           Min           Max        Median           Avg        Stddev
> x   3        3002.6       3008.29       3005.28       3005.39     2.8465945
> +   5       2929.01       2943.19       2941.01      2938.244     6.0475185
> Difference at 95.0% confidence
>         -67.146 +/- 9.29992
>         -2.23419% +/- 0.309441%
>         (Student's t, pooled s = 5.2041)
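
(For what it's worth, the intervals above are just the pooled two-sample
Student's t computation; checking the wallclock numbers by hand:

    difference = 2938.244 - 3005.39                       = -67.146
    margin     = t(0.975, 6) * s_pooled * sqrt(1/3 + 1/5)
               = 2.4469 * 5.2041 * 0.7303                 ~= 9.30

so the interval excludes zero and the ~2.2% wallclock win for ULE is
significant at the 95% level.)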
> 
> 
> I also ran 6 parallel -j16 buildworlds as a stress test and that was
> less successful:
> panic: mutex vm object not owned at /usr/src/sys/vm/vm_object.c:281

That line should be in vm_object_clear_flag(). The backtrace seems to be
incomplete, as it doesn't include that function. I'm not sure what to make
of the "VNASSERT failed" and "tag ufs, type VDIR" lines; I guess this
could mean that two panics happened at the same time. Does the crash dump
provide a better backtrace?
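
For reference, the check that prints "mutex vm object not owned" is the
VM object lock assertion at the top of vm_object_clear_flag(); roughly
like this (reconstructed from memory, not a verbatim copy of the tree):

void
vm_object_clear_flag(vm_object_t object, u_short bits)
{

        /*
         * Panics with "mutex vm object not owned at ..." when the
         * caller clears flags without holding the object lock.
         */
        VM_OBJECT_LOCK_ASSERT(object, MA_OWNED);
        object->flags &= ~bits;
}

So whichever path in the truncated backtrace ended up there without
holding VM_OBJECT_LOCK(object) is the interesting part.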

> VNASSERT failed
> VNASSERT failed
> 0xfffff8a3467cdc20: VNASSERT failed
> tag ufs, type VDIR
> 0xfffff8a3467cdc20: 0xfffff8a3467cdc20:     usecount 7, writecount 0, refcount 1
> 0 mountedhere 0
> tag ufs, type VDIR
>     flags ()
> VNASSERT failed
> cpuid = 13
> VNASSERT failed
> KDB: enter: panic
> tag ufs, type VDIR
> [ thread pid 1299 tid 100640 ]
> Stopped at      kdb_enter+0x80: ta              %xcc, 1
> db> KDB: stack backtrace:
> KDB: stack backtrace:
> mi_switch() at mi_switch+0x1a8
> sched_preempt() at sched_preempt+0xbc
> cpu_ipi_preempt() at cpu_ipi_preempt+0x4
> -- interrupt level=0x6 pil=0x5 %o7=0xc085f414 --
> cpu_ipi_stop() at cpu_ipi_stop+0x110
> -- interrupt level=0x5 pil=0 %o7=0x3d280c --
> userland() at 0x3d2458
> user trace: trap %o7=0x3d280c
> pc 0x3d2458, sp 0x7fdffffccd1
> pc 0x179724, sp 0x7fdffffcda1
> pc 0x179b38, sp 0x7fdffffce61
> pc 0x179c64, sp 0x7fdffffcf21
> pc 0x179c84, sp 0x7fdffffcfe1
> pc 0x415af0, sp 0x7fdffffd0a1
> pc 0x143fe0, sp 0x7fdffffd161
> pc 0x164c58, sp 0x7fdffffd221
> pc 0x165840, sp 0x7fdffffd2e1
> pc 0x148670, sp 0x7fdffffd3a1
> pc 0x1a0c30, sp 0x7fdffffd471
> pc 0x1001d0, sp 0x7fdffffd531
> pc 0, sp 0x7fdffffd5f1
> done
> KDB: stack backtrace:
> KDB: stack backtrace:
> mi_switch() at mi_switch+0x1a8
> sched_preempt() at sched_preempt+0xbc
> cpu_ipi_preempt() at cpu_ipi_preempt+0x4
> -- interrupt level=0x6 pil=0x5 %o7=0xc085f414 --
> cpu_ipi_stop() at cpu_ipi_stop+0x110
> -- interrupt level=0x5 pil=0 %o7=0xc04ec15c --
> _mtx_lock_spin() at _mtx_lock_spin+0xb4
> _mtx_lock_spin_flags() at _mtx_lock_spin_flags+0x1b8
> cnputs() at cnputs+0x5c
> vprintf() at vprintf+0x9c
> printf() at printf+0x20
> vn_printf() at vn_printf+0x60
> _vn_lock() at _vn_lock+0x2c
> cache_lookup() at cache_lookup+0x9a4
> vfs_cache_lookup() at vfs_cache_lookup+0xbc
> VOP_LOOKUP_APV() at VOP_LOOKUP_APV+0x110
> lookup() at lookup+0x7e0
> namei() at namei+0x6c0
> vn_open_cred() at vn_open_cred+0x31c
> vn_open() at vn_open+0x1c
> kern_openat() at kern_openat+0x1a4
> kern_open() at kern_open+0x18
> sys_open() at sys_open+0x14
> syscall() at syscall+0x2d8
> -- syscall (5, FreeBSD ELF64, sys_open) %o7=0x49fb98 --
> userland() at 0x4e6bc8
> user trace: trap %o7=0x49fb98
> pc 0x4e6bc8, sp 0x7fdffffca81
> pc 0x4a11a8, sp 0x7fdffffcb41
> pc 0x4a15f4, sp 0x7fdffffcc81
> pc 0x4a5904, sp 0x7fdffffcd41
> pc 0x4a6930, sp 0x7fdffffce11
> pc 0x49ebb0, sp 0x7fdffffced1
> pc 0x49a4f4, sp 0x7fdffffcf91
> pc 0x14fbf8, sp 0x7fdffffd0b1
> pc 0x101160, sp 0x7fdffffd1f1
> pc 0x10ecf8, sp 0x7fdffffd2b1
> pc 0x1144c4, sp 0x7fdffffd3c1
> pc 0x1a05a8, sp 0x7fdffffd481
> pc 0x1001d0, sp 0x7fdffffd541
> pc 0, sp 0x7fdffffd601
> done
> KDB: stack backtrace:
> KDB: stack backtrace:
> mi_switch() at mi_switch+0x1a8
> sched_preempt() at sched_preempt+0xbc
> cpu_ipi_preempt() at cpu_ipi_preempt+0x4
> -- interrupt level=0x6 pil=0x5 %o7=0xc085f414 --
> cpu_ipi_stop() at cpu_ipi_stop+0x110
> -- interrupt level=0x5 pil=0 %o7=0xc04ec15c --
> _mtx_lock_spin() at _mtx_lock_spin+0x70
> _mtx_lock_spin_flags() at _mtx_lock_spin_flags+0x1b8
> cnputs() at cnputs+0x5c
> vprintf() at vprintf+0x9c
> printf() at printf+0x20
> vn_printf() at vn_printf+0x60
> _vn_lock() at _vn_lock+0x2c
> cache_lookup() at cache_lookup+0x9a4
> vfs_cache_lookup() at vfs_cache_lookup+0xbc
> VOP_LOOKUP_APV() at VOP_LOOKUP_APV+0x110
> lookup() at lookup+0x7e0
> namei() at namei+0x6c0
> vn_open_cred() at vn_open_cred+0x31c
> vn_open() at vn_open+0x1c
> kern_openat() at kern_openat+0x1a4
> kern_open() at kern_open+0x18
> sys_open() at sys_open+0x14
> syscall() at syscall+0x2d8
> -- syscall (5, FreeBSD ELF64, sys_open) %o7=0x49fb98 --
> userland() at 0x4e6bc8
> user trace: trap %o7=0x49fb98
> pc 0x4e6bc8, sp 0x7fdffffc4f1
> pc 0x4a11a8, sp 0x7fdffffc5b1
> pc 0x4a15f4, sp 0x7fdffffc6f1
> pc 0x4a5904, sp 0x7fdffffc7b1
> pc 0x4a6930, sp 0x7fdffffc881
> pc 0x49ebb0, sp 0x7fdffffc941
> pc 0x49a4f4, sp 0x7fdffffca01
> pc 0x14fbf8, sp 0x7fdffffcb21
> pc 0x101160, sp 0x7fdffffcc61
> pc 0x10ecf8, sp 0x7fdffffcd21
> pc 0x1144c4, sp 0x7fdffffce31
> pc 0x1a05a8, sp 0x7fdffffcef1
> pc 0x1001d0, sp 0x7fdffffcfb1
> pc 0, sp 0x7fdffffd071
> done
> KDB: stack backtrace:
> KDB: stack backtrace:
> mi_switch() at mi_switch+0x1a8
> sched_preempt() at sched_preempt+0xac
> cpu_ipi_preempt() at cpu_ipi_preempt+0x4
> -- interrupt level=0x6 pil=0x5 %o7=0xc085f414 --
> cpu_ipi_stop() at cpu_ipi_stop+0x114
> -- interrupt level=0x5 pil=0 %o7=0xc085bb84 --
> spinlock_exit() at spinlock_exit+0x28
> _mtx_unlock_spin_flags() at _mtx_unlock_spin_flags+0x16c
> sched_idletd() at sched_idletd+0x258
> fork_exit() at fork_exit+0x9c
> fork_trampoline() at fork_trampoline+0x8
> 
> db> 
> 
> I haven't looked into this further.
> 
> -- 
> Peter Jeremy



