svn commit: r269466 - in head/sys/cddl: compat/opensolaris/sys contrib/opensolaris/uts/common/fs/zfs

Sun Aug 3 17:52:27 UTC 2014

On Sun, 3 Aug 2014, Xin LI wrote:

> Log:
>  Revert r269404 and use cpu_ticks() for dbuf allocation.
>
>  Encode CPU's number by XOR'ing the CPU ID against the 64-bit cpu_ticks().
>
>  Reviewed by:	mav, gibbs
>  Differential Revision: https://phabric.freebsd.org/D521
>  MFC after:	2 weeks

This is still broken.

> Modified: head/sys/cddl/compat/opensolaris/sys/time.h
> ==============================================================================
> --- head/sys/cddl/compat/opensolaris/sys/time.h	Sun Aug  3 09:40:50 2014	(r269465)
> +++ head/sys/cddl/compat/opensolaris/sys/time.h	Sun Aug  3 09:47:51 2014	(r269466)
> @@ -60,17 +60,6 @@ gethrtime(void) {
> 	struct timespec ts;
> 	hrtime_t nsec;
>
> -	nanouptime(&ts);
> -	nsec = (hrtime_t)ts.tv_sec * NANOSEC + ts.tv_nsec;
> -	return (nsec);
> -}

The in-between version wasn't broken.  nanouptime() is not too slow to
use provided it is properly implemented.  Unfortunately, this includes
proper implementation in the hardware.

> -
> -static __inline hrtime_t
> -gethrtime_waitfree(void) {
> -
> -	struct timespec ts;
> -	hrtime_t nsec;
> -
> 	getnanouptime(&ts);
> 	nsec = (hrtime_t)ts.tv_sec * NANOSEC + ts.tv_nsec;
> 	return (nsec);

The existence of getnanouptime() is a bug.  Someone might use it.

> @@ -78,6 +67,7 @@ gethrtime_waitfree(void) {
>
> #define	gethrestime_sec()	(time_second)
> #define	gethrestime(ts)		getnanotime(ts)
> +#define	gethrtime_waitfree()	gethrtime()
>
> extern int nsec_per_tick;	/* nanoseconds per clock tick */
>
>
> Modified: head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dbuf.c
> ==============================================================================
> --- head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dbuf.c	Sun Aug  3 09:40:50 2014	(r269465)
> +++ head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dbuf.c	Sun Aug  3 09:47:51 2014	(r269466)
> @@ -70,7 +70,11 @@ dbuf_cons(void *vdb, void *unused, int k
> 	cv_init(&db->db_changed, NULL, CV_DEFAULT, NULL);
> 	refcount_create(&db->db_holds);
>
> +#if defined(illumos) || !defined(_KERNEL)
> 	db->db_creation = gethrtime();
> +#else
> +	db->db_creation = cpu_ticks() ^ ((uint64_t)CPU_SEQID << 48);
> +#endif
>
> 	return (0);
> }

cpu_ticks() is a hack to work around nanouptime() being too slow with
some hardware.  It is much harder to use than nanouptime().  There are
no correct uses of it, including its main use.  Frequency changes are
hard to handle correctly.

Here the bug is that it wraps after just 65536+ seconds (18+ hours) at
4GHz, even on x86 where ticks have 64 bits.  This gives non-uniqeness in
another way.  The wrap is increased by wastefully supporting 65536 CPUs.
This leaves only 48 bits before wrap occurs (the high bits are XORed.
This helps for uniqueness but doesn't guarantee it.  E.g., the value of
2**48 on CPU0 is indistingishable from the value of 0 on CPU1.  These
values occur just 65536+ seconds apart at 4GHz).  There is likely to be
noise in the lower bits, but this doesn't guarantee uniqueness.  On other
arches, ticks may have many fewer than 64 bits.  Their main use assumes
that they have enough bits to represent uptimes of several years, so the
above probably has no additional problems.  Arches without a TSC mostly
use nanouptime() for ticks anyway.  These tend to be the slowest arches.

gibbs' original suggestion of:

  	db->db_creation = pcup_counter++ ^ ((uint64_t)CPU_SEQID << 48);

probably works.  Now the counter doesn't race off to infinity at 4GHz
so it is obviously unique for 2**48 creations per CPU.  It increases in
a different unscaled, often nonlinear timescale with much longer ticks.
This is also much faster, especially on slower arches without a TSC that
emulate cpu_ticks() using nanoptime().

Bruce