git: 1eb402e47af3 - main - stats(3): Improve t-digest merging of samples which result in mu adjustment underflow.

Lawrence Stewart lstewart at FreeBSD.org
Fri Apr 2 02:18:15 UTC 2021


The branch main has been updated by lstewart:

URL: https://cgit.FreeBSD.org/src/commit/?id=1eb402e47af35b3980e6bd51ec462de3a3faa2c8

commit 1eb402e47af35b3980e6bd51ec462de3a3faa2c8
Author:     Lawrence Stewart <lstewart at FreeBSD.org>
AuthorDate: 2021-04-02 01:29:29 +0000
Commit:     Lawrence Stewart <lstewart at FreeBSD.org>
CommitDate: 2021-04-02 02:17:53 +0000

    stats(3): Improve t-digest merging of samples which result in mu adjustment underflow.
    
    Allow the calculation of the mu adjustment factor to underflow instead of
    rejecting the VOI sample from the digest and logging an error. This trades off
    some (currently unquantified) additional centroid error in exchange for better
    fidelity of the distribution's density, which is the right trade off at the
    moment until follow up work to better handle and track accumulated error can be
    undertaken.
    
    Obtained from:  Netflix
    MFC after:      immediately
---
 sys/kern/subr_stats.c | 38 ++++++++++++++++++++++++++++++++++++--
 1 file changed, 36 insertions(+), 2 deletions(-)

diff --git a/sys/kern/subr_stats.c b/sys/kern/subr_stats.c
index 9dd874fcbcf8..946999263898 100644
--- a/sys/kern/subr_stats.c
+++ b/sys/kern/subr_stats.c
@@ -3255,9 +3255,41 @@ stats_v1_vsd_tdgst_add(enum vsd_dtype vs_dtype, struct voistatdata_tdgst *tdgst,
 		if (is32bit) {
 			ctd32 = (struct voistatdata_tdgstctd32 *)closest;
 			error = Q_QSUBQ(&x, ctd32->mu);
+			/*
+			 * The following calculation "x / (cnt + weight)"
+			 * computes the amount by which to adjust the centroid's
+			 * mu value in order to merge in the VOI sample.
+			 *
+			 * It can underflow (Q_QDIVI() returns ERANGE) when the
+			 * user centroids' fractional precision (which is
+			 * inherited by 'x') is too low to represent the result.
+			 *
+			 * A sophisticated approach to dealing with this issue
+			 * would minimise accumulation of error by tracking
+			 * underflow per centroid and making an adjustment when
+			 * a LSB's worth of underflow has accumulated.
+			 *
+			 * A simpler approach is to let the result underflow
+			 * i.e. merge the VOI sample into the centroid without
+			 * adjusting the centroid's mu, and rely on the user to
+			 * specify their t-digest with sufficient centroid
+			 * fractional precision such that the accumulation of
+			 * error from multiple underflows is of no material
+			 * consequence to the centroid's final value of mu.
+			 *
+			 * For the moment, the latter approach is employed by
+			 * simply ignoring ERANGE here.
+			 *
+			 * XXXLAS: Per-centroid underflow tracking is likely too
+			 * onerous, but it probably makes sense to accumulate a
+			 * single underflow error variable across all centroids
+			 * and report it as part of the digest to provide
+			 * additional visibility into the digest's fidelity.
+			 */
 			error = error ? error :
 			    Q_QDIVI(&x, ctd32->cnt + weight);
-			if (error || (error = Q_QADDQ(&ctd32->mu, x))) {
+			if ((error && error != ERANGE)
+			    || (error = Q_QADDQ(&ctd32->mu, x))) {
 #ifdef DIAGNOSTIC
 				KASSERT(!error, ("%s: unexpected error %d",
 				    __func__, error));
@@ -3276,7 +3308,9 @@ stats_v1_vsd_tdgst_add(enum vsd_dtype vs_dtype, struct voistatdata_tdgst *tdgst,
 			error = Q_QSUBQ(&x, ctd64->mu);
 			error = error ? error :
 			    Q_QDIVI(&x, ctd64->cnt + weight);
-			if (error || (error = Q_QADDQ(&ctd64->mu, x))) {
+			/* Refer to is32bit ERANGE discussion above. */
+			if ((error && error != ERANGE)
+			    || (error = Q_QADDQ(&ctd64->mu, x))) {
 				KASSERT(!error, ("%s: unexpected error %d",
 				    __func__, error));
 				return (error);


More information about the dev-commits-src-all mailing list