svn commit: r326073 - head/usr.bin/systat

Wed Nov 22 10:39:25 UTC 2017

On Wed, Nov 22, 2017 at 08:59:21AM +1100, Bruce Evans wrote:
> On Tue, 21 Nov 2017, Konstantin Belousov wrote:
> 
> > Log:
> >  systat: use and correctly display 64bit counters.
> >
> >  Following struct vmtotal changes, make systat use and correctly
> >  display 64-bit counters.  Switch to humanize_number(3) to overcome
> >  homegrown arithmetics limits in pretty printing large numbers.  Use
> >  1024 as a divisor for memory fields to make it consistent with other
> >  tools and users expectations.
> 
> I don't like dehumanize_number(), and it can't handle most cases in
> systat -v since most cases use floating point.
> 
> Using unsigned types gives sign extension bugs as usual.  In old versions
> version, large unsigned numbers (only their lower 32 bits) were not
> unintentionally printed in signed format to get an early warning about
> overflow when they reach half-way to 32-bit overflow.  This is no longer
> useful, but signed format is still used.  There are related sign extension
> bugs for non-64-bit fields that allow the early warning to still work.
> 
> > Modified: head/usr.bin/systat/vmstat.c
> > ==============================================================================
> > --- head/usr.bin/systat/vmstat.c	Tue Nov 21 19:23:20 2017	(r326072)
> > +++ head/usr.bin/systat/vmstat.c	Tue Nov 21 19:55:32 2017	(r326073)
> > @@ -138,6 +140,8 @@ static float cputime(int);
> > static void dinfo(int, int, struct statinfo *, struct statinfo *);
> > static void getinfo(struct Info *);
> > static void putint(int, int, int, int);
> > +static void putuint64(uint64_t, int, int, int);
> > +static void do_putuint64(uint64_t, int, int, int, int);
> 
> Style bug (unsorting).
> 
> > static void putfloat(double, int, int, int, int, int);
> > static void putlongdouble(long double, int, int, int, int, int);
> > static int ucount(void);
> > @@ -491,15 +495,15 @@ showkre(void)
> > 	putfloat(100.0 * s.v_kmem_map_size / kmem_size,
> > 	   STATROW + 1, STATCOL + 22, 2, 0, 1);
> >
> > -	putint(pgtokb(total.t_arm), MEMROW + 2, MEMCOL + 4, 7);
> > -	putint(pgtokb(total.t_armshr), MEMROW + 2, MEMCOL + 12, 7);
> > -	putint(pgtokb(total.t_avm), MEMROW + 2, MEMCOL + 20, 8);
> > -	putint(pgtokb(total.t_avmshr), MEMROW + 2, MEMCOL + 29, 8);
> > -	putint(pgtokb(total.t_rm), MEMROW + 3, MEMCOL + 4, 7);
> > -	putint(pgtokb(total.t_rmshr), MEMROW + 3, MEMCOL + 12, 7);
> > -	putint(pgtokb(total.t_vm), MEMROW + 3, MEMCOL + 20, 8);
> > -	putint(pgtokb(total.t_vmshr), MEMROW + 3, MEMCOL + 29, 8);
> > -	putint(pgtokb(total.t_free), MEMROW + 2, MEMCOL + 38, 7);
> > +	putuint64(pgtokb(total.t_arm), MEMROW + 2, MEMCOL + 4, 7);
> > +	putuint64(pgtokb(total.t_armshr), MEMROW + 2, MEMCOL + 12, 7);
> > +	putuint64(pgtokb(total.t_avm), MEMROW + 2, MEMCOL + 20, 8);
> > +	putuint64(pgtokb(total.t_avmshr), MEMROW + 2, MEMCOL + 29, 8);
> > +	putuint64(pgtokb(total.t_rm), MEMROW + 3, MEMCOL + 4, 7);
> > +	putuint64(pgtokb(total.t_rmshr), MEMROW + 3, MEMCOL + 12, 7);
> > +	putuint64(pgtokb(total.t_vm), MEMROW + 3, MEMCOL + 20, 8);
> > +	putuint64(pgtokb(total.t_vmshr), MEMROW + 3, MEMCOL + 29, 8);
> > +	putuint64(pgtokb(total.t_free), MEMROW + 2, MEMCOL + 38, 7);
> 
> I see that a very recent expansion from int32_t to uint64_t didn't work
> here.
Can you explain, please ?

> 
> > 	putint(total.t_rq - 1, PROCSROW + 2, PROCSCOL, 3);
> > 	putint(total.t_pw, PROCSROW + 2, PROCSCOL + 4, 3);
> > 	putint(total.t_dw, PROCSROW + 2, PROCSCOL + 8, 3);
> 
> This has larger sign extension bugs than before.  All the fields here are
> still int16_t.  putint() still takes int args, and so the int16_t's int
> converted to int.  The used to be printed as int, but now they are converted
> to uint64_t.  They shouldn't be negative, but if they are then the were printed
> as negative.  Now the conversion to uint64_t has sign-extension bugs/overflows
> for negative values.  Negative values still be printed as negative via further
> overflows, but if the values are passed to dehumanize_number(), they are
> printed as enormous unsigned values.
I do not see a point in expanding the process counters to uint16_t. I
might do it if somebody has a realistic load with 30K processes in a
system.

Having 100K threads created simultaneously is quite affordable, so the
change could be useful one day.

> 
> Printing everything as 64 bits is a pessimization.  It would probably work
> to convert everything to float and use only putfloat().  This gives about
> 8 digits of accuracy and even that is often too many.  This was not done
> mainly because floating point was slower than integers.  Now it might
> be faster than uint64_t.
> 
> Flots could also be converted to integers except for printing the percentage
> and not much more.  A special case for the percentage would be easy to write
> and is already partly there to avoid printing 100.0 which is too wide.  This
> was not done mainly because 32-bit ints were too small.
> 
> > @@ -518,13 +522,13 @@ showkre(void)
> > 	PUTRATE(v_pdwakeups, VMSTATROW + 9, VMSTATCOL, 8);
> > 	PUTRATE(v_pdpages, VMSTATROW + 10, VMSTATCOL, 8);
> > 	PUTRATE(v_intrans, VMSTATROW + 11, VMSTATCOL, 8);
> > -	putint(pgtokb(s.v_wire_count), VMSTATROW + 12, VMSTATCOL, 8);
> > -	putint(pgtokb(s.v_active_count), VMSTATROW + 13, VMSTATCOL, 8);
> > -	putint(pgtokb(s.v_inactive_count), VMSTATROW + 14, VMSTATCOL, 8);
> > -	putint(pgtokb(s.v_laundry_count), VMSTATROW + 15, VMSTATCOL, 8);
> > -	putint(pgtokb(s.v_free_count), VMSTATROW + 16, VMSTATCOL, 8);
> > +	putuint64(pgtokb(s.v_wire_count), VMSTATROW + 12, VMSTATCOL, 8);
> > +	putuint64(pgtokb(s.v_active_count), VMSTATROW + 13, VMSTATCOL, 8);
> > +	putuint64(pgtokb(s.v_inactive_count), VMSTATROW + 14, VMSTATCOL, 8);
> > +	putuint64(pgtokb(s.v_laundry_count), VMSTATROW + 15, VMSTATCOL, 8);
> > +	putuint64(pgtokb(s.v_free_count), VMSTATROW + 16, VMSTATCOL, 8);
> 
> This is bogus.  The fields still have type u_int.  pgtokb() expands by a
> factor of 4 or 8, and putuint64() can handle the expansion, but pgtokb()
> still overflows before the value can be passed.
These fields come from different structure (vmtotal vs. vmmeter), and
surprisingly vmmeter is not part of the ABI.  Expanding the type of
v_free_count is easy.  Having the changes in vmstat for vmmeter fields
done together with vmtotal expansion is reasonable.  We targeted all
memory reporting.

> 
> Overflow occurs at 1K * 4G = 4T.  I think that much memory costs about
> $100000 so no one here has it.
I had the access to a machine with 2T of memory year ago.

> 
> > 	if (LINES - 1 > VMSTATROW + 17)
> > -		putint(s.bufspace / 1024, VMSTATROW + 17, VMSTATCOL, 8);
> > +		putuint64(s.bufspace / 1024, VMSTATROW + 17, VMSTATCOL, 8);
> 
> s.bufspace has the bogus type long, so in 32-bit systems overflow occurs
> with only 2T of bufspace, and on 64-bit systems the expansion can overflow
> even uint64_t, but the world doesn't have that much memory (2T * 2**32).
So practically overflow cannot occur.  I do not quite see the point of the
comment.

> 
> > 	PUTRATE(v_vnodein, PAGEROW + 2, PAGECOL + 6, 5);
> > 	PUTRATE(v_vnodeout, PAGEROW + 2, PAGECOL + 12, 5);
> > 	PUTRATE(v_swapin, PAGEROW + 2, PAGECOL + 19, 5);
> > @@ -666,8 +670,23 @@ cputime(int indx)
> > static void
> > putint(int n, int l, int lc, int w)
> > {
> > +
> > +	do_putuint64(n, l, lc, w, SI);
> 
> Sign extension busg for promoting int to uint64_t.
> 
> > +}
> > +
> > +static void
> > +putuint64(uint64_t n, int l, int lc, int w)
> > +{
> > +
> > +	do_putuint64(n, l, lc, w, IEC);
> > +}
> 
> The divisor is a bit too specialized (IEC is forced for all uint64_t)
> 
> > +
> > +static void
> > +do_putuint64(uint64_t n, int l, int lc, int w, int div)
> > +{
> > 	int snr;
> > 	char b[128];
> > +	char buf[128];
> >
> > 	move(l, lc);
> > #ifdef DEBUG
> > @@ -680,11 +699,12 @@ putint(int n, int l, int lc, int w)
> > 			addch(' ');
> > 		return;
> > 	}
> > -	snr = snprintf(b, sizeof(b), "%*d", w, n);
> > -	if (snr != w)
> > -		snr = snprintf(b, sizeof(b), "%*dk", w - 1, n / 1000);
> > -	if (snr != w)
> > -		snr = snprintf(b, sizeof(b), "%*dM", w - 1, n / 1000000);
> 
> The signed format used to match the signed arg.  The early warning hack
> mostly operated earlier -- u_int args sometimes overflowed to pass them
> here as ints.  The overflow gave had implementation-defined behaviour
> earlier but the behaviour here is defined.
> 
> > +	snr = snprintf(b, sizeof(b), "%*jd", w, (uintmax_t)n);
> 
> The signed format no longer matches the unsigned arg.  It still gives the
> early warning hack via overflow earlier in most cases.  The behaviour here
> is undefined iff (uintmax_t)n exceeds INTMAX_MAX.
Thanks, see the patch at the end.

> 
> Conversion to uintmax_t breaks the hack on unsupported arches when uintmax_t
> is larger than uint64_t.  Then if we start with signed -1, it becomes
> 0xffffffffffffffff as a uint64_t and expanding that loses the original
> sign bit in a non-recoverable way.
> 
> > +	if (snr != w) {
> > +		humanize_number(buf, w, n, "", HN_AUTOSCALE,
> > +		    HN_NOSPACE | HN_DECIMAL | div);
> 
> If this case is reached, then it loses the sign bit in another
> non-recoverable way (by not accidentally recovering it).
> 
> > +		snr = snprintf(b, sizeof(b), "%*s", w, buf);
> > +	}
> > 	if (snr != w) {
> 
> This case is almost (?) unreachable now.  It only occurs if b < 3 or
> if dehumanize_number() doesn't have enough suffixes to reach 
> UINT64_MAX.  Otherwise, it can always print 0 or 1 followed by the
> largest suffix and a NUL.  It seems to be 1 suffix short.  The largest
> prefix is E (exa), and UINT64_MAX is 18.4E which should be printed as 
> 18E, but that takes b >= 4.
> 
> > 		while (w-- > 0)
> > 			addch('*');
> 
> This returns a field full of stars if the value doesn't fit.  I like
> this for showing fields with preposterous values more clearly than
> something like 18E.

diff --git a/usr.bin/systat/vmstat.c b/usr.bin/systat/vmstat.c
index 736cacda061..149efb46760 100644
--- a/usr.bin/systat/vmstat.c
+++ b/usr.bin/systat/vmstat.c
@@ -138,10 +138,10 @@ static void allocinfo(struct Info *);
 static void copyinfo(struct Info *, struct Info *);
 static float cputime(int);
 static void dinfo(int, int, struct statinfo *, struct statinfo *);
+static void do_putuint64(uint64_t, int, int, int, int);
 static void getinfo(struct Info *);
 static void putint(int, int, int, int);
 static void putuint64(uint64_t, int, int, int);
-static void do_putuint64(uint64_t, int, int, int, int);
 static void putfloat(double, int, int, int, int, int);
 static void putlongdouble(long double, int, int, int, int, int);
 static int ucount(void);
@@ -699,7 +699,7 @@ do_putuint64(uint64_t n, int l, int lc, int w, int div)
 			addch(' ');
 		return;
 	}
-	snr = snprintf(b, sizeof(b), "%*jd", w, (uintmax_t)n);
+	snr = snprintf(b, sizeof(b), "%*ju", w, (uintmax_t)n);
 	if (snr != w) {
 		humanize_number(buf, w, n, "", HN_AUTOSCALE,
 		    HN_NOSPACE | HN_DECIMAL | div);