misc/137307: Enhance strptime to support %U and %W

Paul Green Paul.Green at stratus.com
Fri Jul 31 14:20:02 UTC 2009


>Number:         137307
>Category:       misc
>Synopsis:       Enhance strptime to support %U and %W
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          change-request
>Submitter-Id:   current-users
>Arrival-Date:   Fri Jul 31 14:20:01 UTC 2009
>Closed-Date:
>Last-Modified:
>Originator:     Paul Green
>Release:        Head
>Organization:
Stratus Technologies, Inc.
>Environment:
N/A
>Description:
I am enclosing a unified diff against strptime.c that adds support for the %n, %t, %U, and %W specifiers.  The diff also contains two simple test programs that I wrote to exercise various aspects of the new (and old) strptime code.

This directory contains a modified copy of strptime.c (based on
version 1.35 of the FreeBSD strptime.c).  I wrote these changes
based on my own knowledge and research and I hereby donate them
to the FreeBSD community.  I claim no copyright or other
privileges to these changes.  If you find them useful, that's
great.  If you decide you don't need or want them, well, that's
your decision and I'll understand.

Write me if you have any questions.


I have added support for the following POSIX-2001 features:

1. The skip whitespace directives "%n" and "%t".  The POSIX-2001
   standard says that the %n and %t directives skip any white
   space.

2. The week-of-the-year directives "%U" and "%W".  The existing
   FreeBSD strptime code recognizes both directives and
   validates that the week number lies in the permitted range,
   but then simply discards the value.

   The POSIX-2001 standard is vague on how these directives
   should be handled.  Given the limited knowledge that this
   implementation of strptime has about the nature of the string
   it is parsing (in particular, it has no knowledge of whether
   or not any given field in struct tm has been defined), I
   thought it wise to depend only upon the absolute minimum
   amount of information.  Just as the "%p" directive requires
   that the string contain a preceding hour value, my
   implementation of "%U" and "%W" requires that the string
   contain a preceding year value.  It then calculates the date
   of the Sunday (for %U) or Monday (for %W) that begins the
   specified week, and stores the appropriate values in struct tm.

   If the strptime.c program were smarter about which fields have
   previously appeared, one could imagine more complicated
   definitions for the handling of these directives.  I did not
   feel that these directives were worthy of adding a great deal
   more complexity to strptime.c.

3. The algorithm to compute the first weekday of a year comes
   from Wikipedia.  The URL is in the source code.  I simplified
   the formula because I only need to calculate the weekday that
   January 1 falls on.

4. I have enclosed two self-tests.

   The "t_first_wday.c" test performs tests on a range of years
   that includes both leap years and non-leap years.
   Essentially, it confirms that the formula to compute the
   first weekday of a year is correct.

   The "t_strptime.c" test performs basic tests on strptime
   itself.  It ensures that the conversion specifiers are
   implemented.  It performs a test of selected capabilities of
   strptime; it is not meant to be an exhaustive test.


FreeBSD Documentation Update

The existing man page for strptime contains the following text:

     The %U and %W format specifiers accept any value within the
     range 00 to 53 without validating against other values
     supplied (like month or day of the year, for example).

Please replace that text with the following text:

     The %U and %W format specifiers accept a decimal value of one
     or two digits in the range 0 to 53.  A leading zero is permitted
     but not required.  The tm_year member of 'struct tm' must
     be set, either by using one of the specifiers that sets the
     year, or by initializing the tm_year member before calling
     strptime.  Week 1 refers to the first Sunday (for %U) or
     first Monday (for %W) of the year.  Week 0 is equivalent to
     specifying week 1, and specifying week 53 may result in
     rolling over to a date in the following year, in which case
     the tm_year member is incremented.  The tm_yday, tm_mon,
     tm_mday, and tm_wday members are set to the values for the
     Sunday (for %U) or Monday (for %W) of the specified week of
     the given year.

>How-To-Repeat:
Run the enclosed test case on an unmodified copy of strptime. The %U and %W specifiers are recognized but do not assign any of the elements of the tm structure.
>Fix:
See patch file.

Patch attached with submission follows:

diff -urp old/strptime.c new/strptime.c
--- old/strptime.c	2009-07-31 10:10:54.000000000 -0400
+++ new/strptime.c	2009-07-31 10:10:58.000000000 -0400
@@ -76,15 +76,34 @@ static char * _strptime(const char *, co
 
 #define asizeof(a)	(sizeof (a) / sizeof ((a)[0]))
 
+#define leapyear(y)	((y) % 4 == 0 && ((y) % 100 != 0 || (y) % 400 == 0))
+
+/* Calculate the week day of the first day of a year.  Valid for
+   the Gregorian calendar, which began Sept 14, 1752 in the UK
+   and its colonies.  Ref:
+   http://en.wikipedia.org/wiki/Calculating_the_day_of_the_week
+   */
+
+static int
+first_wday_of(int year)
+{
+	return (((2 * (3 - (year / 100) % 4)) + (year % 100) +
+		((year % 100) / 4) + (leapyear(year) ? 6 : 0) + 1) % 7);
+}
+
 static char *
 _strptime(const char *buf, const char *fmt, struct tm *tm, int *GMTp)
 {
 	char	c;
 	const char *ptr;
-	int	i,
+	int	day_offset,
+		i,
 		len;
 	int Ealternative, Oalternative;
 	struct lc_time_T *tptr = __get_current_time_locale();
+	static int start_of_month[2][13] = {
+		{0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334, 365},
+		{0, 31, 60, 91, 121, 152, 182, 213, 244, 274, 305, 335, 366}};
 
 	ptr = fmt;
 	while (*ptr != 0) {
@@ -154,6 +173,12 @@ label:
 			Ealternative++;
 			goto label;
 
+		case 'n':
+		case 't':
+			while (*buf != 0 && isspace ((unsigned char)*buf))
+				buf++;
+			break;
+
 		case 'O':
 			if (Ealternative || Oalternative)
 				break;
@@ -324,12 +349,8 @@ label:
 
 		case 'U':
 		case 'W':
-			/*
-			 * XXX This is bogus, as we can not assume any valid
-			 * information present in the tm structure at this
-			 * point to calculate a real value, so just check the
-			 * range for now.
-			 */
+			/* We expect that the year has already been
+			   parsed.  */
 			if (!isdigit((unsigned char)*buf))
 				return 0;
 
@@ -345,6 +366,40 @@ label:
 			if (*buf != 0 && isspace((unsigned char)*buf))
 				while (*ptr != 0 && !isspace((unsigned char)*ptr))
 					ptr++;
+
+			/* Week numbers are 1-origin.  So that we can always
+			   return the date of a Sunday (or Monday), treat week
+			   0 as week 1. */
+
+			if (i == 0)
+				i = 1;
+
+			if (c == 'U')
+				day_offset = 0;   /* Sunday */
+			else day_offset = 1;   /* Monday */
+
+			/* Set the date to the first Sunday (or Monday)
+			   of the specified week of the year.  */
+
+			tm->tm_yday = (7 - first_wday_of(tm->tm_year+1900) +
+				day_offset) % 7 + (i-1) * 7;
+			i = 0;
+			while (i <= 12 && tm->tm_yday >=
+				start_of_month[leapyear(tm->tm_year+1900)][i])
+			{
+				i++;
+			}
+			if (i > 12)
+			{
+				i = 1;
+				tm->tm_yday -=
+					start_of_month[leapyear(tm->tm_year+1900)][12];
+				tm->tm_year++;
+			}
+			tm->tm_mon = i - 1;
+			tm->tm_mday = tm->tm_yday -
+				start_of_month[leapyear(tm->tm_year+1900)][i - 1] + 1;
+			tm->tm_wday = day_offset;
 			break;
 
 		case 'w':
diff -urp old/t_first_wday.c new/t_first_wday.c
--- old/t_first_wday.c	2009-07-31 10:11:12.000000000 -0400
+++ new/t_first_wday.c	2009-07-31 10:11:17.000000000 -0400
@@ -0,0 +1,61 @@
+/* Beginning of modification history */
+/* Written 2009-07-30 by Paul Green. */
+/* End of modification history */
+
+/* Program to test the calculation of the first day of a year.
+   */
+
+#define _POSIX_C_SOURCE 200112L
+#include <stdio.h>
+
+static char *day_name[7] = { "Sunday", "Monday", "Tuesday",
+     "Wednesday", "Thursday", "Friday", "Saturday"};
+
+static int known_wday [12] = {4, 5, 6, 1, 2, 3, 4, 6, 0, 1, 2, 4};
+
+/* Returns 1 if y is a leap year, 0 otherwise.  */
+
+#define leapyear(y) ((y) % 4 == 0 && ((y) % 100 != 0 || (y) % 400 == 0))
+
+/* Calculate the week day of the first day of a year.  Valid for
+   the Gregorian calendar, which began Sept 14, 1752 in the UK
+   and its colonies.  Ref:
+   http://en.wikipedia.org/wiki/Calculating_the_day_of_the_week
+   */
+
+static int first_wday_of(int year)
+{
+     return (
+               (2 * (3 - (year / 100) % 4))
+             + (year % 100)
+             + ((year % 100) / 4)
+             + (leapyear(year) ? 6 : 0)
+             + 1
+            ) % 7;
+}
+
+int main (int argc, char **argv)
+{
+     int wday,year;
+     int errors=0;
+
+     for (year=1998; year<2010; year++)
+     {
+          wday = first_wday_of (year);
+          if (wday != known_wday [year-1998])
+          {
+               printf ("Error calculating first day of %4d.\n", year);
+               printf ("Expected: weekday %d.\n", known_wday [year-1998]);
+               printf ("Got:      weekday %d.\n", wday);
+               errors++;
+          }
+          else printf ("The first day of %4d is a %s (%d)\n", 
+                    year, day_name [wday], wday);
+     }
+
+     if (errors)
+          printf ("Test failed.\n");
+     else printf ("Test passed.\n");
+
+     return errors;
+}
diff -urp old/t_strptime.c new/t_strptime.c
--- old/t_strptime.c	2009-07-31 10:11:21.000000000 -0400
+++ new/t_strptime.c	2009-07-31 10:11:25.000000000 -0400
@@ -0,0 +1,349 @@
+/* Beginning of modification history */
+/* Written 09-07-27 by Paul Green. */
+/* End of modification history */
+
+#define _XOPEN_SOURCE 600
+#pragma longmap_check, no_default_mapping, system_programming;
+
+#include <stdio.h>
+#include <stddef.h>
+#include <string.h>
+#include <sys/types.h>
+#include "time.h"
+
+static int failures = 0;
+
+static void explain (char *title, char *fmt, char *src)
+{
+	printf ("Testing:  %s\n", title);
+	printf ("Format:   %s\n", fmt);
+	printf ("Source:   %s\n", src);
+}
+
+static void try1 (char *title, char *fmt, char *src, off_t off, int answer)
+{
+struct tm r;
+int       value;
+
+	explain (title, fmt, src);
+	memset (&r, 0, sizeof (r));
+	strptime (src, fmt, &r);
+
+	value = * (int *) ((char *)&r + off);
+
+	if (value == answer)
+		printf ("Result:   %d (ok)\n", answer);
+	else
+	{
+		printf ("Result:   %d instead of %d (bad)\n", value, answer);
+		failures++;
+	}
+	printf ("\n");
+}
+
+/* Special test for the %c format string.  Try to convert
+   "Mon Mday HH:MM:SS YYYY".  */
+
+static void try2 (char *title, char *fmt, char *src)
+{
+struct tm r;
+
+	explain (title, fmt, src);
+	memset (&r, 0, sizeof (r));
+	strptime (src, fmt, &r);
+
+	if (r.tm_mon == 11 &&
+	    r.tm_mday == 31 &&
+	    r.tm_year == 109 &&
+	    r.tm_hour == 23 &&
+	    r.tm_min == 59 &&
+	    r.tm_sec == 59)
+		printf ("Result:   December 31 23:59:59 2009 (ok)\n");
+	else
+	{
+		failures++;
+		printf ("Result:   failed. %d/%d/%d %02d:%02d:%02d %2d %2d\n",
+			r.tm_mon+1, r.tm_mday, r.tm_year+1900, r.tm_hour,
+			r.tm_min, r.tm_sec, r.tm_wday, r.tm_yday);
+	}
+	printf ("\n");
+}
+
+/* Special test for the %D format string.  Try to convert
+   "MM/DD/YY".  */
+
+static void try3 (char *title, char *fmt, char *src)
+{
+struct tm r;
+
+	explain (title, fmt, src);
+	memset (&r, 0, sizeof (r));
+	strptime (src, fmt, &r);
+
+	if (r.tm_mon == 11 &&
+	    r.tm_mday == 31 &&
+	    r.tm_year == 109)
+		printf ("Result:   12/31/2009 (ok)\n");
+	else
+	{
+		failures++;
+		printf ("Result:   failed. %02d/%02d/%4d\n",
+			r.tm_mon+1, r.tm_mday, r.tm_year+1900);
+	}
+	printf ("\n");
+}
+
+/* Special test for the %r format string.  Try to convert
+   "HH:MM:SS XM".  */
+
+static void try4 (char *title, char *fmt, char *src)
+{
+struct tm r;
+
+	explain (title, fmt, src);
+	memset (&r, 0, sizeof (r));
+	strptime (src, fmt, &r);
+
+	if (r.tm_hour == 23 &&
+	    r.tm_min == 59 &&
+	    r.tm_sec == 59)
+		printf ("Result:   11:59:59 pm (ok)\n");
+	else
+	{
+		failures++;
+		printf ("Result:   failed. %02d:%02d:%02d\n",
+			r.tm_hour, r.tm_min, r.tm_sec);
+	}
+	printf ("\n");
+}
+
+/* Special test for the %R format string.  Try to convert
+   "HH:MM".  */
+
+static void try5 (char *title, char *fmt, char *src)
+{
+struct tm r;
+
+	explain (title, fmt, src);
+	memset (&r, 0, sizeof (r));
+	strptime (src, fmt, &r);
+
+	if (r.tm_hour == 23 &&
+	    r.tm_min == 59)
+		printf ("Result:   23:59 (ok)\n");
+	else
+	{
+		failures++;
+		printf ("Result:   failed. %02d:%02d\n",
+			r.tm_hour, r.tm_min);
+	}
+	printf ("\n");
+}
+
+/* Special test for the %T format string.  Try to convert
+   "HH:MM:SS".  */
+
+static void try6 (char *title, char *fmt, char *src)
+{
+struct tm r;
+
+	explain (title, fmt, src);
+	memset (&r, 0, sizeof (r));
+	strptime (src, fmt, &r);
+
+	if (r.tm_hour == 23 &&
+	    r.tm_min == 59 &&
+	    r.tm_sec == 59)
+		printf ("Result:   23:59:59 (ok)\n");
+	else
+	{
+		failures++;
+		printf ("Result:   failed. %02d:%02d:%02d\n",
+			r.tm_hour, r.tm_min, r.tm_sec);
+	}
+	printf ("\n");
+}
+
+/* Special test for the %U format string.  Try to convert
+   "2009 22", which is 5/31/2009 */
+
+static void try7 (char *title, char *fmt, char *src)
+{
+struct tm r;
+
+	explain (title, fmt, src);
+	memset (&r, 0, sizeof (r));
+	strptime (src, fmt, &r);
+
+	if (r.tm_mon == 4 &&
+	    r.tm_mday == 31 &&
+	    r.tm_year == 109 &&
+	    r.tm_wday == 0 &&
+	    r.tm_yday == 150)
+		printf ("Result:   Sunday May 31 2009, yday 150 [week 22] (ok)\n");
+	else
+	{
+		failures++;
+		printf ("Result:   failed. %02d/%02d/%04d wday=%d yday=%d\n",
+			r.tm_mon+1, r.tm_mday, r.tm_year+1900, r.tm_wday, r.tm_yday);
+		printf ("Expected:         05/31/2009 wday=0 yday=150\n");
+	}
+	printf ("\n");
+}
+
+/* Special test for the %W format string.  Try to convert
+   "2009 22", which is 6/1/2009 */
+
+static void try8 (char *title, char *fmt, char *src)
+{
+struct tm r;
+
+	explain (title, fmt, src);
+	memset (&r, 0, sizeof (r));
+	strptime (src, fmt, &r);
+
+	if (r.tm_mon == 5 &&
+	    r.tm_mday == 1 &&
+	    r.tm_year == 109 &&
+	    r.tm_wday == 1 &&
+	    r.tm_yday == 151)
+		printf ("Result:   Monday June 1 2009, yday 151 [week 22] (ok)\n");
+	else
+	{
+		failures++;
+		printf ("Result:   failed. %02d/%02d/%04d wday=%d yday=%d\n",
+			r.tm_mon+1, r.tm_mday, r.tm_year+1900, r.tm_wday, r.tm_yday);
+		printf ("Expected:         06/01/2009 wday=1 yday=151\n");
+	}
+	printf ("\n");
+}
+
+/* Special test for the %U format string.  Try to convert
+   a range of values. */
+
+static void tryU (char *title, char *fmt, char *range)
+{
+/* automatic */
+
+int mon, mday, w, yday, year, yr;
+char *p;
+char src[32];
+struct tm r;
+
+/* static */
+
+/* The date in January of the first Sunday, for 1998-2004. */
+static int first_sunday [7] = {4, 3, 2, 7, 6, 5, 4};
+
+/* The month*100 + the date of Sunday, for 1998-2004. */
+static int known_answer [7][54] = {
+	{0,104,111,118,125,201,208,215,222,301,308,315,322,329,405,412,419,426,503,510,517,524,531,607,614,621,628,705,712,719,726,802,809,816,823,830,906,913,920,927,1004,1011,1018,1025,1101,1108,1115,1122,1129,1206,1213,1220,1227,103},  /* 1998 */
+	{0,103,110,117,124,131,207,214,221,228,307,314,321,328,404,411,418,425,502,509,516,523,530,606,613,620,627,704,711,718,725,801,808,815,822,829,905,912,919,926,1003,1010,1017,1024,1031,1107,1114,1121,1128,1205,1212,1219,1226,102},  /* 1999 */
+	{0,102,109,116,123,130,206,213,220,227,305,312,319,326,402,409,416,423,430,507,514,521,528,604,611,618,625,702,709,716,723,730,806,813,820,827,903,910,917,924,1001,1008,1015,1022,1029,1105,1112,1119,1126,1203,1210,1217,1224,1231}, /* 2000 */
+	{0,107,114,121,128,204,211,218,225,304,311,318,325,401,408,415,422,429,506,513,520,527,603,610,617,624,701,708,715,722,729,805,812,819,826,902,909,916,923,930,1007,1014,1021,1028,1104,1111,1118,1125,1202,1209,1216,1223,1230,106},  /* 2001 */
+	{0,106,113,120,127,203,210,217,224,303,310,317,324,331,407,414,421,428,505,512,519,526,602,609,616,623,630,707,714,721,728,804,811,818,825,901,908,915,922,929,1006,1013,1020,1027,1103,1110,1117,1124,1201,1208,1215,1222,1229,105},  /* 2002 */
+	{0,105,112,119,126,202,209,216,223,302,309,316,323,330,406,413,420,427,504,511,518,525,601,608,615,622,629,706,713,720,727,803,810,817,824,831,907,914,921,928,1005,1012,1019,1026,1102,1109,1116,1123,1130,1207,1214,1221,1228,104},  /* 2003 */
+	{0,104,111,118,125,201,208,215,222,229,307,314,321,328,404,411,418,425,502,509,516,523,530,606,613,620,627,704,711,718,725,801,808,815,822,829,905,912,919,926,1003,1010,1017,1024,1031,1107,1114,1121,1128,1205,1212,1219,1226,102}}; /* 2004 */
+
+/* execution */
+
+	explain (title, fmt, range);
+	memset (&r, 0, sizeof (r));
+
+	for (year=1998; year<2005; year++)
+	{
+		mday = first_sunday[year-1998];
+		yday = mday - 1;	/* tm_yday is zero-based */
+		printf ("The first Sunday of %4d is Jan %d.\n", year, mday);
+
+		for (w=1; w<54; w++)
+		{
+			sprintf (src, "%4d %d", year, w);
+			p = strptime (src, fmt, &r);
+			mon = (known_answer[year-1998][w] / 100) - 1;
+			mday = (known_answer[year-1998][w] % 100);
+
+			if (w == 53 && mon == 0)
+			{
+				yr = year + 1;
+				yday = mday - 1;	/* rolled into next year */
+			}
+			else yr = year;
+
+			if (p == NULL ||
+			    r.tm_mon  != mon ||
+			    r.tm_mday != mday ||
+			    r.tm_year != yr - 1900 ||
+			    r.tm_wday != 0 ||
+			    r.tm_yday != yday)
+			{
+				failures++;
+				printf ("%4d week %d failed.\n", year, w);
+				printf ("   got:      %02d/%02d/%04d wday=%d yday=%d\n",
+					r.tm_mon+1, r.tm_mday, r.tm_year+1900, r.tm_wday, r.tm_yday);
+				printf ("   expected: %02d/%02d/%04d wday=%d yday=%d\n",
+					mon+1, mday, yr, 0, yday);
+				printf ("\n");
+			}
+			yday += 7;
+		}
+	}
+	printf ("\n");
+}
+
+/* Test driver: runs each of the tests above.  */
+
+int main (int argc, char **argv)
+{
+struct tm t;
+
+     t.tm_sec = 59;
+     t.tm_min = 59;
+     t.tm_hour = 23;
+     t.tm_mday = 31;
+     t.tm_mon = 12;
+     t.tm_year = 2009;
+     t.tm_wday = 6;
+     t.tm_yday = 364;
+     t.tm_isdst = 0;
+
+     try1 ("Day of the week", "%a", "Sat",      offsetof (struct tm, tm_wday), 6);
+     try1 ("Day of the week", "%A", "Saturday", offsetof (struct tm, tm_wday), 6);
+     try1 ("Month name",      "%b", "Dec",      offsetof (struct tm, tm_mon), 11);
+     try1 ("Month name",      "%B", "December", offsetof (struct tm, tm_mon), 11);
+     try2 ("Date and time",   "%c", "December 31 23:59:59 2009");
+     try1 ("Century",         "%C", "2009",     offsetof (struct tm, tm_year), 100);
+     try1 ("Date of month",   "%d", "31",       offsetof (struct tm, tm_mday), 31);
+     try3 ("MM/DD/YY",        "%D", "12/31/09");
+     /* skip E codes */
+     /* skip O codes */
+     try1 ("Date of month",   "%e", "31",       offsetof (struct tm, tm_mday), 31);
+     try1 ("Month name",      "%h", "Dec",      offsetof (struct tm, tm_mon), 11);
+     try1 ("Hours, 24-hr",    "%H", "23",       offsetof (struct tm, tm_hour), 23);
+     try1 ("Hours, 12-hr",    "%I", "12",       offsetof (struct tm, tm_hour), 12);
+     try1 ("Day of year",     "%j", "365",      offsetof (struct tm, tm_yday), 364);
+     try1 ("Month number",    "%m", "12",       offsetof (struct tm, tm_mon), 11);
+     try1 ("Minute number",   "%M", "59",       offsetof (struct tm, tm_min), 59);
+     try1 ("AM/PM 12-hr",     "%I %p", "11 AM", offsetof (struct tm, tm_hour), 11);
+     try1 ("AM/PM 12-hr",     "%I %p", "11 PM", offsetof (struct tm, tm_hour), 23);
+     try1 ("AM/PM 24-hr",     "%H", "11",       offsetof (struct tm, tm_hour), 11);
+     try1 ("AM/PM 24-hr",     "%H", "23",       offsetof (struct tm, tm_hour), 23);
+     try4 ("Time of day H:M:S XM", "%r", "11:59:59 pm");
+     try5 ("Time of day H:M", "%R", "23:59");
+     try1 ("Second number",   "%S", "59",       offsetof (struct tm, tm_sec), 59);
+     try6 ("Time of day H:M:S", "%T", "23:59:59");
+     try7 ("Week number Sun=0", "%Y %U", "2009 22");
+     try1 ("Week day number", "%w", "4",        offsetof (struct tm, tm_wday), 4);
+     try8 ("Week number Mon=0", "%Y %W", "2009 22");
+     try1 ("Year in century",  "%y",  "9",      offsetof (struct tm, tm_year), 109);
+     try1 ("Year in century",  "%Y",  "2009",   offsetof (struct tm, tm_year), 109);
+
+     tryU ("U conversion", "%Y %U", "range");
+
+     if (failures)
+          printf ("%d tests failed.\n", failures);
+     else printf ("All tests passed.\n");
+
+     return failures;
+}
+


>Release-Note:
>Audit-Trail:
>Unformatted:

