[Bug 273245] textproc/groff: groff_mdoc(7): output from 'man 7 groff_mdoc' is badly broken

From: <bugzilla-noreply_at_freebsd.org>
Date: Mon, 04 Sep 2023 22:07:38 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=273245

--- Comment #6 from G. Branden Robinson <g.branden.robinson@gmail.com> ---
Hi Wolfgang,

I cloned the freebsd-src repository to have a look at the 35 cases within it
that concerned you.

> for i in freebsd-{src,ports,doc};do (cd $i && printf "$i "; git grep 'roff.* -man[^d]' |wc -l );done
> freebsd-src 35

$ git grep -n 'roff.* -man[^d]'
contrib/byacc/aclocal.m4:1047:${NROFF_NOTE}     [\$](SHELL) -c "tbl [\$]*.$2 |
nroff -man | col -bx" >[\$]@
contrib/byacc/aclocal.m4:1053:${GROFF_NOTE}     [\$](SHELL) -c "tbl [\$]*.$2 |
groff -man" >[\$]@
contrib/byacc/aclocal.m4:1056:${GROFF_NOTE}     GROFF_NO_SGR=stupid [\$](SHELL)
-c "tbl [\$]*.$2 | nroff -rHY=0 -Tascii -man | col -bx" >[\$]@
contrib/byacc/configure:8368:${NROFF_NOTE}      \$(SHELL) -c "tbl \$*.1 | nroff
-man | col -bx" >\$@
contrib/byacc/configure:8374:${GROFF_NOTE}      \$(SHELL) -c "tbl \$*.1 | groff
-man" >\$@
contrib/byacc/configure:8377:${GROFF_NOTE}      GROFF_NO_SGR=stupid \$(SHELL)
-c "tbl \$*.1 | nroff -rHY=0 -Tascii -man | col -bx" >\$@

byacc is maintained by Thomas Dickey.  He's using Autoconf macros to produce
output from sources that are known to be in man(7) format.

https://github.com/freebsd/freebsd-src/blob/main/contrib/byacc/yacc.1#L30

`CF_MAKE_DOCS` appears to be an Autoconf macro private to the byacc
distribution; I see no other occurrences in `freebsd-src`.  He also has a
Autoconf test to determine how to generate HTML from man(7) pages.

https://github.com/freebsd/freebsd-src/blob/main/contrib/byacc/aclocal.m4#L1580

Note in particular the here document at line 1694.

https://github.com/freebsd/freebsd-src/blob/main/contrib/byacc/aclocal.m4#L1694

contrib/dialog/makefile.in:145:@NROFF_NOTE@     GROFF_NO_SGR=stupid $(SHELL) -c
"tbl $< | nroff -rHY=0 -Tascii -man | col -bx" >$@
contrib/dialog/makefile.in:151:@GROFF_NOTE@     $(SHELL) -c "tbl $< | groff
-man" >$@

Dialog is another Thomas Dickey project.

It also builds inputs it knows to be in man(7) format.

$ grep -nrFw .TH contrib/dialogcontrib/dialog/dialog.3:50:.TH \*D 3 "" "$Date:
2021/01/17 18:02:44 $"
contrib/dialog/dialog.1:51:.TH \*D 1 "" "$Date: 2021/01/17 17:25:01 $"
contrib/dialog/configure:8028:.TH HEAD1 HEAD2 HEAD3 HEAD4 HEAD5
contrib/dialog/aclocal.m4:5919:.TH HEAD1 HEAD2 HEAD3 HEAD4 HEAD5
$ grep -nrFw .Dd contrib/dialog || echo NONE # look for mdoc(7) documents
NONE

Next...

contrib/ee/ee.1:5:.\"    nroff -man ee.1

$ head contrib/ee/ee.1 
.\"
.\"
.\"  To format this reference page, use the command:
.\"
.\"    nroff -man ee.1
.\"
.\"  $Header: /home/hugh/sources/old_ae/RCS/ee.1,v 1.22 2001/12/16 04:49:27
hugh Exp $
.\"
.\"
.TH ee 1 "" "" ""

The man page is telling us explicitly (in a comment) what macro package to use
to format it, and unsurprisingly getting it right.

Next...

contrib/ldns/makewin.sh:243:for x in man1/*.1; do groff -man -Tascii -Z "$x" |
grotty -cbu > cat1/"$(basename "$x" .1).txt"; done
contrib/ldns/makewin.sh:246:for x in man3/*.3; do groff -man -Tascii -Z "$x" |
grotty -cbu > cat3/"$(basename "$x" .3).txt"; done

Again we have renderings of known documents.  Let's see what package they use.

$ find contrib/ldns -name "*.[13]" | xargs grep -nEw '\.(Dd|TH)'
contrib/ldns/drill/drill.1:2:.TH drill 1 "28 May 2006"
contrib/ldns/packaging/ldns-config.1:1:.TH ldns-config 1 "22 Sep 2011"

So that's two more correct uses of '-man'.

Next...

contrib/ncurses/aclocal.m4:5607:                nroff -man \$TMP >\$TMP.out
contrib/ncurses/configure:14517:                nroff -man \$TMP >\$TMP.out

Another Thomas Dickey project.  These come from his Autoconf macro
`CF_MAN_PAGES`.  I'll skip ahead here and note that I'm familiar with the
ncurses man pages, having recently proposed patches to them. 
https://lists.gnu.org/archive/html/bug-ncurses/2023-09/

They are exclusively in man(7) format, not mdoc(7).

Here again we have a case of a maintainer knowing what format is required, and
using it.

Next...

contrib/tcp_wrappers/Banners.Makefile:12:# sequences as described in the
hosts_access.5 manual page (`nroff -man'
contrib/tcp_wrappers/CHANGES:2:configuration checker. See the `tcpdchk.8'
manual page (`nroff -man'
contrib/tcp_wrappers/CHANGES:349:have all rules within a single file. See
"nroff -man hosts_options.5"
contrib/tcp_wrappers/Makefile:575:# and hosts_options.5 manual pages (`nroff
-man' format).
contrib/tcp_wrappers/README:240:hosts_access.5 manual page, which is in `nroff
-man' format. A later
contrib/tcp_wrappers/README:257:The hosts_options.5 manual page (`nroff -man'
format) documents an
contrib/tcp_wrappers/README:395:documented in the hosts_options.5 document,
which is in `nroff -man'
contrib/tcp_wrappers/README:432:`nroff -man' format) can guide the requests to
the right server.  These
contrib/tcp_wrappers/README:453:given in the hosts_options.5 manual page
(`nroff -man' format). An
contrib/tcp_wrappers/README:897:hosts_access.5, which is in `nroff -man'
format. This is a lengthy
contrib/tcp_wrappers/README:904:The examples in the hosts_access.5 document
(`nroff -man' format) show
contrib/tcp_wrappers/README:912:hosts_options.5 document (`nroff -man' format).
contrib/tcp_wrappers/README:918:program is described in the tcpdchk.8 document
(`nroff -man' format).
contrib/tcp_wrappers/README:929:described in the tcpdmatch.8 document (`nroff
-man' format).
contrib/tcp_wrappers/README:967:programs.  The hosts_access.3 manual page
(`nroff -man' format)
contrib/tcp_wrappers/options.c:4:  * manual page (source file: hosts_options.5,
"nroff -man" format).

These are all source comments or text file contents, and do not drive
construction of anything; they therefore cannot cause failures.  Nevertheless,
let us see what macro package is employed by "tcp_wrappers".

$ find contrib/tcp_wrappers -name "*.[1-9]" | xargs grep -nEw '\.(Dd|TH)'
contrib/tcp_wrappers/hosts_options.5:1:.TH HOSTS_OPTIONS 5
contrib/tcp_wrappers/tcpdmatch.8:1:.TH TCPDMATCH 8
contrib/tcp_wrappers/tcpd.8:1:.TH TCPD 8
contrib/tcp_wrappers/tcpdchk.8:1:.TH TCPDCHK 8
contrib/tcp_wrappers/hosts_access.5:1:.TH HOSTS_ACCESS 5
contrib/tcp_wrappers/hosts_access.3:1:.TH HOSTS_ACCESS 3

It would appear once again that the upstream maintainer is familiar with their
own man pages.

Next...

contrib/tcsh/tcsh.man2html:13:# in the exact same style of nroff -man, i.e. any
other manpage.

This is a comment.  Some context might be helpful.

$ git grep -C2 -n 'roff.* -man[^d]' contrib/tcsh/
contrib/tcsh/tcsh.man2html-11-#
contrib/tcsh/tcsh.man2html-12-# Designed for tcsh manpage. Guaranteed not to
work on manpages not written
contrib/tcsh/tcsh.man2html:13:# in the exact same style of nroff -man, i.e. any
other manpage.
contrib/tcsh/tcsh.man2html-14-#
contrib/tcsh/tcsh.man2html-15-# Makes links FROM items which are both a) in
particular sections (see

Given that context "guaranteed *not* to work on [other man pages]", it does not
seem fair to hold this source remark as evidence militating against groff's
change.

Next...

contrib/tzcode/workman.sh:18:.rm }F" | nroff -man - ${1+"$@"} | perl -ne '

More context is warranted here, too.

$ head -n 18 contrib/tzcode/workman.sh
#! /bin/sh
# Convert manual page troff stdin to formatted .txt stdout.

# This file is in the public domain, so clarified as of
# 2009-05-17 by Arthur David Olson.

if (type nroff && type perl) >/dev/null 2>&1; then

  # Tell groff not to emit SGR escape sequences (ANSI color escapes).
  GROFF_NO_SGR=1
  export GROFF_NO_SGR

  echo ".am TH
.hy 0
.na
..
.rm }H
.rm }F" | nroff -man - ${1+"$@"} | perl -ne '

Your concern might, at first glance, seem warranted here; the tool does purport
to be of general use.  However, closer inspection reveals that it was written
in ignorance or deliberate neglect of the mdoc(7) package altogether; observe
how it appends to the 'TH' macro, which is unused in mdoc(7).

It would be straightforward to make this script handle mdoc(7) as well; simply
append the same two requests to the `Dd` macro.  The removals of '}H' and '}F'
strings/macros/diversions suggests a familiarity with the AT&T Unix man(7)
implementation or its descendants in USG/System III/System V proprietary Unix
or BSD prior to Networking Release/2 (when the Berkeley CSRG replaced Unix
troff with groff). 

Further, if nroff or perl programs (or shell functions) are unavailable, this
shell script proceeds to use mandoc anwyay.

$ tail -n 6 contrib/tzcode/workman.sh
elif (type mandoc && type col) >/dev/null 2>&1; then
  mandoc -man -T ascii "$@" | col -bx
else
  echo >&2 "$0: please install nroff and perl, or mandoc and col"
  exit 1
fi

Next...

usr.bin/man/man.conf.5:116:NROFF_JA     /usr/local/bin/groff -man
-dlang=ja_JP.eucJP
usr.bin/man/man.conf.5:117:TROFF_JA     /usr/local/bin/groff -man
-dlang=ja_JP.euc.jp

This is from an example in FreeBSD's man.conf(5) page.  The assignment to the
"lang" string has no particular effect in groff, and to the best of my
knowledge it never has.  This may be evidence of jgroff, a fork of groff that
was made unnecessary about in the Debian Project about 20 years ago
(https://www.debian.org/security/2002/dsa-107 ) and was superseded by groff
support in the 1.21 release about 13 years ago. 
https://lists.gnu.org/archive/html/groff/2010-12/msg00051.html

(Also, is that a typo adding a period in line 117.)  I suggest that the
foregoing might be bitrotted.

Last...

usr.bin/man/man.sh:1074:NROFF='groff -S -P-h -Wall -mtty-char -man'
usr.bin/man/man.sh:1078:TROFF='groff -S -man'

This is a case that should certainly be addressed, if this script still
fulfills the purpose claimed for it in its initial commit in 2010:
"Implementaiton [sic] of man, manpath, whatis, and apropos written entirely in
sh."

I strongly recommend s/-man/&doc/ here so that FreeBSD users will continue to
have a positive experience.

Also, for what it's worth, the `-S` option has been unnecessary; "safer" mode
has been the default in groff since version 1.12, released on 14 December 1999.

However, 32-33 false positives in a set of 35 suggests to me that your scanning
criteria could benefit from sensitivity tuning.  An error rate of over 90% is
generally considered unusable in serious measurement applications.

Regards,
Branden

-- 
You are receiving this mail because:
You are on the CC list for the bug.