Date: Tue, 25 Oct 2022 15:06:28 GMT
Message-Id: <202210251506.29PF6SDB008790@gitrepo.freebsd.org>
To: src-committers@FreeBSD.org, dev-commits-src-all@FreeBSD.org, dev-commits-src-main@FreeBSD.org
From: Kyle Evans
Subject: git: 5c053aa3c5e9 - main - split: switch to getline() for line/pattern matching
List-Id: Commit messages for all branches of the src repository
List-Archive: https://lists.freebsd.org/archives/dev-commits-src-all
Sender: owner-dev-commits-src-all@freebsd.org
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
X-Git-Committer: kevans
X-Git-Repository: src
X-Git-Refname: refs/heads/main
X-Git-Reftype: branch
X-Git-Commit: 5c053aa3c5e907bdd1ac466ce9b58611781c2c20
Auto-Submitted: auto-generated

The branch main has been updated by kevans:

URL: https://cgit.FreeBSD.org/src/commit/?id=5c053aa3c5e907bdd1ac466ce9b58611781c2c20

commit 5c053aa3c5e907bdd1ac466ce9b58611781c2c20
Author:     Kyle Evans
AuthorDate: 2022-08-23 02:05:58 +0000
Commit:     Kyle Evans
CommitDate: 2022-10-25 15:05:23 +0000

    split: switch to getline() for line/pattern matching

    Get rid of split's home-grown logic for growing the buffer; arbitrarily
    breaking at LONG_MAX bytes instead of 65536 bytes gives us much more
    wiggle room.  Additionally, we'll actually fail out entirely if we can't
    fit a line, which makes noticing this class of problem much easier.

    Reviewed by:    bapt, emaste, pauamma
    Sponsored by:   Klara, Inc.
    Differential Revision:  https://reviews.freebsd.org/D36323
---
 usr.bin/split/split.1 |  8 +++++---
 usr.bin/split/split.c | 25 ++++++++++++-------------
 2 files changed, 17 insertions(+), 16 deletions(-)

diff --git a/usr.bin/split/split.1 b/usr.bin/split/split.1
index 8f287a4163dd..684cad57d4fc 100644
--- a/usr.bin/split/split.1
+++ b/usr.bin/split/split.1
@@ -28,7 +28,7 @@
 .\" @(#)split.1 8.3 (Berkeley) 4/16/94
 .\" $FreeBSD$
 .\"
-.Dd May 9, 2013
+.Dd October 25, 2022
 .Dt SPLIT 1
 .Os
 .Sh NAME
@@ -213,5 +213,7 @@ A
 .Nm
 command appeared in
 .At v3 .
-.Sh BUGS
-The maximum line length for matching patterns is 65536.
+.Pp
+Before
+.Fx 14 ,
+pattern matching and only operated on lines shorter than 65,536 bytes.
diff --git a/usr.bin/split/split.c b/usr.bin/split/split.c
index 9028b29d1c69..008b614f4946 100644
--- a/usr.bin/split/split.c
+++ b/usr.bin/split/split.c
@@ -70,7 +70,6 @@ static off_t	 chunks = 0;		/* Chunks count to split into. */
 static long	 numlines;		/* Line count to split on. */
 static int	 file_open;		/* If a file open. */
 static int	 ifd = -1, ofd = -1;	/* Input/output file descriptors. */
-static char	 bfr[MAXBSIZE];		/* I/O buffer. */
 static char	 fname[MAXPATHLEN];	/* File name prefix. */
 static regex_t	 rgx;
 static int	 pflag;
@@ -203,6 +202,7 @@ main(int argc, char **argv)
 static void
 split1(void)
 {
+	static char bfr[MAXBSIZE];
 	off_t bcnt;
 	char *C;
 	ssize_t dist, len;
@@ -211,7 +211,7 @@ split1(void)
 	nfiles = 0;
 
 	for (bcnt = 0;;)
-		switch ((len = read(ifd, bfr, MAXBSIZE))) {
+		switch ((len = read(ifd, bfr, sizeof(bfr)))) {
 		case 0:
 			exit(0);
 		case -1:
@@ -264,46 +264,45 @@ split1(void)
 static void
 split2(void)
 {
+	char *buf;
+	size_t bufsize;
+	ssize_t len;
 	long lcnt = 0;
 	FILE *infp;
 
+	buf = NULL;
+	bufsize = 0;
+
 	/* Stick a stream on top of input file descriptor */
 	if ((infp = fdopen(ifd, "r")) == NULL)
 		err(EX_NOINPUT, "fdopen");
 
 	/* Process input one line at a time */
-	while (fgets(bfr, sizeof(bfr), infp) != NULL) {
-		const int len = strlen(bfr);
-
-		/* If line is too long to deal with, just write it out */
-		if (bfr[len - 1] != '\n')
-			goto writeit;
-
+	while ((len = getline(&buf, &bufsize, infp)) > 0) {
 		/* Check if we need to start a new file */
 		if (pflag) {
 			regmatch_t pmatch;
 
 			pmatch.rm_so = 0;
 			pmatch.rm_eo = len - 1;
-			if (regexec(&rgx, bfr, 0, &pmatch, REG_STARTEND) == 0)
+			if (regexec(&rgx, buf, 0, &pmatch, REG_STARTEND) == 0)
 				newfile();
 		} else if (lcnt++ == numlines) {
 			newfile();
 			lcnt = 1;
 		}
 
-writeit:
 		/* Open output file if needed */
 		if (!file_open)
 			newfile();
 
 		/* Write out line */
-		if (write(ofd, bfr, len) != len)
+		if (write(ofd, buf, len) != len)
 			err(EX_IOERR, "write");
 	}
 
 	/* EOF or error? */
-	if (ferror(infp))
+	if ((len == -1 && errno != 0) || ferror(infp))
 		err(EX_IOERR, "read");
 	else
 		exit(0);
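
For readers who want to try the getline() loop from the split2() hunk above outside of split(1), here is a minimal standalone sketch. It is not code from the commit: count_lines() and its parameters are hypothetical names chosen for illustration, and clearing errno before the loop is an extra precaution of this sketch that the diff itself does not show.

```c
#define _POSIX_C_SOURCE 200809L	/* expose getline() under strict modes */
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/*
 * Hypothetical helper, not part of split(1): count the lines and bytes
 * on a stream using the same getline() loop shape as the new split2().
 * Returns 0 at end of file, -1 on a read or allocation error.
 */
static int
count_lines(FILE *fp, size_t *nlines, size_t *nbytes)
{
	char *buf = NULL;	/* getline() allocates and grows this */
	size_t bufsize = 0;
	ssize_t len;
	int ret = 0;

	*nlines = 0;
	*nbytes = 0;
	errno = 0;		/* assumption of this sketch, see lead-in */
	while ((len = getline(&buf, &bufsize, fp)) > 0) {
		/* len includes the trailing newline when the line has one. */
		(*nlines)++;
		*nbytes += (size_t)len;
	}
	/* getline() returns -1 for both EOF and failure; tell them apart. */
	if ((len == -1 && errno != 0) || ferror(fp))
		ret = -1;
	free(buf);
	return (ret);
}
```

Because getline() grows the buffer itself, there is no fixed-size array to overflow and no "line too long" branch to handle; a line that cannot be allocated surfaces as an error return rather than being written out unmatched.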