git: a61b8452d712 - main - devel/git-filter-repo: New port - a versatile tool for rewriting history

Cy Schubert cy at FreeBSD.org
Mon Jun 14 16:04:29 UTC 2021


The branch main has been updated by cy:

URL: https://cgit.FreeBSD.org/ports/commit/?id=a61b8452d712aae9d12b2b7de21accbe41d6ea9d

commit a61b8452d712aae9d12b2b7de21accbe41d6ea9d
Author:     Cy Schubert <cy at FreeBSD.org>
AuthorDate: 2021-06-14 15:55:56 +0000
Commit:     Cy Schubert <cy at FreeBSD.org>
CommitDate: 2021-06-14 16:04:03 +0000

    devel/git-filter-repo: New port - a versatile tool for rewriting history
    
    git filter-repo is a versatile tool for rewriting history, which
    includes capabilities I have not found anywhere else. It roughly falls
    into the same space of tool as git filter-branch but without the
    capitulation-inducing poor performance, with far more capabilities, and
    with a design that scales usability-wise beyond trivial rewriting cases.
    git filter-repo is now recommended by the git project instead of
    git filter-branch.
---
 devel/Makefile                                   |    1 +
 devel/git-filter-repo/Makefile                   |   50 +
 devel/git-filter-repo/distinfo                   |    3 +
 devel/git-filter-repo/files/git-filter-repo.1.in | 2335 ++++++++++++++++++++++
 devel/git-filter-repo/files/patch-Makefile       |   25 +
 devel/git-filter-repo/pkg-descr                  |    9 +
 devel/git-filter-repo/pkg-plist                  |    3 +
 7 files changed, 2426 insertions(+)

diff --git a/devel/Makefile b/devel/Makefile
index fed4749fcc13..a8d3dbf73ef7 100644
--- a/devel/Makefile
+++ b/devel/Makefile
@@ -810,6 +810,7 @@
     SUBDIR += git-cola
     SUBDIR += git-delta
     SUBDIR += git-extras
+    SUBDIR += git-filter-repo
     SUBDIR += git-lab
     SUBDIR += git-lfs
     SUBDIR += git-merge-changelog
diff --git a/devel/git-filter-repo/Makefile b/devel/git-filter-repo/Makefile
new file mode 100644
index 000000000000..18c9504efeec
--- /dev/null
+++ b/devel/git-filter-repo/Makefile
@@ -0,0 +1,50 @@
+PORTNAME=	git-filter-repo
+DISTVERSIONPREFIX=	v
+DISTVERSION=	2.32.0
+CATEGORIES=	devel
+
+MAINTAINER=	cy at FreeBSD.org
+COMMENT=	git filter-repo is a versatile tool for rewriting history
+
+LICENSE=	MIT
+LICENSE_FILE=	${WRKSRC}/COPYING.mit
+
+RUN_DEPENDS=	git:devel/git
+
+USES=		python shebangfix
+
+SHEBANG_FILES=	git-filter-repo
+
+USE_GITHUB=	yes
+GH_ACCOUNT=	newren
+
+NO_BUILD=	yes
+NO_ARCH=	yes
+
+#
+# XXX:	The man page only exists in the upstream docs branch. Unfortunately
+#	there is no clean way to extract just one file from a different
+#	upstream branch. Therefore we include it in files. To discover
+#	which version of the git-filter-repo.1 file in the docs branch
+#	corresponds with the extracted tag,
+#
+#	- git clone https://github.com/newren/git-filter-repo.git
+#	- git log, looking for the current tag's hash.
+#	- git switch docs
+#	- git log and look for the matching hash in the commit log.
+#	- git checkout HASH
+#	- copy the file to the files subdirectory of this port
+#
+#	Until a better approach can be discovered.
+#
+post-extract:
+	@${MKDIR} ${WRKSRC}/Documentation/man1;
+	${CP} ${FILESDIR}/git-filter-repo.1.in ${WRKSRC}/Documentation/man1/git-filter-repo.1
+
+do-install:
+	cd ${WRKSRC}; \
+	${MKDIR} ${STAGEDIR}/${PREFIX}/libexec/git-core; \
+	${MKDIR} ${STAGEDIR}/${PYTHON_SITELIBDIR}; \
+	${MAKE} prefix=${STAGEDIR}/${PREFIX} pythondir=${STAGEDIR}/${PYTHON_SITELIBDIR} install
+
+.include <bsd.port.mk>
diff --git a/devel/git-filter-repo/distinfo b/devel/git-filter-repo/distinfo
new file mode 100644
index 000000000000..7422229e8942
--- /dev/null
+++ b/devel/git-filter-repo/distinfo
@@ -0,0 +1,3 @@
+TIMESTAMP = 1623645535
+SHA256 (newren-git-filter-repo-v2.32.0_GH0.tar.gz) = 67afecbfacfc0c49cc17b627cb3bc20e08522b2fd2f93c6d2953ecea5afd6f83
+SIZE (newren-git-filter-repo-v2.32.0_GH0.tar.gz) = 158404
diff --git a/devel/git-filter-repo/files/git-filter-repo.1.in b/devel/git-filter-repo/files/git-filter-repo.1.in
new file mode 100644
index 000000000000..97331ba4e2b6
--- /dev/null
+++ b/devel/git-filter-repo/files/git-filter-repo.1.in
@@ -0,0 +1,2335 @@
+'\" t
+.\"     Title: git-filter-repo
+.\"    Author: [FIXME: author] [see http://www.docbook.org/tdg5/en/html/author]
+.\" Generator: DocBook XSL Stylesheets vsnapshot <http://docbook.sf.net/>
+.\"      Date: 06/07/2021
+.\"    Manual: Git Manual
+.\"    Source: Git 2.32.0.dirty
+.\"  Language: English
+.\"
+.TH "GIT\-FILTER\-REPO" "1" "06/07/2021" "Git 2\&.32\&.0\&.dirty" "Git Manual"
+.\" -----------------------------------------------------------------
+.\" * Define some portability stuff
+.\" -----------------------------------------------------------------
+.\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+.\" http://bugs.debian.org/507673
+.\" http://lists.gnu.org/archive/html/groff/2009-02/msg00013.html
+.\" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+.ie \n(.g .ds Aq \(aq
+.el       .ds Aq '
+.\" -----------------------------------------------------------------
+.\" * set default formatting
+.\" -----------------------------------------------------------------
+.\" disable hyphenation
+.nh
+.\" disable justification (adjust text to left margin only)
+.ad l
+.\" -----------------------------------------------------------------
+.\" * MAIN CONTENT STARTS HERE *
+.\" -----------------------------------------------------------------
+.SH "NAME"
+git-filter-repo \- Rewrite repository history
+.SH "SYNOPSIS"
+.sp
+.nf
+\fIgit filter\-repo\fR \-\-analyze
+\fIgit filter\-repo\fR [<path_filtering_options>] [<content_filtering_options>]
+        [<ref_renaming_options>] [<commit_message_filtering_options>]
+        [<name_or_email_filtering_options>] [<parent_rewriting_options>]
+        [<generic_callback_options>] [<miscellaneous_options>]
+.fi
+.sp
+.SH "DESCRIPTION"
+.sp
+Rapidly rewrite entire repository history using user\-specified filters\&. This is a destructive operation which should not be used lightly; it writes new commits, trees, tags, and blobs corresponding to (but filtered from) the original objects in the repository, then deletes the original history and leaves only the new\&. See the section called \(lqDISCUSSION\(rq for more details on the ramifications of using this tool\&. Several different types of history rewrites are possible; examples include (but are not limited to):
+.sp
+.RS 4
+.ie n \{\
+\h'-04'\(bu\h'+03'\c
+.\}
+.el \{\
+.sp -1
+.IP \(bu 2.3
+.\}
+stripping large files (or large directories or large extensions)
+.RE
+.sp
+.RS 4
+.ie n \{\
+\h'-04'\(bu\h'+03'\c
+.\}
+.el \{\
+.sp -1
+.IP \(bu 2.3
+.\}
+stripping unwanted files by path
+.RE
+.sp
+.RS 4
+.ie n \{\
+\h'-04'\(bu\h'+03'\c
+.\}
+.el \{\
+.sp -1
+.IP \(bu 2.3
+.\}
+extracting wanted paths and their history (stripping everything else)
+.RE
+.sp
+.RS 4
+.ie n \{\
+\h'-04'\(bu\h'+03'\c
+.\}
+.el \{\
+.sp -1
+.IP \(bu 2.3
+.\}
+restructuring the file layout (such as moving all files into a subdirectory in preparation for merging with another repo, making a subdirectory become the new toplevel directory, or merging two directories with independent filenames into one directory)
+.RE
+.sp
+.RS 4
+.ie n \{\
+\h'-04'\(bu\h'+03'\c
+.\}
+.el \{\
+.sp -1
+.IP \(bu 2.3
+.\}
+renaming tags (also often in preparation for merging with another repo)
+.RE
+.sp
+.RS 4
+.ie n \{\
+\h'-04'\(bu\h'+03'\c
+.\}
+.el \{\
+.sp -1
+.IP \(bu 2.3
+.\}
+replacing or removing sensitive text such as passwords
+.RE
+.sp
+.RS 4
+.ie n \{\
+\h'-04'\(bu\h'+03'\c
+.\}
+.el \{\
+.sp -1
+.IP \(bu 2.3
+.\}
+making mailmap rewriting of user names or emails permanent
+.RE
+.sp
+.RS 4
+.ie n \{\
+\h'-04'\(bu\h'+03'\c
+.\}
+.el \{\
+.sp -1
+.IP \(bu 2.3
+.\}
+making grafts or replacement refs permanent
+.RE
+.sp
+.RS 4
+.ie n \{\
+\h'-04'\(bu\h'+03'\c
+.\}
+.el \{\
+.sp -1
+.IP \(bu 2.3
+.\}
+rewriting commit messages
+.RE
+.sp
+Additionally, several concerns are handled automatically (many of these can be overridden, but they are all on by default):
+.sp
+.RS 4
+.ie n \{\
+\h'-04'\(bu\h'+03'\c
+.\}
+.el \{\
+.sp -1
+.IP \(bu 2.3
+.\}
+rewriting (possibly abbreviated) hashes in commit messages to refer to the new post\-rewrite commit hashes
+.RE
+.sp
+.RS 4
+.ie n \{\
+\h'-04'\(bu\h'+03'\c
+.\}
+.el \{\
+.sp -1
+.IP \(bu 2.3
+.\}
+pruning commits which become empty due to the above filters (also handles edge cases like pruning of merge commits which become degenerate and empty)
+.RE
+.sp
+.RS 4
+.ie n \{\
+\h'-04'\(bu\h'+03'\c
+.\}
+.el \{\
+.sp -1
+.IP \(bu 2.3
+.\}
+creating replace\-refs (see
+\fBgit-replace\fR(1)) for old commit hashes, which if manually pushed and fetched will allow users to continue to refer to new commits using (unabbreviated) old commit IDs
+.RE
+.sp
+.RS 4
+.ie n \{\
+\h'-04'\(bu\h'+03'\c
+.\}
+.el \{\
+.sp -1
+.IP \(bu 2.3
+.\}
+stripping of original history to avoid mixing old and new history
+.RE
+.sp
+.RS 4
+.ie n \{\
+\h'-04'\(bu\h'+03'\c
+.\}
+.el \{\
+.sp -1
+.IP \(bu 2.3
+.\}
+repacking the repository post\-rewrite to shrink the repo for the user
+.RE
+.sp
+Also, it\(cqs worth noting that there is an important safety mechanism:
+.sp
+.RS 4
+.ie n \{\
+\h'-04'\(bu\h'+03'\c
+.\}
+.el \{\
+.sp -1
+.IP \(bu 2.3
+.\}
+abort if run from a repo that is not a fresh clone (to prevent accidental data loss from rewriting local history that doesn\(cqt exist anywhere else)\&. See
+the section called \(lqFRESH CLONE SAFETY CHECK AND \-\-FORCE\(rq\&.
+.RE
+.sp
+For those who know that there is large unwanted stuff in their history and want help finding it, this command also
+.sp
+.RS 4
+.ie n \{\
+\h'-04'\(bu\h'+03'\c
+.\}
+.el \{\
+.sp -1
+.IP \(bu 2.3
+.\}
+provides an option to analyze a repository and generate reports that can be useful in determining what to filter (or in determining whether a separate filtering command was successful)\&.
+.RE
+.sp
+See also the section called \(lqVERSATILITY\(rq, the section called \(lqDISCUSSION\(rq, the section called \(lqEXAMPLES\(rq, and the section called \(lqINTERNALS\(rq\&.
+.SH "OPTIONS"
+.SS "Analysis Options"
+.PP
+\-\-analyze
+.RS 4
+Analyze repository history and create a report that may be useful in determining what to filter in a subsequent run (or in determining if a previous filtering command did what you wanted)\&. Will not modify your repo\&.
+.RE
+.SS "Filtering based on paths (see also \-\-filename\-callback)"
+.PP
+\-\-invert\-paths
+.RS 4
+Invert the selection of files from the specified \-\-path\-{match,glob,regex} options below, i\&.e\&. only select files matching none of those options\&.
+.RE
+.PP
+\-\-path\-match <dir_or_file>, \-\-path <dir_or_file>
+.RS 4
+Exact paths (files or directories) to include in filtered history\&. Multiple \-\-path options can be specified to get a union of paths\&.
+.RE
+.PP
+\-\-path\-glob <glob>
+.RS 4
+Glob of paths to include in filtered history\&. Multiple \-\-path\-glob options can be specified to get a union of paths\&.
+.RE
+.PP
+\-\-path\-regex <regex>
+.RS 4
+Regex of paths to include in filtered history\&. Multiple \-\-path\-regex options can be specified to get a union of paths\&.
+.RE
+.PP
+\-\-use\-base\-name
+.RS 4
+Match on file base name instead of full path from the top of the repo\&. Incompatible with \-\-path\-rename, and incompatible with matching against directory names\&.
+.RE
+.SS "Renaming based on paths (see also \-\-filename\-callback)"
+.sp
+Note: if you combine path filtering with path renaming, be aware that a rename directive does not select paths, it only says how to rename paths that are selected with the filters\&.
+.PP
+\-\-path\-rename <old_name:new_name>, \-\-path\-rename\-match <old_name:new_name>
+.RS 4
+Path to rename; if filename or directory matches <old_name> rename to <new_name>\&. Multiple \-\-path\-rename options can be specified\&.
+.RE
+.SS "Path shortcuts"
+.PP
+\-\-paths\-from\-file <filename>
+.RS 4
+Specify several path filtering and renaming directives, one per line\&. Lines with
+\fB==>\fR
+in them specify path renames, and lines can begin with
+\fBliteral:\fR
+(the default),
+\fBglob:\fR, or
+\fBregex:\fR
+to specify different matching styles\&. Blank lines and lines starting with a
+\fB#\fR
+are ignored (if you have a filename that you want to filter on that starts with
+\fBliteral:\fR,
+\fB#\fR,
+\fBglob:\fR, or
+\fBregex:\fR, then prefix the line with
+\fIliteral:\fR)\&.
+.RE
+.PP
+\-\-subdirectory\-filter <directory>
+.RS 4
+Only look at history that touches the given subdirectory and treat that directory as the project root\&. Equivalent to using
+\fB\-\-path <directory>/ \-\-path\-rename <directory>/:\fR
+.RE
+.PP
+\-\-to\-subdirectory\-filter <directory>
+.RS 4
+Treat the project root as instead being under <directory>\&. Equivalent to using
+\fB\-\-path\-rename :<directory>/\fR
+.RE
+.SS "Content editing filters (see also \-\-blob\-callback)"
+.PP
+\-\-replace\-text <expressions_file>
+.RS 4
+A file with expressions that, if found, will be replaced\&. By default, each expression is treated as literal text, but
+\fBregex:\fR
+and
+\fBglob:\fR
+prefixes are supported\&. You can end the line with
+\fB==>\fR
+and some replacement text to choose a replacement choice other than the default of
+\fB***REMOVED***\fR\&.
+.RE
+.PP
+\-\-strip\-blobs\-bigger\-than <size>
+.RS 4
+Strip blobs (files) bigger than specified size (e\&.g\&.
+\fB5M\fR,
+\fB2G\fR, etc)
+.RE
+.PP
+\-\-strip\-blobs\-with\-ids <blob_id_filename>
+.RS 4
+Read git object ids from each line of the given file, and strip all of them from history
+.RE
+.SS "Renaming of refs (see also \-\-refname\-callback)"
+.PP
+\-\-tag\-rename <old:new>
+.RS 4
+Rename tags starting with <old> to start with <new>\&. For example, \-\-tag\-rename foo:bar will rename tag foo\-1\&.2\&.3 to bar\-1\&.2\&.3; either <old> or <new> can be empty\&.
+.RE
+.SS "Filtering of commit messages (see also \-\-message\-callback)"
+.PP
+\-\-preserve\-commit\-hashes
+.RS 4
+By default, since commits are rewritten and thus gain new hashes, references to old commit hashes in commit messages are replaced with new commit hashes (abbreviated to the same length as the old reference)\&. Use this flag to turn off updating commit hashes in commit messages\&.
+.RE
+.PP
+\-\-preserve\-commit\-encoding
+.RS 4
+Do not reencode commit messages into UTF\-8\&. By default, if the commit object specifies an encoding for the commit message, the message is re\-encoded into UTF\-8\&.
+.RE
+.SS "Filtering of names & emails (see also \-\-name\-callback and \-\-email\-callback)"
+.PP
+\-\-mailmap <filename>
+.RS 4
+Use specified mailmap file (see
+\fBgit-shortlog\fR(1)
+for details on the format) when rewriting author, committer, and tagger names and emails\&. If the specified file is part of git history, historical versions of the file will be ignored; only the current contents are consulted\&.
+.RE
+.PP
+\-\-use\-mailmap
+.RS 4
+Same as:
+\fI\-\-mailmap \&.mailmap\fR
+.RE
+.SS "Parent rewriting"
+.PP
+\-\-replace\-refs {delete\-no\-add, delete\-and\-add, update\-no\-add, update\-or\-add, update\-and\-add}
+.RS 4
+Replace refs (see
+\fBgit-replace\fR(1)) are used to rewrite parents (unless turned off by the usual git mechanism); this flag specifies what to do with those refs afterward\&. Replace refs can either be deleted or updated to point at new commit hashes\&. Also, new replace refs can be added for each commit rewrite\&. With
+\fIupdate\-or\-add\fR, new replace refs are only added for commit rewrites that aren\(cqt used to update an existing replace ref\&. The default is
+\fIupdate\-and\-add\fR
+if $GIT_DIR/filter\-repo/already_ran does not exist;
+\fIupdate\-or\-add\fR
+otherwise\&.
+.RE
+.PP
+\-\-prune\-empty {always, auto, never}
+.RS 4
+Whether to prune empty commits\&.
+\fIauto\fR
+(the default) means only prune commits which become empty (not commits which were empty in the original repo, unless their parent was pruned)\&. When the parent of a commit is pruned, the first non\-pruned ancestor becomes the new parent\&.
+.RE
+.PP
+\-\-prune\-degenerate {always, auto, never}
+.RS 4
+Since merge commits are needed for history topology, they are typically exempt from pruning\&. However, they can become degenerate with the pruning of other commits (having fewer than two parents, having one commit serve as both parents, or having one parent as the ancestor of the other\&.) If such merge commits have no file changes, they can be pruned\&. The default (\fIauto\fR) is to only prune empty merge commits which become degenerate (not which started as such)\&.
+.RE
+.PP
+\-\-no\-ff
+.RS 4
+Even if the first parent is or becomes an ancestor of another parent, do not prune it\&. This modifies how \-\-prune\-degenerate behaves, and may be useful in projects that always use merge \-\-no\-ff\&.
+.RE
+.SS "Generic callback code snippets"
+.PP
+\-\-filename\-callback <function_body>
+.RS 4
+Python code body for processing filenames; see
+the section called \(lqCALLBACKS\(rq\&.
+.RE
+.PP
+\-\-message\-callback <function_body>
+.RS 4
+Python code body for processing messages (both commit messages and tag messages); see
+the section called \(lqCALLBACKS\(rq\&.
+.RE
+.PP
+\-\-name\-callback <function_body>
+.RS 4
+Python code body for processing names of people; see
+the section called \(lqCALLBACKS\(rq\&.
+.RE
+.PP
+\-\-email\-callback <function_body>
+.RS 4
+Python code body for processing email addresses; see
+the section called \(lqCALLBACKS\(rq\&.
+.RE
+.PP
+\-\-refname\-callback <function_body>
+.RS 4
+Python code body for processing refnames; see
+the section called \(lqCALLBACKS\(rq\&.
+.RE
+.PP
+\-\-blob\-callback <function_body>
+.RS 4
+Python code body for processing blob objects; see
+the section called \(lqCALLBACKS\(rq\&.
+.RE
+.PP
+\-\-commit\-callback <function_body>
+.RS 4
+Python code body for processing commit objects; see
+the section called \(lqCALLBACKS\(rq\&.
+.RE
+.PP
+\-\-tag\-callback <function_body>
+.RS 4
+Python code body for processing tag objects; see
+the section called \(lqCALLBACKS\(rq\&.
+.RE
+.PP
+\-\-reset\-callback <function_body>
+.RS 4
+Python code body for processing reset objects; see
+the section called \(lqCALLBACKS\(rq\&.
+.RE
+.SS "Location to filter from/to"
+.if n \{\
+.sp
+.\}
+.RS 4
+.it 1 an-trap
+.nr an-no-space-flag 1
+.nr an-break-flag 1
+.br
+.ps +1
+\fBNote\fR
+.ps -1
+.br
+.sp
+Specifying alternate source or target locations implies \-\-partial except that the normal default for \-\-replace\-refs is used\&. However, unlike normal uses of \-\-partial, this doesn\(cqt risk mixing old and new history since the old and new histories are in different repositories\&.
+.sp .5v
+.RE
+.PP
+\-\-source <source>
+.RS 4
+Git repository to read from
+.RE
+.PP
+\-\-target <target>
+.RS 4
+Git repository to overwrite with filtered history
+.RE
+.SS "Miscellaneous options"
+.PP
+\-\-help, \-h
+.RS 4
+Show a help message and exit\&.
+.RE
+.PP
+\-\-force, \-f
+.RS 4
+Ignore fresh clone checks and rewrite history (an irreversible operation, especially since it by default ends with an immediate pruning of reflogs and old objects)\&. See
+the section called \(lqFRESH CLONE SAFETY CHECK AND \-\-FORCE\(rq\&. Note that when cloning repos on a local filesystem, it is better to pass
+\fB\-\-no\-local\fR
+to git clone than passing
+\fB\-\-force\fR
+to git\-filter\-repo\&.
+.RE
+.PP
+\-\-partial
+.RS 4
+Do a partial history rewrite, resulting in the mixture of old and new history\&. This implies a default of update\-no\-add for \-\-replace\-refs, disables rewriting refs/remotes/origin/* to refs/heads/*, disables removing of the
+\fIorigin\fR
+remote, disables removing unexported refs, disables expiring the reflog, and disables the automatic post\-filter gc\&. Also, this modifies \-\-tag\-rename and \-\-refname\-callback options such that instead of replacing old refs with new refnames, it will instead create new refs and keep the old ones around\&. Use with caution\&.
+.RE
+.PP
+\-\-refs <refs+>
+.RS 4
+Limit history rewriting to the specified refs\&. Implies \-\-partial\&. In addition to the normal caveats of \-\-partial (mixing old and new history, no automatic remapping of refs/remotes/origin/* to refs/heads/*, etc\&.), this also may cause problems for pruning of degenerate empty merge commits when negative revisions are specified\&.
+.RE
+.PP
+\-\-dry\-run
+.RS 4
+Do not change the repository\&. Run
+\fBgit fast\-export\fR
+and filter its output, and save both the original and the filtered version for comparison\&. This also disables rewriting commit messages due to not knowing new commit IDs and disables filtering of some empty commits due to inability to query the fast\-import backend\&.
+.RE
+.PP
+\-\-debug
+.RS 4
+Print additional information about operations being performed and commands being run\&. (If used together with \-\-dry\-run, shows extra information about what would be run)\&.
+.RE
+.PP
+\-\-stdin
+.RS 4
+Instead of running
+\fBgit fast\-export\fR
+and filtering its output, filter the fast\-export stream from stdin\&. The stdin must be in the expected input format (e\&.g\&. it needs to include original\-oid directives)\&.
+.RE
+.PP
+\-\-quiet
+.RS 4
+Pass \-\-quiet to other git commands called\&.
+.RE
+.SH "OUTPUT"
+.sp
+Every time filter\-repo is run, files are created in the \fB\&.git/filter\-repo/\fR directory\&. These files are overwritten unconditionally on every run\&.
+.SS "Commit map"
+.sp
+The \fB\&.git/filter\-repo/commit\-map\fR file contains a mapping of how all commits were (or were not) changed\&.
+.sp
+.RS 4
+.ie n \{\
+\h'-04'\(bu\h'+03'\c
+.\}
+.el \{\
+.sp -1
+.IP \(bu 2.3
+.\}
+A header is the first line with the text "old" and "new"
+.RE
+.sp
+.RS 4
+.ie n \{\
+\h'-04'\(bu\h'+03'\c
+.\}
+.el \{\
+.sp -1
+.IP \(bu 2.3
+.\}
+Commit mappings are in no particular order
+.RE
+.sp
+.RS 4
+.ie n \{\
+\h'-04'\(bu\h'+03'\c
+.\}
+.el \{\
+.sp -1
+.IP \(bu 2.3
+.\}
+All commits in range of the rewrite will be listed, even commits that are unchanged (e\&.g\&. because the commit pre\-dated when the large file(s) were introduced to the repo)\&.
+.RE
+.sp
+.RS 4
+.ie n \{\
+\h'-04'\(bu\h'+03'\c
+.\}
+.el \{\
+.sp -1
+.IP \(bu 2.3
+.\}
+An all\-zeros hash, or null SHA, represents a non\-existent object\&. When in the "new" column, this means the commit was removed entirely\&.
+.RE
+.SS "Reference map"
+.sp
+The \fB\&.git/filter\-repo/ref\-map\fR file contains a mapping of which local references were changed\&.
+.sp
+.RS 4
+.ie n \{\
+\h'-04'\(bu\h'+03'\c
+.\}
+.el \{\
+.sp -1
+.IP \(bu 2.3
+.\}
+A header is the first line with the text "old" and "new"
+.RE
+.sp
+.RS 4
+.ie n \{\
+\h'-04'\(bu\h'+03'\c
+.\}
+.el \{\
+.sp -1
+.IP \(bu 2.3
+.\}
+Reference mappings are in no particular order
+.RE
+.sp
+.RS 4
+.ie n \{\
+\h'-04'\(bu\h'+03'\c
+.\}
+.el \{\
+.sp -1
+.IP \(bu 2.3
+.\}
+An all\-zeros hash, or null SHA, represents a non\-existent object\&. When in the "new" column, this means the ref was removed entirely\&.
+.RE
+.SH "FRESH CLONE SAFETY CHECK AND \-\-FORCE"
+.sp
+Since filter\-repo does irreversible rewriting of history, it is important to avoid making changes to a repo for which the user doesn\(cqt have a good backup\&. The primary defense mechanism is to simply educate users and rely on them to be good stewards of their data; thus there are several warnings in the documentation about how filter repo rewrites history\&.
+.sp
+However, as a service to users, we would like to provide an additional safety check beyond the documentation\&. There isn\(cqt a good way to check if the user has a good backup, but we can ask a related question that is an imperfect but quite reasonable proxy: "Is this repository a fresh clone?" Unfortunately, that is also a question we can\(cqt get a perfect answer to; git provides no way to answer that question\&. However, there are approximately a dozen things that I found that seem to always be true of brand new clones (assuming they are either clones of remote repositories or are made with the \fB\-\-no\-local\fR flag), and I check for all of those\&.
+.sp
+These checks can have both false positives and false negatives\&. Someone might have a perfectly good backup of their repo without it actually being a fresh clone \(em but there\(cqs no way for filter\-repo to know that\&. Conversely, someone could look at all things that filter\-repo checks for in its safety checks and then just tweak their non\-backed\-up repository to satisfy those conditions (though it would take a fair amount of effort, and it\(cqs astronomically unlikely that a repo that isn\(cqt a fresh clone randomly happens to match all the criteria)\&. In practice, the safety checks filter\-repo uses seem to be really good at avoiding people accidentally running filter\-repo on a repository that they shouldn\(cqt be running it on\&. It even caught me once when I did mean to run filter\-repo but was in a different directory than I thought I was\&.
+.sp
+In short, it\(cqs perfectly fine to use \fB\-\-force\fR to override the safety checks as long as you\(cqre okay with filter\-repo irreversibly rewriting the contents of the current repository\&. It is a really bad idea to get in the habit of always specifying \fB\-\-force\fR; if you do, one day you will run one of your commands in the wrong directory like I did, and you won\(cqt have the safety check anymore to bail you out\&. Also, it is definitely NOT okay to recommend \fB\-\-force\fR on forums, Q&A sites, or in emails to other users without first carefully explaining that \fB\-\-force\fR means putting your repositories\(cq data at risk\&. I am especially bothered by people who suggest the flag when it clearly is NOT needed; they are needlessly putting other people\*(Aqs data at risk\&.
+.SH "VERSATILITY"
+.sp
+filter\-repo has a hierarchy of capabilities on the spectrum from easy to use convenience flags that perform pre\-defined types of filtering, to choices that provide lots of flexibility in controlling how filtering occurs\&. This spectrum includes the following:
+.sp
+.RS 4
+.ie n \{\
+\h'-04'\(bu\h'+03'\c
+.\}
+.el \{\
+.sp -1
+.IP \(bu 2.3
+.\}
+Convenience flags making common types of history rewriting simple (e\&.g\&. \-\-path, \-\-strip\-blobs\-bigger\-than, \-\-replace\-text, \-\-mailmap)
+.RE
+.sp
+.RS 4
+.ie n \{\
+\h'-04'\(bu\h'+03'\c
+.\}
+.el \{\
+.sp -1
+.IP \(bu 2.3
+.\}
+Options which are shorthand for others or which provide greater control than others (e\&.g\&. \-\-subdirectory\-filter could just be written using both a path selection (\-\-path) and a path rename (\-\-path\-rename) filter; \-\-paths\-from\-file can handle all other \-\-path* options and more such as regex renaming of paths)
+.RE
+.sp
+.RS 4
+.ie n \{\
+\h'-04'\(bu\h'+03'\c
+.\}
+.el \{\
+.sp -1
+.IP \(bu 2.3
+.\}
+Generic python callbacks for handling a certain type of data (the filename, message, name, email, and refname callbacks)
+.RE
+.sp
+.RS 4
+.ie n \{\
+\h'-04'\(bu\h'+03'\c
+.\}
+.el \{\
+.sp -1
+.IP \(bu 2.3
+.\}
+Generic python callbacks for handling fundamental git objects, allowing greater control over the combination of data types the object holds (the commit, tag, blob, and reset callbacks)
+.RE
+.sp
+.RS 4
+.ie n \{\
+\h'-04'\(bu\h'+03'\c
+.\}
+.el \{\
+.sp -1
+.IP \(bu 2.3
+.\}
+The ability to import filter\-repo as a module in a python program and use its classes and functions for even greater control and flexibility while still leveraging lots of basic capabilities\&. One can even use this to write new tools with a completely different interface\&.
+.RE
+.sp
+For more information about callbacks, see the section called \(lqCALLBACKS\(rq\&. For examples on writing python programs that import filter\-repo as a module to create new history rewriting tools, look at the contrib/filter\-repo\-demos/ directory\&. That directory includes, among other examples, a reimplementation of git\-filter\-branch which is faster than git\-filter\-branch, and a reimplementation of BFG Repo Cleaner with several bug fixes and new features\&.
+.SH "DISCUSSION"
+.sp
+Using filter\-repo is relatively simple, but rewriting history is part of a larger discussion in terms of collaboration\&. When you rewrite history, the old and new histories are no longer compatible; if you push this history somewhere for others to view, it will look as though you\(cqve done a rebase of all branches and tags\&. Make sure you are familiar with the "RECOVERING FROM UPSTREAM REBASE" section of \fBgit-rebase\fR(1) (and in particular, "The hard case") before proceeding, in addition to this section\&.
+.sp
+Steps to use git\-filter\-repo as part of the bigger picture of doing a history rewrite are roughly as follows:
+.sp
+.RS 4
+.ie n \{\
+\h'-04' 1.\h'+01'\c
+.\}
+.el \{\
+.sp -1
+.IP "  1." 4.2
+.\}
+Create a clone of your repository (if you created special refs outside of refs/heads/ or refs/tags/, make sure to fetch those too)\&. You may pass
+\fB\-\-bare\fR
+or
+\fB\-\-mirror\fR
+to
+\fBgit clone\fR, if you prefer\&. You should pass
+\fB\-\-no\-local\fR
+if the repository you are cloning from is on the local filesystem\&. Avoid other flags; some might confuse the fresh clone check, and others could cause parts of the data to be missing that are needed for the rewrite\&.
+.RE
+.sp
+.RS 4
+.ie n \{\
+\h'-04' 2.\h'+01'\c
+.\}
+.el \{\
+.sp -1
+.IP "  2." 4.2
+.\}
+(Optional) Run
+\fBgit filter\-repo \-\-analyze\fR\&. This will create a directory of reports mentioning renames that have occurred in your repo and also listing sizes of objects aggregated by path/directory/extension/blob\-id; this information may be useful in choosing how to filter your repo\&. It can also be useful to re\-run \-\-analyze after filtering to verify the changes look correct\&.
+.RE
+.sp
+.RS 4
+.ie n \{\
+\h'-04' 3.\h'+01'\c
+.\}
+.el \{\
+.sp -1
+.IP "  3." 4.2
+.\}
+Run filter\-repo with your desired filtering options\&. Many examples are given below\&. For more complex cases, note that doing the filtering in multiple steps (by running multiple filter\-repo invocations in a sequence) is supported\&. If anything goes wrong here, simply delete your clone and restart\&.
+.RE
+.sp
+.RS 4
+.ie n \{\
+\h'-04' 4.\h'+01'\c
+.\}
+.el \{\
+.sp -1
+.IP "  4." 4.2
+.\}
+Push your new repository to its new home (note that refs/remotes/origin/* will have been moved to refs/heads/* as the first part of filter\-repo, so you can just deal with normal branches instead of remote tracking branches)\&. While you can force push this to the same URL you cloned from, there are good reasons to consider pushing to a different location instead:
+.sp
+.RS 4
+.ie n \{\
+\h'-04'\(bu\h'+03'\c
+.\}
+.el \{\
+.sp -1
+.IP \(bu 2.3
+.\}
+People who cloned from the original repo will have old history\&. When they fetch the new history you force pushed up, unless they do a
+\fBgit reset \-\-hard @{u}\fR
+on their branches or rebase their local work, git will think they have hundreds or thousands of commits with very similar commit messages as what exist upstream (but which include files you wanted excised from history), and allow the user to merge the two histories, resulting in what looks like two copies of each commit\&. If they then push this history back up, then everyone now has history with two copies of each commit and the bad files have returned\&. You\(cqre more likely to succeed in forcing people to get rid of the old history if they have to clone a new URL\&.
+.RE
+.sp
+.RS 4
+.ie n \{\
+\h'-04'\(bu\h'+03'\c
+.\}
+.el \{\
+.sp -1
+.IP \(bu 2.3
+.\}
+Rewriting history will rewrite tags; those who have already downloaded tags will not get the updated tags by default (see the "On Re\-tagging" section of
+\fBgit-tag\fR(1))\&. Every user trying to use an existing clone will have to forcibly delete all tags and re\-fetch them; it may be easier for them to just re\-clone, which they are more likely to do with a new clone URL\&.
+.RE
+.sp
+.RS 4
+.ie n \{\
+\h'-04'\(bu\h'+03'\c
+.\}
+.el \{\
+.sp -1
+.IP \(bu 2.3
+.\}
+Rewriting history may delete some refs (e\&.g\&. branches that only had files that you wanted excised from history); unless you run git push with the
+\fB\-\-mirror\fR
+or
+\fB\-\-prune\fR
+options, those refs will continue to exist on the server\&. If people then merge these branches into others, old and new history become mixed\&. If users had already cloned these branches, removing them from the server isn\(cqt enough; you need all users to delete any local branches based on these refs and run fetch with the
+\fB\-\-prune\fR
+option as well\&. Simply re\-cloning from a new URL is easier\&.
+.RE
+.sp
+.RS 4
+.ie n \{\
+\h'-04'\(bu\h'+03'\c
+.\}
+.el \{\
+.sp -1
+.IP \(bu 2.3
+.\}
+The server may not allow you to force push over some refs\&. For example, code review systems may have special ref namespaces (e\&.g\&. refs/changes/, refs/pull/, refs/merge\-requests/) that they have locked down\&.
+.RE
+.RE
+.sp
+.RS 4
+.ie n \{\
+\h'-04' 5.\h'+01'\c
+.\}
+.el \{\
+.sp -1
+.IP "  5." 4.2
+.\}
+If you still want to push your rewritten history back to the original URL despite my warnings above, you\(cqll have to manage it very carefully:
+.sp
+.RS 4
+.ie n \{\
+\h'-04'\(bu\h'+03'\c
+.\}
+.el \{\
+.sp -1
+.IP \(bu 2.3
+.\}
+git\-filter\-repo deletes the "origin" remote to help avoid people accidentally repushing to the same repository, so you\(cqll need to remind git what origin\(cqs URL was\&. You\(cqll have to look up the command for that\&.
+.RE
+.sp
+.RS 4
+.ie n \{\
+\h'-04'\(bu\h'+03'\c
+.\}
+.el \{\
+.sp -1
+.IP \(bu 2.3
+.\}
+You\(cqll need to carefully synchronize with
+\fBeveryone\fR
+who has cloned the repository, and will also need to carefully synchronize with
+\fBeverything\fR
+(e\&.g\&. CI systems) that has cloned it\&. Every single clone will either need to be thrown away and re\-cloned, or need to take all the steps outlined in item 4 as well as follow the necessary steps from the "RECOVERING FROM UPSTREAM REBASE" section of
+\fBgit-rebase\fR(1)\&. If you miss fixing any clones, you\(cqll risk mixing old and new history and end up with an even worse mess to clean up\&.
+.RE
+.sp
+.RS 4
+.ie n \{\
+\h'-04'\(bu\h'+03'\c
+.\}
+.el \{\
+.sp -1
+.IP \(bu 2.3
+.\}
+Finally, you\(cqll need to consult any documentation from your hosting provider about how to remove any server\-side references to the old commits (example:
+\m[blue]\fBGitLab\(cqs excellent docs on reducing repository size\fR\m[]\&\s-2\u[1]\d\s+2, or just the warning box that references "GitHub support" from
+\m[blue]\fBGitHub\(cqs otherwise dangerously out\-of\-date docs on removing sensitive data\fR\m[]\&\s-2\u[2]\d\s+2)\&.
+.RE
+.RE
+.sp
+.RS 4
+.ie n \{\
+\h'-04' 6.\h'+01'\c
+.\}
+.el \{\
+.sp -1
+.IP "  6." 4.2
+.\}
+(Optional) Some additional considerations:
+.sp
+.RS 4
+.ie n \{\
+\h'-04'\(bu\h'+03'\c
+.\}
+.el \{\
+.sp -1
+.IP \(bu 2.3
+.\}
+filter\-repo by default creates replace refs (see
+\fBgit-replace\fR(1)) for each rewritten commit ID, allowing you to use old (unabbreviated) commit hashes in the git command line to refer to the newly rewritten commits\&. If you want to use these replace refs, manually push them to the relevant clone URL and tell users to manually fetch them (e\&.g\&. by adjusting their fetch refspec,
+\fBgit config \-\-add remote\&.origin\&.fetch +refs/replace/*:refs/replace/*\fR)\&. Sadly, replace refs are not yet widely understood; projects like JGit and libgit2 do not support them and existing repository managers (e\&.g\&. Gerrit, GitHub, GitLab) do not yet understand replace refs\&. Thus one can\(cqt use old commit hashes within the UI of these other systems\&. This may change in the future, but replace refs at least help users locally within the git command line interface\&. Also, be aware that commit\-graphs are excessively cautious around replace refs and just turn off entirely if any are present, so after enough time has passed that old commit IDs become less relevant, users may want to locally delete the replace refs to regain the speedups from commit\-graphs\&.
+.RE
+.sp
+.RS 4
+.ie n \{\
+\h'-04'\(bu\h'+03'\c
+.\}
+.el \{\
+.sp -1
+.IP \(bu 2.3
+.\}
+If you have a central repo, you may want to prevent people from pushing old commit IDs, in order to avoid mixing old and new history\&. Every repository manager does this differently: some provide specialized commands (e\&.g\&.
+\m[blue]\fBhttps://gerrit\-review\&.googlesource\&.com/Documentation/cmd\-ban\-commit\&.html\fR\m[]), while others require you to write hooks\&.
+.RE
+.RE
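+.sp
+As a rough sketch rather than a definitive recipe, the commands behind items 4 through 6 might look like the following\&. The URL https://example\&.com/new\-home\&.git is a hypothetical placeholder and the remote name "origin" is an assumption; the first group would run in the freshly filtered clone, the second in each pre\-existing clone that is being repaired instead of re\-cloned\&.
+.sp
+.if n \{\
+.RS 4
+.\}
+.nf
+# In the filtered clone: re\-add a remote (filter\-repo removed "origin")
+git remote add origin https://example\&.com/new\-home\&.git
+# \-\-mirror pushes every ref under refs/ (including refs/replace/*) and
+# deletes remote refs that no longer exist locally
+git push \-\-mirror origin
+
+# In a pre\-existing clone whose "origin" now holds the rewritten history:
+git fetch \-\-prune origin
+# Discards local commits and uncommitted changes on the current branch
+git reset \-\-hard @{u}
+# Only if the replace refs were pushed and are wanted locally
+git config \-\-add remote\&.origin\&.fetch \(aq+refs/replace/*:refs/replace/*\(aq
+git fetch origin
+.fi
+.if n \{\
+.RE
+.\}
+.sp
+This sketch does not cover re\-fetching rewritten tags or rebasing local work; the items above describe those cases\&.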
+.SH "EXAMPLES"
*** 1515 LINES SKIPPED ***

