At a customer installation of OpenProject, Subversion is still widely used as a document management solution. While I advocate against using it for large document repositories, it persists there for historical reasons: clients are readily available, and many users know the workflow from software development.

User errors such as deleted files and folders occur regularly and can be fixed on the client side. The real trouble begins when we receive incidents of files being added to the repository by mistake.

Unlike Git, Subversion (by design) does not provide client-side means to rewrite history. On the server-side, there is an ancient change request on the Subversion project tracker with some discussions in the past decade, but it appears to be a hard problem and no real solution has been achieved thus far.

Instead of rewriting a single revision, you will have to play back the whole repository with a specific exclusion of the bad changesets.

Preliminary

Let's assume some user has added unwanted files over the course of a few hours and you have a faulty revision at revision number 150, while the current revision is 175. You want to remove that one revision from the face of the earth while keeping the remaining history intact.

On the machine serving the repositories, you'll have tools like svnadmin to access the server-side repository. For the remainder of this post, we're going to assume the repository with the faulty revision resides at /var/svn/foobar.

1. Create empty repository

We're going to dump and load all revisions except the faulty ones into a new repository. Thus, you need to create a fresh repository using the following command:

svnadmin create /var/svn/new-foobar

2. Set the UUID of the old repository

To avoid issues for other users when they access the fixed repository, copy the UUID of the old repository to the new one.

To find the UUID of the faulty repository:

svn info file:///var/svn/foobar
URL: file:///var/svn/foobar
Repository UUID: 50fdfe82-57f9-41f2-8b69-ef9662b13833
Revision: 179
Node Kind: directory
Schedule: normal
Last Changed Author: myuser
Last Changed Rev: 179
Last Changed Date: 2016-01-10 13:42:03 +0100

Then set the UUID with

svnadmin setuuid /var/svn/new-foobar 50fdfe82-57f9-41f2-8b69-ef9662b13833
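
If you'd rather not copy the UUID by hand, the two steps can be combined. This is only a minimal sketch: copy_uuid is a name I made up for illustration, and it relies on svnlook uuid, which prints nothing but the repository UUID.

```shell
# Hypothetical helper (not part of Subversion): copy the UUID of the
# repository at $1 onto the repository at $2.
copy_uuid() {
  local old_repo=$1 new_repo=$2 uuid
  # svnlook uuid prints only the UUID, so there is no svn info output to parse.
  uuid=$(svnlook uuid "$old_repo") || return 1
  svnadmin setuuid "$new_repo" "$uuid"
}
```

With the paths from this post, the call would be copy_uuid /var/svn/foobar /var/svn/new-foobar.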

3. Load valid revisions

You can stream all valid revisions up to and including 149 from foobar to new-foobar:

svnadmin dump -r0:149 /var/svn/foobar | svnadmin load /var/svn/new-foobar

4. Create empty revision

You could now load the immediately following revisions, skipping revision 150. However, svnadmin load numbers revisions consecutively, so every later revision would shift down by one (151 becomes 150, and so on). The result is a smaller latest revision number, which may cause errors on a local working copy.

The stream output by svnadmin dump follows the Subversion dumpfile format and can easily be used to forge an empty commit as a replacement for 150.

The following command loads a single revision, an empty commit with the message Empty log message (17 bytes, thus V 17) and a given timestamp.

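# A dump stream starts with the format-version header and a blank line.
# Prop-content-length counts every property line plus its newline, PROPS-END included.
(cat <<END
SVN-fs-dump-format-version: 2

Revision-number: 150
Prop-content-length: 91
Content-length: 91

K 7
svn:log
V 17
Empty log message
K 8
svn:date
V 27
2016-01-11T22:30:47.000000Z
PROPS-END
END
) | svnadmin load /var/svn/new-foobar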
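
Counting those lengths by hand gets tedious once you change the log message. Below is a rough sketch that computes them instead. The function names empty_revision and byte_len are my own invention, and the sketch assumes the property block length covers each line including its trailing newline, up to and including PROPS-END. It also emits the SVN-fs-dump-format-version header that dump streams start with.

```shell
# Hypothetical helper: number of bytes (not characters) in a string.
byte_len() { printf '%s' "$1" | wc -c | tr -d ' '; }

# Hypothetical helper: print a dump stream holding one empty revision.
# Usage: empty_revision REVISION MESSAGE DATE
empty_revision() {
  local rev=$1 msg=$2 date=$3 props len
  # Build the property block first so its size can be measured.
  props=$(printf 'K 7\nsvn:log\nV %s\n%s\nK 8\nsvn:date\nV %s\n%s\nPROPS-END' \
    "$(byte_len "$msg")" "$msg" "$(byte_len "$date")" "$date")
  # Add 1 for the newline that follows PROPS-END in the stream.
  len=$(( $(byte_len "$props") + 1 ))
  printf 'SVN-fs-dump-format-version: 2\n\n'
  printf 'Revision-number: %s\nProp-content-length: %s\nContent-length: %s\n\n%s\n\n' \
    "$rev" "$len" "$len" "$props"
}
```

Piping empty_revision 150 "Empty log message" "2016-01-11T22:30:47.000000Z" into svnadmin load /var/svn/new-foobar would then stand in for the here-document.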

5. Replay remaining revisions

The remaining revisions should now be a piece of cake, assuming they did not further change the files you want to delete.

svnadmin dump -r151:175 --incremental /var/svn/foobar | svnadmin load /var/svn/new-foobar

This will load the remaining 25 revisions (151 through 175). With --incremental, the first dumped revision is written as a change against its predecessor, rather than as a full copy of the repository tree at that revision.

Note: For newer versions of svnadmin, you need to pass --deltas to actually get incremental changes. Otherwise, svnadmin outputs entire files. See the upstream documentation for more information. (Thanks Marc for pointing that out)

After that, exchange the two repositories. Users should not notice the changes and should be able to correctly use the repository.
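
Before swapping, I'd compare the latest revision numbers of both repositories. Here is a minimal sketch, assuming the dump ran up to the latest revision and nobody committed in the meantime. swap_repos and the .bak suffix are hypothetical choices of mine; svnlook youngest prints the latest revision number of a repository.

```shell
# Hypothetical helper: replace the repository at $1 with the rebuilt one
# at $2, keeping the original next to it with a .bak suffix.
swap_repos() {
  local old_repo=$1 new_repo=$2 old_rev new_rev
  old_rev=$(svnlook youngest "$old_repo") || return 1
  new_rev=$(svnlook youngest "$new_repo") || return 1
  # Refuse to swap when the rebuilt repository ends on a different revision.
  if [ "$old_rev" != "$new_rev" ]; then
    echo "revision mismatch: $old_repo is at r$old_rev, $new_repo at r$new_rev" >&2
    return 1
  fi
  mv "$old_repo" "$old_repo.bak" && mv "$new_repo" "$old_repo"
}
```

Keeping the old repository around makes it easy to roll back if something turns out to be missing.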

I've applied this process several times on repositories of tens of thousands of revisions, filtering more than one commit at a time. Note that dumping and loading the revisions takes a considerable amount of time (4.5 hours for an NFS-backed, 60 GiB SVN repository with roughly 25k revisions).

If you have several commits where only a specific file or path should be removed, take a look at svndumpfilter.
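
For that case, the pipeline could look roughly like the sketch below. filter_paths is a made-up name, and the path prefixes you pass are whatever applies to your repository. svndumpfilter exclude sits between dump and load and drops changes below the given prefixes. I have not covered the case where later revisions copy from an excluded path, which may need extra care.

```shell
# Hypothetical helper: rebuild the repository at $1 into the empty
# repository at $2, dropping all changes below the given path prefixes.
# Usage: filter_paths OLD_REPO NEW_REPO PATH_PREFIX...
filter_paths() {
  local old_repo=$1 new_repo=$2
  shift 2
  svnadmin dump "$old_repo" | svndumpfilter exclude "$@" | svnadmin load "$new_repo"
}
```

A call would look like filter_paths /var/svn/foobar /var/svn/new-foobar some/unwanted/path, where the last argument is only a placeholder.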