How the ConvertExtension works

This page documents the implementation of the ConvertExtension as of f29b674cc221 (in main or crew).

If you are a user, you should read the ConvertExtension page instead. If you still intend to read this page, you should read ConvertExtension first, so that you have a full understanding of what the implementation is implementing.

Currently, this document is a very broad draft. It will have more details in time.

Inputs

Every VCS that the ConvertExtension supports as input is represented by a subclass of the abstract class common.converter_source. The repository to convert is represented by an instance of a source. For example, when converting from Subversion, the input repository is represented by an instance of convert.subversion.svn_source, which is a subclass of converter_source.

The converter_source class takes three arguments:

Additionally, it has a property named encoding, which is 'utf-8' by default.

If a converter_source subclass finds no valid repository at the given path, it raises the exception common.NoRepo. The subclass may also raise NoRepo if some library that it depends on isn't available.
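The source interface described above can be sketched as follows. This is illustrative, not the extension's code. The CamelCase class names and the `.toyvcs` marker directory are hypothetical. The argument names `ui`, `path` and `rev` are also assumptions: the page only implies that the second argument is an optional path.

```python
import os


class NoRepo(Exception):
    """Raised when no usable repository is found (or a needed library is missing)."""


class ConverterSource:
    """Sketch of the abstract source interface (names are assumptions)."""

    def __init__(self, ui, path=None, rev=None):
        self.ui = ui
        self.path = path
        self.rev = rev
        self.encoding = 'utf-8'  # default; a subclass may override it

    def before(self):
        pass  # optional preparation hook

    def after(self):
        pass  # optional clean-up hook

    def getheads(self):
        raise NotImplementedError


class DirSource(ConverterSource):
    """Hypothetical subclass that recognises a repository by a marker directory."""

    MARKER = '.toyvcs'

    def __init__(self, ui, path=None, rev=None):
        super().__init__(ui, path, rev)
        # Refuse anything that does not look like our (made-up) repository format.
        if path is None or not os.path.isdir(os.path.join(path, self.MARKER)):
            raise NoRepo('%s does not look like a toyvcs repository' % path)
```

Raising `NoRepo` from the constructor lets the caller try the next source type when the current one rejects the path.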

The Subversion source (convert.subversion.svn_source) uses “url”, not “path”, as the name of its second argument, because Subversion works with URLs rather than paths. However, it does support local file paths; it uses the geturl function in the same module to silently upgrade a path to a URL. Additionally, the url argument is not optional.
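The path-to-URL upgrade can be illustrated with a small helper. `path_to_url` is a hypothetical stand-in, not the module's `geturl`. The set of schemes treated as already-URLs is also an assumption.

```python
import os
from pathlib import Path
from urllib.parse import urlparse

# Schemes we assume are already URLs and should pass through untouched.
KNOWN_SCHEMES = {'file', 'http', 'https', 'svn', 'svn+ssh'}


def path_to_url(path_or_url):
    """Hypothetical helper: leave URLs alone, turn local paths into file:// URLs."""
    if urlparse(path_or_url).scheme in KNOWN_SCHEMES:
        return path_or_url
    # Make the path absolute first; Path.as_uri() rejects relative paths.
    return Path(os.path.abspath(path_or_url)).as_uri()
```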

Abstract methods

`before` and `after`

Subclasses can implement these to perform any necessary preparation and clean-up.

`getheads`

Returns the revision identifiers of the heads of the source repository.

For most Subversion repositories, the heads are the revision numbers of the latest revision on the trunk and the latest revision on every branch.
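The following is a minimal sketch of that rule. It assumes the per-branch revision lists are already available as a dictionary. That input shape is hypothetical; a real Subversion source would have to derive it from the repository log.

```python
def svn_heads(branch_revisions):
    """Return the latest revision number on each line of development.

    branch_revisions maps a branch name ('trunk' or 'branches/...') to the
    revision numbers that touched it (a hypothetical input format).
    """
    # One head per non-empty branch: its highest revision number.
    return sorted(max(revs) for revs in branch_revisions.values() if revs)
```

For example, `{'trunk': [1, 2, 5, 9], 'branches/stable': [3, 4, 7]}` yields `[7, 9]`.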

Outputs

Output classes are called sinks. Every VCS that the ConvertExtension supports as output is represented by a subclass of the abstract class common.converter_sink.

The converter_sink class takes two arguments:

Unlike the converter_source class, the path argument here is not optional.

Additionally, it has a property named created_files, which is a list that holds the paths to files that the sink has created, so that the sink can unlink them if the conversion fails. The abstract class does not handle this clean-up; it leaves it to the subclasses.
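The sketch below shows how a subclass might use `created_files` for its own clean-up. It is hypothetical. `FileSink`, `create_file` and `cleanup` are made-up names, and the one-file-per-call behaviour is purely for illustration.

```python
import os


class ConverterSink:
    """Sketch of the abstract sink's constructor (argument names assumed)."""

    def __init__(self, ui, path):
        self.ui = ui
        self.path = path
        self.created_files = []  # paths to unlink if the conversion fails


class FileSink(ConverterSink):
    """Hypothetical sink that records every file it writes."""

    def create_file(self, name, data):
        target = os.path.join(self.path, name)
        with open(target, 'w', encoding='utf-8') as f:
            f.write(data)
        self.created_files.append(target)  # remember it for clean-up
        return target

    def cleanup(self):
        # The abstract class does not do this; the subclass must.
        for p in self.created_files:
            try:
                os.unlink(p)
            except FileNotFoundError:
                pass  # already gone
        self.created_files = []
```

A caller would invoke `cleanup()` from an `except` block wrapped around the conversion.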

Abstract methods

`before` and `after`

Subclasses can implement these to perform any necessary preparation and clean-up.

`getheads`

Returns the revision identifiers of the heads that already exist in the destination repository.

`authorfile`

A subclass can implement this to return the path to a file within the repository where an authormap file may be found.

`revmapfile`

A subclass can implement this to return the path to a file within the repository where a revision-map file may be found.
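Taken together, the sink's abstract methods can be sketched as a small base class. The sketch also includes one hypothetical subclass. The `.meta/authormap` and `.meta/revmap` file names are placeholders, not the names any real sink uses. Having `authorfile` return `None` by default is likewise an assumption.

```python
import os


class ConverterSink:
    """Sketch of the sink interface (default behaviours are assumptions)."""

    def __init__(self, ui, path):
        self.ui = ui
        self.path = path
        self.created_files = []

    def before(self):
        pass

    def after(self):
        pass

    def getheads(self):
        raise NotImplementedError

    def authorfile(self):
        return None  # no author map unless a subclass provides one

    def revmapfile(self):
        raise NotImplementedError


class MetaDirSink(ConverterSink):
    """Hypothetical sink keeping its bookkeeping files in a metadata directory."""

    def __init__(self, ui, path):
        super().__init__(ui, path)
        self.heads = []

    def getheads(self):
        return list(self.heads)

    def authorfile(self):
        return os.path.join(self.path, '.meta', 'authormap')

    def revmapfile(self):
        return os.path.join(self.path, '.meta', 'revmap')
```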

The convert command

The command is implemented in the convert.convcmd sub-module. Only the most basic requirements for a Mercurial extension command are in convert.__init__; the convert function there tail-calls convert.convcmd.convert.

The convert function calls two subroutines in the same module, convertsink and convertsource, to obtain sink and source instances for the destination and source repositories. These functions iterate the mappings of VCS names to sink/source classes, trying each class in turn on the specified destination and source repositories.
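The "try each class in turn" pattern can be sketched as follows. It is not the real convertsource. `pick_source`, `SOURCE_TYPES` and both toy source classes are hypothetical, and their repository checks are deliberately trivial.

```python
class NoRepo(Exception):
    pass


class ToySvnSource:
    def __init__(self, ui, path):
        if not path.startswith('svn://'):  # trivial, made-up check
            raise NoRepo('not a Subversion URL: %s' % path)
        self.path = path


class ToyGitSource:
    def __init__(self, ui, path):
        if not path.endswith('.git'):  # trivial, made-up check
            raise NoRepo('not a git repository: %s' % path)
        self.path = path


# Ordered mapping of VCS names to source classes.
SOURCE_TYPES = [('svn', ToySvnSource), ('git', ToyGitSource)]


def pick_source(ui, path):
    """Return the first source class that accepts path, else raise NoRepo."""
    errors = []
    for name, cls in SOURCE_TYPES:
        try:
            return cls(ui, path)
        except NoRepo as inst:
            errors.append('%s: %s' % (name, inst))  # keep going, report later
    raise NoRepo('%s is not a supported repository (%s)' % (path, '; '.join(errors)))
```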

The final step in the function is to create an instance of convcmd.converter, which is the class that actually performs the conversion.

Anatomy of `convcmd.converter`

The class takes five arguments, all required:

Additionally, it has four properties:

A revision map associates a commit in the source repository with a commit in the destination repository. It is the converter's record of which commits it has already copied. The map is backed by a file, which is updated whenever a commit is stored in the map. If the user runs the converter again, it reads the revision map back in and uses it to resume the conversion rather than start over from the beginning.
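A file-backed revision map might look like this. It is a minimal sketch, not the extension's class. The on-disk format is an assumption: one `<source id> <destination id>` line per commit. The sketch splits on the last space, on the further assumption that destination identifiers never contain spaces.

```python
import os


class RevMap:
    """Sketch of a file-backed revision map (file format is an assumption)."""

    def __init__(self, path):
        self.path = path
        self.mapping = {}
        # Reload an existing map so an interrupted conversion can resume.
        if os.path.exists(path):
            with open(path, encoding='utf-8') as f:
                for line in f:
                    line = line.rstrip('\n')
                    if line:
                        src, dst = line.rsplit(' ', 1)
                        self.mapping[src] = dst

    def __contains__(self, src):
        return src in self.mapping

    def __getitem__(self, src):
        return self.mapping[src]

    def __setitem__(self, src, dst):
        self.mapping[src] = dst
        # Append immediately, so the file always reflects copied commits.
        with open(self.path, 'a', encoding='utf-8') as f:
            f.write('%s %s\n' % (src, dst))
```

Appending one line per stored commit keeps the file consistent even if the conversion is interrupted midway.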

The main method of the class is convert, which the top-level convcmd.convert function calls to do the work.

Commit ordering

Order is significant, as revision identifiers in Mercurial are dependent on the order of the commits. (Mercurial defines a revision identifier as a hash of a number of pieces of data from the commit, including the identifiers of the commit's parents, so a commit cannot be written until its parents have been.)
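The dependency on parents can be shown with a simplified model of Mercurial's node hash. The model takes SHA-1 over the two parent identifiers in sorted order, followed by the revision text. In a real changeset, that text also carries metadata such as the manifest, author, date and description. Here it is just an arbitrary byte string.

```python
import hashlib

NULLID = b'\0' * 20  # identifier used for a missing parent


def node_hash(text, p1=NULLID, p2=NULLID):
    """Simplified Mercurial-style node id: SHA-1(sorted parents + text)."""
    a, b = sorted([p1, p2])  # parent order does not affect the result
    s = hashlib.sha1(a)
    s.update(b)
    s.update(text)
    return s.digest()
```

Changing anything about a parent changes the parent's identifier. That in turn changes the identifier of every descendant, which is why parents must be copied first.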

By default, the ConvertExtension copies commits in topological order, also known as ancestral order (see http://en.wikipedia.org/wiki/Topological_sort). As you might guess from the latter name, this guarantees only that a commit comes before any commit that depends on it.
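A topological sort over a commit-to-parents mapping can be sketched with Kahn's algorithm. This sketch does not claim that converter.toposort uses Kahn's algorithm. It assumes that parents missing from the mapping (for example, commits already converted) impose no constraint.

```python
from collections import deque


def toposort(parents):
    """Order commits so that each one comes after all of its parents.

    parents maps commit id -> list of parent ids. Parents that are not keys
    of the mapping are treated as already satisfied.
    """
    children = {c: [] for c in parents}
    pending = {}  # number of unplaced in-mapping parents per commit
    for c, ps in parents.items():
        inside = [p for p in ps if p in parents]
        pending[c] = len(inside)
        for p in inside:
            children[p].append(c)
    ready = deque(sorted(c for c, n in pending.items() if n == 0))
    order = []
    while ready:
        c = ready.popleft()
        order.append(c)
        for child in children[c]:
            pending[child] -= 1
            if pending[child] == 0:
                ready.append(child)
    if len(order) != len(parents):
        raise ValueError('cycle detected in commit graph')
    return order
```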

With the --datesort option, the ConvertExtension instead copies commits in the order in which they were originally committed in the source repository. As long as humans are not capable of time travel and the repository itself has not been tampered with, this chronological sort is also a valid topological sort.
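A date-ordered variant can be sketched by replacing the queue of ready commits with a heap keyed on commit date. When dates agree with ancestry, the result is the plain chronological order. If a clock was wrong, parents still come first. That safeguard is a design choice of this sketch, not necessarily of converter.toposort. The `dates` mapping is a hypothetical input.

```python
import heapq


def datesort(parents, dates):
    """Topological order that always picks the earliest-dated ready commit."""
    children = {c: [] for c in parents}
    pending = {}
    for c, ps in parents.items():
        inside = [p for p in ps if p in parents]
        pending[c] = len(inside)
        for p in inside:
            children[p].append(c)
    # Heap of (date, commit) for commits whose parents are all placed.
    heap = [(dates[c], c) for c, n in pending.items() if n == 0]
    heapq.heapify(heap)
    order = []
    while heap:
        _, c = heapq.heappop(heap)
        order.append(c)
        for child in children[c]:
            pending[child] -= 1
            if pending[child] == 0:
                heapq.heappush(heap, (dates[child], child))
    return order
```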

Both sorts are performed by the converter.toposort method.

The conversion process

Conversion truly begins in the converter.convert method, although most of the real work is still done in other methods (not to mention other classes).

First, the converter must determine the commits to copy. It starts by getting the list of heads from the source repository (using converter_source.getheads); then, it uses converter.walktree to find all the ancestors of those heads.

walktree returns a dictionary that maps each commit to a list of its parents. The mapping contains only commits that have not yet been copied to the destination repository: when walktree encounters a commit that is already in the converter's revision map, it skips that commit rather than adding it to the mapping.
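The walk can be sketched as an iterative depth-first traversal from the heads. This is not the real method's signature. The `getparents` callback and the `revmap` container are hypothetical parameters standing in for the source repository and the converter's revision map.

```python
def walktree(heads, getparents, revmap):
    """Map every unconverted ancestor of heads to its list of parents.

    getparents(commit) returns the parent ids of a source commit; revmap is
    any container of source ids that have already been converted.
    """
    parents = {}
    visit = list(heads)
    while visit:
        n = visit.pop()
        # Skip commits already walked or already copied to the destination.
        if n in parents or n in revmap:
            continue
        ps = getparents(n)
        parents[n] = ps
        visit.extend(ps)
    return parents
```

Parents that were skipped still appear in their children's parent lists, so the sort step must treat them as already satisfied.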

The convert method calls the toposort method with this mapping to put the commits in order (see the Commit ordering section above). toposort takes the mapping, iterates its keys (which are revision identifiers from the source repository), builds a new list containing them in sorted order, and returns that list.

Now it's time to begin copying commits.

