Convert extension
This extension is distributed with Mercurial.
Author: several people
Implementation information can be found here: ConvertExtensionImplementation
1. Overview
The Convert extension converts repositories from other SCMs (or even Mercurial itself) into Mercurial or (with limits) Subversion repositories, with options for filtering and renaming. It can also be used to filter Mercurial repositories to get subsets of an existing one.
The current release supports the following repository types as sources:
- CVS
- Subversion
- Git
- Darcs
- Monotone
- Bazaar
- GNU Arch
- Mercurial
- Perforce
2. Configuration
Add the following lines to your .hgrc to enable the extension:
[extensions]
hgext.convert=
3. Usage
hg convert [OPTION]... SOURCE [DEST [REVMAP]]
SOURCE points to the data to be imported. It can be:
- the name of the (local) directory containing the code checked out from the remote repository (for example, the name of the directory checked out from CVS)
- (in some cases) the address of the remote repository (for example, a Subversion repository URL). See below for the systems that accept this syntax.
DEST is the local directory where the converted data will go (a Mercurial repository will be created or updated there). If DEST is not provided, its name is formed by adding the -hg suffix to the source directory name. For example, if the source is in /my/cvs/dir, the default destination is /my/cvs/dir-hg. When a URL is supplied, the repository name is inferred from the last path component: http://foo.bar/repo/trunk gives trunk-hg.
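The naming rule can be sketched in a few lines of Python. This only illustrates the rule as described here and is not Mercurial's own code; the function name default_dest is made up for the example.

```python
from urllib.parse import urlsplit


def default_dest(source):
    """Return the default DEST for SOURCE, following the rule described above."""
    if "://" in source:
        # URL: keep only the last path component.
        path = urlsplit(source).path.rstrip("/")
        name = path.rsplit("/", 1)[-1]
    else:
        # Local directory: keep the whole path.
        name = source.rstrip("/")
    return name + "-hg"


print(default_dest("/my/cvs/dir"))                # /my/cvs/dir-hg
print(default_dest("http://foo.bar/repo/trunk"))  # trunk-hg
```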
REVMAP is a simple text file that maps each source commit ID to the destination ID for that revision. Unless specified, it defaults to .hg/shamap in the destination directory. The file is created automatically and updated on each commit copied. Its purpose is to track which commits have already been imported, which makes it possible to resume an interrupted import and to make incremental updates. Note that the REVMAP file is not copied when you clone a repository, so you need to copy it over manually if you are going to make incremental updates on a clone of the original import repository.
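To check what has been imported so far, the revision map can be read with a short script. The sketch below assumes that each line holds a source ID and a destination ID separated by a space, with the destination ID last; the function name parse_revmap and the sample IDs are made up for the example.

```python
def parse_revmap(text):
    """Map each source commit ID to its destination ID."""
    revmap = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Assumption: the destination ID is the last field, so split from
        # the right in case the source ID itself contains a space.
        source_id, dest_id = line.rsplit(" ", 1)
        revmap[source_id] = dest_id
    return revmap


sample = (
    "svn:00000000-0000-0000-0000-000000000000/trunk@1 "
    "d643f67092ff123f6a192d52f12e7d123dae229f\n"
    "svn:00000000-0000-0000-0000-000000000000/trunk@2 "
    "9117c6561b0bd7792fa13b50d28239d51b78e51f\n"
)
revmap = parse_revmap(sample)
print(len(revmap), "revisions already converted")
```

For a real repository, the text would come from the .hg/shamap file in the destination directory.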
When converting to Mercurial, the destination working directory is used as a temporary storage for file revisions but is not updated. hg status lists all these temporary files as unknown. Purge them and update to get a correct view of the converted repository.
4. Options
4.1. --dest-type
Select the destination repository type: Mercurial by default, optionally svn for Subversion. Beware, the Subversion destination is only half-working: history in branches is seemingly randomly lost (only some of the changesets in branches appear in the resulting repository) and tags are lost. Check hg help convert.
4.2. --rev
Convert can optionally stop importing at a given revision, if the --rev option is provided. The argument should be given in terms the source understands (e.g. a revision number for Subversion sources, or a hash for git sources). Revisions newer than specified by this parameter are not imported.
This is also useful for incremental conversions. They help not only when tracking newer changes in the source repository but also with very large repositories, where a whole conversion in one run would need too many resources and is better split into several incremental runs.
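As an illustration of the incremental approach, the following sketch generates one hg convert command per chunk of revisions. It assumes a source with numeric revisions (such as Subversion) and that rerunning convert on the same DEST picks up where the previous run stopped; the function name and the chunk size are made up for the example.

```python
def incremental_commands(source, dest, last_rev, step):
    """Yield one 'hg convert --rev N' argument list per chunk of revisions."""
    rev = 0
    while rev < last_rev:
        rev = min(rev + step, last_rev)
        yield ["hg", "convert", "--rev", str(rev), source, dest]


for cmd in incremental_commands("http://foo.bar/repo/trunk", "trunk-hg", 2500, 1000):
    print(" ".join(cmd))
```

Each argument list could then be passed to subprocess.run(cmd, check=True) so that the loop stops at the first failing chunk.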
4.3. --authors / --authormap
Convert can also remap author names during conversion, if the --authormap option is provided. The argument should be a simple text file that maps each source commit author to a destination commit author. It is handy for source SCMs that use UNIX logins to identify authors (e.g., CVS). Example:
john=John Smith <John.Smith@someplace.net>
tom=Tom Johnson <Tom.Johnson@bigcity.com>
Note: the --authors option is deprecated; use --authormap instead.
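The effect of an author map can be sketched as follows. This is an illustration of the file format shown above, not the extension's own parser; the function names are made up, and skipping blank lines and # comments is an assumption.

```python
def parse_authormap(text):
    """Parse 'source author=destination author' lines into a dict."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # assumption: blank lines and # comments are skipped
        source, sep, dest = line.partition("=")
        if not sep:
            raise ValueError("missing '=' in author map line: %r" % line)
        mapping[source.strip()] = dest.strip()
    return mapping


def map_author(author, mapping):
    """Return the mapped author, or the original name if it is not listed."""
    return mapping.get(author, author)


authors = parse_authormap(
    "john=John Smith <John.Smith@someplace.net>\n"
    "tom=Tom Johnson <Tom.Johnson@bigcity.com>\n"
)
print(map_author("john", authors))
print(map_author("alice", authors))
```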
4.4. --filemap
Convert can also filter or rename files during conversion, when you supply it a mapping via the --filemap option.
The filemap is a file that specifies which files are to be included, renamed, or omitted. By default all files are included (empty filemap means include everything).
Each line can contain one of the following directives:
include path/to/file
exclude path/to/file
rename from/file to/file
All paths should be specified as relative paths rooted in the converted directory. Unix path syntax should be used, regardless of OS.
The include directive causes a file, or all files under a directory, to be included in the destination repository. Once an include directive is present, anything that is not covered by an include rule is excluded.
The exclude directive causes files or directories to be omitted.
The rename directive renames a file or directory. To rename from a subdirectory into the root of the repository, use . as the path to rename to.
Comments are also allowed: comment lines start with #.
Entries are parsed like Unix shell commands and should be quoted accordingly. This means that you must quote filenames containing spaces -- it does not mean that you can use glob patterns like *.so.
For example, to import all files except the doc subdirectory, but still include doc/foo bar.txt and doc/FAQ (renaming the latter to faq), use:
# Documentation is to be converted to separate repository.
include "."
exclude "doc"
include "doc/foo bar.txt"
include "doc/FAQ"
rename "doc/FAQ" "faq"
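To get a feel for how such a filemap treats individual paths, the rules can be modelled in Python. This is a simplified sketch and not the extension's implementation: it assumes that the most specific matching rule wins and that an exclude wins a tie, and all function names are made up for the example.

```python
import shlex


def parse_filemap(text):
    """Return (include, exclude, rename) parsed from filemap text."""
    include, exclude, rename = set(), set(), {}
    for line in text.splitlines():
        # shlex applies shell-style quoting and drops # comments.
        words = shlex.split(line, comments=True)
        if not words:
            continue
        directive, args = words[0], words[1:]
        if directive == "include" and len(args) == 1:
            include.add(args[0])
        elif directive == "exclude" and len(args) == 1:
            exclude.add(args[0])
        elif directive == "rename" and len(args) == 2:
            rename[args[0]] = args[1]
        else:
            raise ValueError("bad filemap line: %r" % line)
    return include, exclude, rename


def prefixes(path):
    """Yield (prefix, rest) pairs from most to least specific, ending at '.'."""
    yield path, ""
    head = path
    while "/" in head:
        head = head.rsplit("/", 1)[0]
        yield head, path[len(head) + 1:]
    yield ".", path


def best_match(path, rules):
    """Return (rank, prefix, rest) for the most specific rule, or None."""
    for rank, (prefix, rest) in enumerate(prefixes(path)):
        if prefix in rules:
            return rank, prefix, rest
    return None


def map_path(path, include, exclude, rename):
    """Return the destination path, or None if the file is left out."""
    inc = best_match(path, include)
    exc = best_match(path, exclude)
    if include and inc is None:
        return None
    # Assumption: the more specific rule wins, exclude wins a tie.
    if exc is not None and (inc is None or exc[0] <= inc[0]):
        return None
    ren = best_match(path, rename)
    if ren is None:
        return path
    _, prefix, rest = ren
    target = rename[prefix]
    if target == ".":
        return rest
    return target + "/" + rest if rest else target


FILEMAP = '''\
# Documentation is to be converted to separate repository.
include "."
exclude "doc"
include "doc/foo bar.txt"
include "doc/FAQ"
rename "doc/FAQ" "faq"
'''
rules = parse_filemap(FILEMAP)
for name in ["src/main.c", "doc/manual.txt", "doc/foo bar.txt", "doc/FAQ"]:
    print(name, "->", map_path(name, *rules))
```

With these rules, doc/manual.txt is dropped, doc/FAQ ends up as faq, and the other two paths are kept unchanged.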
4.5. --splicemap
This option specifies new parents for specific revisions. Its main use is to put a (converted) history at the end of another one by specifying the last revision of the existing history as the parent of the new one. In that case, the splicemap file will contain a single line:
first-revision-of-to-be-converted-history tip-of-existing-history
Other uses of this file include splicing one series of commits in between two other commits, removing commits from the history (by connecting their antecedent and descendant directly together), or forging a merge (by adding a second parent to a commit).
The format of revisions in the splicemap is identical to the one used in the file .hg/shamap, which is created when converting a repository. For Mercurial revisions you must use the full 40-digit hexadecimal hash; for Subversion revisions the format is:
svn:guid-of-repository/path/to/files/or/dir@REV
You can obtain the UUID of a Subversion repository by running svn info.
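A small helper can build or take apart identifiers of this form. The sketch below follows the svn:UUID/path@REV pattern shown above; the function names and the sample UUID are made up for the example.

```python
def format_svn_revid(uuid, path, rev):
    """Build 'svn:UUID/path@REV'; path is expected to start with '/'."""
    return "svn:%s%s@%d" % (uuid, path, rev)


def parse_svn_revid(revid):
    """Split 'svn:UUID/path@REV' into (uuid, path, rev)."""
    if not revid.startswith("svn:"):
        raise ValueError("not a Subversion revision ID: %r" % revid)
    # The revision number follows the last '@'.
    body, _, rev = revid[len("svn:"):].rpartition("@")
    # The UUID runs up to the first '/'; the rest is the path.
    uuid, slash, path = body.partition("/")
    return uuid, slash + path, int(rev)


revid = format_svn_revid("12345678-1234-1234-1234-123456789abc", "/trunk", 42)
print(revid)
print(parse_svn_revid(revid))
```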
The file format of the splice map is simple: each splice is one line. The first revision identifier on the line is the child cset whose parents you are editing, and must be specified relative to the source repository. After a single space, the second revision identifier is the first parent being set. To set a second parent, separate the two revision identifiers with a comma. The second and optional third are from either the source or destination repository. Note that blank lines in the splicemap file will cause a parse error.
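The line format can be checked before running a conversion with a sketch like the one below. It is not the extension's parser: it simply applies the rules stated above (one space between child and parents, a comma between two parents, no blank lines) and assumes that revision identifiers contain no spaces. The function name is made up for the example.

```python
def parse_splicemap(text):
    """Return {child: [parent1] or [parent1, parent2]} for splicemap text."""
    splices = {}
    for number, line in enumerate(text.splitlines(), 1):
        fields = line.split(" ")
        if len(fields) != 2 or not all(fields):
            # This also rejects blank lines, as described above.
            raise ValueError("line %d: expected 'child parent1[,parent2]'" % number)
        child, parents = fields
        parents = parents.split(",")
        if len(parents) > 2 or not all(parents):
            raise ValueError("line %d: expected one or two parents" % number)
        splices[child] = parents
    return splices


child = "f8b4ca9f6ffeb6b288be4c44e4d55458f867cd7c"
p1 = "d643f67092ff123f6a192d52f12e7d123dae229f"
p2 = "9117c6561b0bd7792fa13b50d28239d51b78e51f"
print(parse_splicemap("%s %s,%s\n" % (child, p1, p2)))
```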
4.6. --branchmap
Since v1.3
The branchmap is a file that allows you to rename a branch when it is being brought in from whatever external repository. When used in conjunction with a splicemap, it allows for a powerful combination to help fix even the most badly mismanaged repositories and turn them into nicely structured Mercurial repositories. The branchmap contains lines of the form
original_branch_name new_branch_name
original_branch_name is the name of the branch in the source repository, and new_branch_name is the name of the branch in the destination repository. This can be used to move code in one repository from "default" to a named branch.
Remember that "default" identifies the default branch of Mercurial repositories, which is not displayed by default by log, heads or parents commands. To erase named branch markers, convert it to "default" with a branchmap like
original_branch_name default
The revisions will still be there but no longer attached to the original named branch.
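The renaming itself amounts to a simple lookup, sketched below. The sketch assumes that branch names contain no whitespace and that blank lines and # comments are skipped; the function names and the branch names in the sample are made up for the example.

```python
def parse_branchmap(text):
    """Parse 'original_branch_name new_branch_name' lines into a dict."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # assumption: blank lines and # comments are skipped
        fields = line.split()
        if len(fields) != 2:
            raise ValueError("bad branchmap line: %r" % line)
        mapping[fields[0]] = fields[1]
    return mapping


def map_branch(branch, mapping):
    """Return the new branch name; unlisted branches keep their name."""
    return mapping.get(branch, branch)


branchmap = parse_branchmap("default trunk\nold-feature default\n")
for branch in ["default", "old-feature", "stable"]:
    print(branch, "->", map_branch(branch, branchmap))
```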
4.7. --branchsort, --datesort, --sourcesort
Normally convert will import revisions in an order that produces the fewest jumps between branches in the commit log. If you want to make the revision order in the destination more closely match that of the source, use the --datesort flag. Note that this option can make the destination repository 10 to 20 times larger. You have been warned.
- --branchsort
- convert from parent to child revision when possible, which means branches are usually converted one after the other. It generates more compact repositories.
- --datesort
- sort revisions by date. Converted repositories have good-looking changelogs but are often an order of magnitude larger than the same ones generated by --branchsort.
- --sourcesort
- try to preserve source revisions order, only supported by Mercurial sources.
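The difference between the two main orderings can be illustrated on a toy history. The sketch below is a simplified model, not the extension's code: it orders revisions so that parents always come first and only differs in how the next revision is picked. It assumes a complete history without missing parents, and all names are made up for the example.

```python
def convert_order(revs, pick):
    """Order revisions so that every parent comes before its children."""
    pending = {name: set(info["parents"]) for name, info in revs.items()}
    order = []
    while pending:
        # Revisions whose parents have all been converted already.
        ready = sorted(name for name, parents in pending.items() if not parents)
        chosen = pick(ready, order, revs)
        order.append(chosen)
        del pending[chosen]
        for parents in pending.values():
            parents.discard(chosen)
    return order


def pick_by_date(ready, order, revs):
    """Model of --datesort: take the oldest ready revision."""
    return min(ready, key=lambda name: revs[name]["date"])


def pick_by_branch(ready, order, revs):
    """Model of --branchsort: prefer a child of the revision converted last."""
    for name in ready:
        if order and order[-1] in revs[name]["parents"]:
            return name
    return ready[0]


# Two branches forking from "a": a-b-d and a-c-e, with interleaved dates.
revs = {
    "a": {"parents": [], "date": 1},
    "b": {"parents": ["a"], "date": 2},
    "c": {"parents": ["a"], "date": 3},
    "d": {"parents": ["b"], "date": 4},
    "e": {"parents": ["c"], "date": 5},
}
print(convert_order(revs, pick_by_date))    # ['a', 'b', 'c', 'd', 'e']
print(convert_order(revs, pick_by_branch))  # ['a', 'b', 'd', 'c', 'e']
```

The date-based order alternates between the two branches, while the branch-based order finishes one branch before starting the other.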
5. Repository Conversion
5.1. Converting from Bazaar
5.1.1. Ubuntu Linux
Use Mercurial v1.5 or later, from ppa if necessary.
# install mercurial 1.6 on ubuntu 10
sudo add-apt-repository ppa:mercurial-ppa/releases
sudo apt-get update
sudo apt-get install mercurial
# the actual conversion
hg convert path/to/foo-bzr-branch foo-hg
cd foo-hg
hg update
The convert extension uses the Bazaar branch nick as the name for the branch in the hg repository. Bazaar by default uses the repository directory name as the nick. This means the branch in Mercurial is probably not going to be named "default." To make the Mercurial branch name be "default" use the --branchmap FILE option to convert.
In the FILE create a mapping:
bzr_nick default
where bzr_nick is the name returned by running bzr nick on the Bazaar repository.
5.1.2. Windows
Prerequisites:
Python binaries for Windows (see http://www.python.org/download/)
Mercurial as Python module (see Download)
Bazaar as Python module (see http://wiki.bazaar.canonical.com/WindowsDownloads)
Python for Windows extension (see http://sourceforge.net/projects/pywin32/files/pywin32/)
The .msi installers don't include the necessary Python libraries. Also, if you have installed TortoiseHg, you will probably need to install Python. You will also need to install Bazaar modules (see http://wiki.bazaar.canonical.com/WindowsInstall for instructions on how to install Bazaar).
To ensure you are running the Python module version of Mercurial, you must call the Mercurial script directly. Otherwise you might end up running the executable file (hg.exe) from TortoiseHg.
C:\python26\scripts\hg convert -s bzr path/to/foo-bzr-branch foo-hg
cd foo-hg
hg update
When finished, the installed Python tools can be cleanly removed using the normal Add/Remove Programs method.
5.1.3. Other
You may also try a simpler method which uses the Bazaar version already installed on your system.
5.2. Converting from CVS
Prerequisites:
- commandline CVS client
- cvsps (not needed with Mercurial v1.3 or later)
Note: on Windows, cvsps currently requires Cygwin. Work is underway to make cvsps work without cygwin - see http://repo.or.cz/w/cvsps/4msysgit.git/.
The convert extension requires a CVS working directory (a checkout) because it uses cvsps; it is not possible to convert directly from a CVS repository URL.
Example conversion:
# Normal CVS checkout (existing checkout may also be used)
cvs -d :pserver:user@repository.host:/repo/path checkout somemodule
# Actual conversion
hg convert somemodule
# New Mercurial repo is created in somemodule-hg directory
If you prefer another destination, just specify it, for example:
hg convert somemodule /some/where/somemodule.hg
It is possible to customize the way cvsps is called (program name, args etc). Enter the following in hgrc:
[convert]
# This is default
cvsps = cvsps -A -u --cvs-direct -q
The -A option is necessary; the rest can be adjusted. Possible ideas:
- remove -q to see cvsps warnings and to track its progress (do it if conversion fails or hangs!)
- remove --cvs-direct if you seem to have problems with communication with the CVS server (without --cvs-direct cvsps will use your cvs client instead of the builtin CVS client code)
- add -Z 1 (or similar) option to enable compression
- pipe cvsps output through some script (say, to convert the encoding of log messages)
It is strongly recommended to use a fresh, newly made CVS checkout for the conversion. Problems were reported when people tried converting old checkouts made with old CVS versions. An example is an error like
cvs server: cvs [checkout aborted]: Absolute module reference invalid: `/path/to/some_module/subdirectory/file_name'
Incremental conversion works, too. Just repeat the
hg convert somemodule /some/where/somemodule.hg
command to grab new updates from CVS. It is not necessary to run cvs update in the somemodule directory first, although it may be needed if new directories were created (this has not been tested).
5.2.1. CVS keyword expansion
By default, the convert extension does not expand CVS keywords. If you want to import files with CVS keywords expanded, go to the Mercurial source code and change options passed to a CVS command, in particular -kk to -kkv (file hgext/convert/cvs.py, around line 255, search for string "-N -P -kk -r %s --").
5.2.2. Other tools
See RepositoryConversion#CVS for other conversion tools that can be used.
5.3. Converting from Darcs
The converter does not currently handle patch conflicts very well. For more detail, see the notes in tests/test-convert-darcs. If you encounter this problem, try using the deprecated contrib/darcs2hg.py script instead.
5.4. Converting from Perforce
Prerequisites:
- Perforce client
Steps to migrate an existing Perforce depot to a new Mercurial repository and push it to a remote instance:
- Create a workspace in Perforce that contains the files/folders that you want to convert to Mercurial
- Sync to the new workspace (the files have to be present for the conversion)
- Run hg convert to migrate the files and history
- Update the local Mercurial repository to get the files into your workspace
- Clone the new local repository to your remote Mercurial instance
Sample bash script:
export P4PORT=your_p4_server:1234
# NOTE client and username could be the same, password may be optional
export P4CLIENT=your_client_spec_name
export P4USER=username
export P4PASSWD=password
mkdir p4depot
cd p4depot
# Example 1
p4 sync -f ...
# Example 2 - NOTE "..." is significant!
p4 client
p4 sync -f //sample_app/...
mkdir ../hgdepot
cd ../hgdepot
hg convert -s p4 //sample_app/... new_hg_repo
cd new_hg_repo
hg update
hg clone . ssh://account@your_remote_server.com/new_hg_repo
5.5. Converting from RCS
(kudos to the author of http://utcc.utoronto.ca/~cks/space/blog/sysadmin/RCStoMercurial)
First convert to a dummy CVS repo, then convert from a CVS checkout as shown below or as in ConvertExtension#Converting_from_CVS.
Converting RCS to CVS:
$ mkdir /tmp/cvs-repo
$ cvs -d /tmp/cvs-repo init
$ ls -F /tmp/cvs-repo
CVSROOT/
$ rsync -a ~/my-scripts/RCS/ /tmp/cvs-repo/my-scripts
$ ls -F /tmp/cvs-repo
CVSROOT/ my-scripts/
$ cvs -d /tmp/cvs-repo co -d /tmp/my-scripts my-scripts
cvs checkout: updating my-scripts
U /tmp/my-scripts/foo.sh
U /tmp/my-scripts/bar.sh
$
A simplified conversion to hg (version 1.5.1):
$ mkdir ~/my-repo.hg
$ hg convert --datesort /tmp/my-scripts ~/my-repo.hg
$ cd ~/my-repo.hg
$ ls -aF
./ ../ .hg/
$ hg update
2 files updated, 0 files merged, 0 files removed, 0 files unresolved
$ ls -aF
./ ../ .hg/ foo.sh bar.sh
Notes:
run hg convert with the --datesort option - all RCS changesets will be in strict date order
to incrementally add other RCS collections, convert each to a new CVS module in /tmp/cvs-repo, check out and run convert against my-repo.hg
5.6. Converting from Subversion
See WorkingWithSubversion for examples of Mercurial over Subversion workflows.
Prerequisites:
Subversion's Python bindings (see http://subversion.tigris.org/)
Bindings are included with the TortoiseHg distribution package and the Win32 InnoSetup (non-MSI) binaries for Mercurial, so if you install either of them you don't need to install additional packages.
You may need to use a Mercurial installed on top of a stand-alone Python, and you may also need to do something like
set HG=python c:\Python25\Scripts\hg
to override the default Win32 binaries if you have those installed also. For Mac OS X, the easiest way is to install the CollabNet Subversion build, and then copy the content of /opt/subversion/lib/svn-python to the site-package directory of the Python installation.
The source may be a URL or a path to a subversion repository or working directory.
5.6.1. Options
The converter supports a few Subversion-specific options:
- svn.trunk
- Relative path to the trunk (default: "trunk")
- svn.branches
- Relative path to tree of branches (default: "branches")
- svn.tags
- Relative path to tree of tags (default: "tags")
- svn.startrev
- Subversion revision to start from. Works for single-branch conversions only.
Set these under [convert] in a .hgrc, or on the command line as follows:
hg --config convert.svn.trunk=wackoname convert [...]
A child Mercurial process is spawned to fetch subversion revision information. The heuristic used to find the Mercurial executable is simple but good enough for basic setups. If you are running the convert extension from a Mercurial installation which is not the primary one (not in $PATH), you should set the $HG environment variable to the location of your Mercurial executable or script, so the child process will be spawned with the expected Mercurial version.
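When a conversion is scripted, the --config form is convenient because the settings travel with the command. The sketch below only assembles the argument list; the function name convert_command is made up, and the hg parameter stands for whichever Mercurial executable should be used.

```python
def convert_command(source, dest=None, config=None, hg="hg"):
    """Build an 'hg convert' argument list with convert.* settings."""
    cmd = [hg]
    for key, value in sorted((config or {}).items()):
        # Each setting goes into the [convert] section.
        cmd += ["--config", "convert.%s=%s" % (key, value)]
    cmd.append("convert")
    cmd.append(source)
    if dest is not None:
        cmd.append(dest)
    return cmd


cmd = convert_command(
    "http://code.sixapart.com/svn/memcached",
    config={"svn.trunk": "wackoname"},
)
print(" ".join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True) to start the conversion.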
5.6.2. More about Subversion URL and Paths Handling
Let's see what we can do to convert http://code.sixapart.com/svn/memcached. trunk can be converted directly with:
hg convert http://code.sixapart.com/svn/memcached/trunk
Here, convert retrieves all revisions down to the first one created in trunk or coming from a copy living somewhere else. Note that not all revisions existing in trunk may be converted: if trunk is overwritten at revision A by a branch coming from another part of the repository, the conversion will stop at revision A. Because http://code.sixapart.com/svn/memcached/trunk is the root URL, only revisions touching files in its subtree will be converted.
Things are different if we run:
hg convert http://code.sixapart.com/svn/memcached
The root URL is http://code.sixapart.com/svn/memcached. The convert extension tries to detect the canonical trunk/branches/tags layout automatically. Here, trunk will be detected and added to the modules to convert. The extension will also detect the branches subdirectory and add its child directories to the conversion targets. Starting at these heads, module history is unrolled as when converting trunk, only this time branching information is detected. For instance, if the branches/memcached-1.1.x module is branched from trunk at revision 255, both will be related in the converted repository. Note that the rule of not following history below the root URL still applies, but it is usually irrelevant with well-behaved trunk/branches/tags layouts. Branches are named after their module name, like memcached-1.1.x, and this may be preserved by the destination backend (with the Mercurial backend, which defaults to convert.hg.usebranchnames=1, named branches are created after these module names). Finally, the tags directory is detected automatically, and branching points are computed and used to tag converted revisions.
If your repository layout differs from the canonical trunk/branches/tags, these can be redefined with convert.svn.trunk, convert.svn.branches and convert.svn.tags configuration options. Values are relative paths from the root URL, like archive/2006/memcached-old for http://code.sixapart.com/svn/memcached/archive/2006/memcached-old.
Local repositories must be converted using local URLs. Under Unix, a local memcached repository would be converted with:
$ hg convert file://`pwd`/memcached-repo/memcached
Under Windows (assuming memcached-repo in c:\dev), it would be:
$ hg convert file:///c:/dev/memcached-repo/memcached
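If the URL has to be built in a script, Python's pathlib can produce the file:// form for both cases. This is a minimal sketch; the function name local_repo_url and the Unix path are made up for the example.

```python
from pathlib import PurePosixPath, PureWindowsPath


def local_repo_url(path, windows=False):
    """Return the file:// URL for an absolute local repository path."""
    flavour = PureWindowsPath if windows else PurePosixPath
    return flavour(path).as_uri()


print(local_repo_url("/home/me/memcached-repo/memcached"))
print(local_repo_url("c:/dev/memcached-repo/memcached", windows=True))
```

The second call yields the same URL as in the Windows example above.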
The extension also works from checkouts:
$ hg convert memcached
5.6.3. Working around Network and Bindings Issues
Sometimes, conversion of remote repositories is complicated by poor network connectivity, remote server misconfiguration, SVN bindings issues or bugs in the converter when handling remote sources. The converter fetches whole files and not just deltas, so it is not very efficient. Often, you would like to have the SVN repository locally to play and replay the conversion with different parameters or track bugs more quickly. One solution is to mirror it locally using the svnsync script (docs) provided with SVN distributions.
For example:
$ svnadmin create foomirror
$ echo '#!/bin/sh' > foomirror/hooks/pre-revprop-change # make insecure dummy hook. On windows this can be an empty file with the .bat suffix
$ chmod +x foomirror/hooks/pre-revprop-change
$ svnsync init --source-username <username> file://`pwd`/foomirror https://foo.svn.sourceforge.net/svnroot/foo
Copied properties for revision 0.
$ svnsync sync file://`pwd`/foomirror
Committed revision 1.
Copied properties for revision 1.
Committed revision 2.
Copied properties for revision 2.
...
$ svn co file:///`pwd`/foomirror foomirror-svn # - if you want to make a checkout from your repo mirror
$ hg convert foomirror # convert directly from repo mirror to foomirror-hg
...
5.6.4. Other tools
See RepositoryConversion#Subversion for other methods that can be used.
5.7. Converting from Mercurial
It's also useful to filter Mercurial repositories to get subsets of an existing one. For example to transform a subdirectory subfoo of a repository foo into a repository with its own life (while keeping its full history), do the following:
$ echo include subfoo > /tmp/myfilemap
$ echo rename subfoo . >> /tmp/myfilemap
$ hg convert --filemap /tmp/myfilemap /path/to/repo/foo /tmp/mysubfoo-repo
The hg.ignoreerrors option lets missing revlog errors be ignored when reading the source, which may be useful to fix a repository with a damaged store. Set this option to True and convert from Mercurial to Mercurial.
Other configuration options supported:
- hg.saverev
- Store the original revision ID in changeset (forces target IDs to change). It takes a boolean argument and defaults to False.
- hg.startrev
- Convert the specified starting revision and its descendants. It takes a Mercurial revision identifier and defaults to 0.
5.8. Converting to Mercurial
The Mercurial back end supports the following configuration options:
- hg.usebranchnames
Use named branches to record source branches (default: true)
- hg.clonebranches
Create branches as clones of their parent branches. All branches will be created as subdirectories of the convert destination. (default: false)
- hg.tagsbranch
Put tags on the given branch (default: default)
An example of using these settings in a .hgrc file is:
[convert]
hg.usebranchnames=0
hg.clonebranches=1
As thorough as the convert extension is, it cannot handle all cases, especially if those cases involve the conversion of a very non-standard repository layout. Please see ProblematicConversions for possible solutions.
6. Customization
For custom conversion needs, it is quite simple to customize the conversion process. A simple custom extension makes it possible to hook into the convert process with your own Python code and a bit of boilerplate. The following example showcases some use cases. Save it as /some/where.py and add customsource = /some/where.py to your [extensions] section.
Not all code paths have been tested after editing for the Python 2 to Python 3 conversion.
import hgext.convert.convcmd
from hgext.convert.hg import mercurial_source as basesource
#from hgext.convert.subversion import svn_source as basesource # to use subversion as source

class customsource(basesource):

    def getchanges(self, version, full):
        # returns (filename, rev) list and target, source dictionary
        # files not included in the list is just ignored
        files, copies, cleanp2 = super(customsource, self).getchanges(version, full)
        return files, copies, cleanp2

    def getcommit(self, rev):
        # returns meta data for changeset rev
        c = super(customsource, self).getcommit(rev)
        c.extra = c.extra.copy()
        # use case: authormapsuffix
        c.author = c.author.split(b'@', 1)[0] + b'@FreeBSD.org'
        # use case: closemap
        if rev in b'''
            d643f67092ff123f6a192d52f12e7d123dae229f
            9117c6561b0bd7792fa13b50d28239d51b78e51f
            ''':
            c.extra[b'close'] = b'1'
        # use case: branchmap
        if c.branch == b'default':
            c.branch = b'trunk'
        # use case: modify descriptions
        c.desc = c.desc.title()
        # use case: modify time
        c.date = b'1971-02-03 04:05:06 +0000'
        return c

    def getfile(self, name, rev):
        # returns file content and flags for named file at revision
        data, flags = super(customsource, self).getfile(name, rev)
        if data is None or b'nuclear launch code' in data:
            # the change is that the file is removed
            return None, None
            # raise IOError # for Mercurial < 3.2
        # use case: modify file data
        if rev == b'f8b4ca9f6ffeb6b288be4c44e4d55458f867cd7c' and name == b'foo':
            data = b'yo\nyo\n'
        # use case: tagmap
        if name == b'.hgtags':
            data = b''.join(line[:41] + tagmap(line[41:]) + b'\n' for line in data.splitlines())
        return data, flags

    # use case: tagmap
    def gettags(self):
        return dict((tagmap(tag), node) for tag, node in super(customsource, self).gettags().items())

# use case: tagmap
def tagmap(tag):
    return tag.replace(b'some', b'other')

hgext.convert.convcmd.source_converters.append((b'customsource', customsource, b'branchsort'))
Usage:
$ hg convert --source-type customsource sourcerepo destrepo
In the example, customsource extends mercurial_source, so the source-type is expected to be a mercurial repository; see hgext.convert.convcmd.source_converters for a list of other types to inherit.
7. See also
RepositoryConversion for other tips and VCS systems