Note:
This page is primarily intended for developers of Mercurial.
Note:
This page is no longer relevant but is kept for historical purposes.
This page describes the second iteration of the bundle format, tentatively called HG19 (since we hoped to include it with Mercurial 1.9).
Bundles consist of the following sections:
- A bundle header, describing the version and features present in the data.
- A changegroups section containing the changelog, the manifest and each relevant filelog.
- Optionally, a footer containing an index for more efficient random access?
Nomenclature
For the sake of this document, the following (otherwise often quite ambiguous) terms are used:
bundle
- Data in the format described in this document, including all headers. Could also be called full bundles.
headerless bundle
- A bundle without the first 6 bytes of the header, which contain the version identifier and compression type. These have traditionally been used internally and in the wire protocol, and are always uncompressed.
chunk
- Data corresponding to a single revlog entry.
changegroup
- A list of chunks containing revlog entries. Sometimes called chunkgroup.
Sections
For each section, the offsets are given relative to the beginning of the section. Fields of variable length are assigned the symbolic sizes a, b, c, etc.
Header
The format of the bundle header is described below. Traditionally, the first part of the header (the only part in the existing format) is often left out in internal processing and over the wire. This part consists of the first 6 bytes, up to and including the compression type. In such cases, the bundles are always considered to be uncompressed. It has not been decided how the new bundle format will handle this.
Offset | Size | Type | Description |
0 | 4 | string | Bundle format version. Always contains "HG19". |
4 | 2 | string | Compression type. Either "BZ", "GZ" or "UN". |
6 | 4 | uint | Length of feature string, in bytes. |
10 | a | string | Bundle features (or requirements). A list of newline separated strings describing features present in the bundle (unterminated). |
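As an illustration of the header layout, here is a minimal parsing sketch. It assumes that the uint field is big-endian and unsigned (this page does not state the byte order), and the function name parse_bundle_header is hypothetical, not part of Mercurial.

```python
import struct

# Assumption: integers are big-endian and unsigned; the page does not say.
# Fixed part of the header: version (4), compression type (2), feature length (4).
_FIXED = struct.Struct(">4s2sI")


def parse_bundle_header(data):
    """Return (version, compression, features, header_size) for an HG19 bundle."""
    if len(data) < _FIXED.size:
        raise ValueError("truncated bundle header")
    version, compression, featlen = _FIXED.unpack_from(data, 0)
    if version != b"HG19":
        raise ValueError("unexpected bundle version: %r" % version)
    if compression not in (b"BZ", b"GZ", b"UN"):
        raise ValueError("unknown compression type: %r" % compression)
    end = _FIXED.size + featlen
    if len(data) < end:
        raise ValueError("truncated feature string")
    raw = data[_FIXED.size:end]
    # The feature string is newline separated and unterminated.
    features = raw.split(b"\n") if raw else []
    return version, compression, features, end
```

The returned header_size is the offset at which the changegroups section would start in an uncompressed bundle.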
Changegroups section
The changegroups section has the following format:
Offset | Size | Type | Description |
0 | 4 | uint | Number of changelog entries. |
4 | b | group | Changegroup containing changelog entries. |
b + 4 | 4 | uint | Number of manifest entries. |
b + 8 | c | group | Changegroup containing manifest entries. |
b + c + 8 | 4 | uint | Number of filelog changegroups (note: not the number of entries). |
Then, for each filelog, the following:
Offset | Size | Type | Description |
0 | 4 | uint | Number of filelog entries. |
4 | 4 | uint | Length of filename, in bytes. |
8 | d | string | Filename (unterminated). |
d + 8 | e | group | Changegroup containing filelog entries. |
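The per-filelog prefix in the table above could be read as in the following sketch. It assumes big-endian unsigned integers (the byte order is not stated on this page), and parse_filelog_prefix is a hypothetical helper name.

```python
import struct


def parse_filelog_prefix(data, offset=0):
    """Return (entry_count, filename, group_offset) for one filelog record.

    group_offset is where the changegroup with the filelog entries begins,
    i.e. offset d + 8 in the table above.
    """
    # Assumption: both uint fields are big-endian and unsigned.
    count, namelen = struct.unpack_from(">II", data, offset)
    start = offset + 8
    end = start + namelen
    if end > len(data):
        raise ValueError("truncated filename")
    return count, data[start:end], end
```

The filename is returned as bytes because the format stores it as an unterminated byte string with an explicit length.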
The changegroup format is described below.
Changegroups
A changegroup consists of a number of chunks describing revisions. Each chunk has the following format:
Offset | Size | Type | Description |
0 | 4 | uint | Total length of the chunk, including the 104-byte header described here. |
4 | 20 | sha-1 hash | Node of this revision. |
24 | 20 | sha-1 hash | First parent of this revision. |
44 | 20 | sha-1 hash | Second parent of this revision (or 0-bytes). |
64 | 20 | sha-1 hash | Link pointer back to the changelog. |
84 | 20 | sha-1 hash | Parent for the delta (or 0-bytes for a snapshot). |
104 | f | data | Delta or full version snapshot. |
So in the above table, we always have chunk length = f + 104.
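A minimal sketch of reading chunks in this layout follows. It assumes the length field is a big-endian unsigned integer (not stated on this page); the names parse_chunk and parse_group and the dictionary keys are illustrative only.

```python
import struct

# Assumption: big-endian unsigned length, followed by five 20-byte SHA-1 fields.
CHUNK_HEADER = struct.Struct(">I20s20s20s20s20s")  # 4 + 5 * 20 = 104 bytes
NULL_ID = b"\0" * 20


def parse_chunk(data, offset=0):
    """Parse one chunk at offset; return (chunk_dict, offset_of_next_chunk)."""
    length, node, p1, p2, link, deltaparent = CHUNK_HEADER.unpack_from(data, offset)
    if length < CHUNK_HEADER.size:
        raise ValueError("chunk length smaller than its header")
    end = offset + length
    if end > len(data):
        raise ValueError("truncated chunk")
    chunk = {
        "node": node,
        "p1": p1,
        "p2": p2,
        "link": link,
        "deltaparent": deltaparent,
        # An all-zero delta parent marks a full snapshot rather than a delta.
        "is_snapshot": deltaparent == NULL_ID,
        "data": data[offset + CHUNK_HEADER.size:end],
    }
    return chunk, end


def parse_group(data, offset, count):
    """Read count consecutive chunks; count comes from the enclosing section."""
    chunks = []
    for _ in range(count):
        chunk, offset = parse_chunk(data, offset)
        chunks.append(chunk)
    return chunks, offset
```

Because each chunk carries its total length, a reader can skip a chunk without interpreting its payload by jumping to the returned offset.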
Further requirements
Additional features have landed in Mercurial since this design. We also wish to support the following data in a bundle:
- Lightweight copy support (http://bz.selenic.com/show_bug.cgi?id=883)
- Phases data
- ChangesetsObsolescence markers
- Bookmark updates