Revision 34 as of 2018-02-10 00:05:58
This information was derived by reverse engineering. Some details may be incomplete. Hopefully someone with intimate familiarity with the code can improve it.
The v2 bundle file format is in practice quite similar to v1 (see BundleFormat), in that it comprises a file header followed by a changegroup, but it differs in a few significant ways.
Practical differences from v1 bundles
- The file has a more verbose multi-stage ASCII header containing key:value pairs. (more below)
- Zstandard compression (the new default) is also supported.
- Uses version 2 deltagroup headers instead of version 1 (see the spec at help internals.changegroups).
- Everything after the header is shredded into N-byte chunks after it is assembled (N is a parameter defined in the source code).
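The "shredding" in the last item can be undone by concatenating the chunks again. The sketch below is only an illustration: this page does not say how chunks are delimited, so the framing used here (each chunk preceded by a 4-byte big-endian signed length, with a zero length ending the sequence) is an assumption, and `join_chunks` is a hypothetical helper name.

```python
import struct

def join_chunks(data: bytes) -> bytes:
    """Reassemble a shredded payload into one contiguous byte string."""
    pieces = []
    pos = 0
    while True:
        # ASSUMPTION: every chunk starts with a 4-byte big-endian length.
        (size,) = struct.unpack_from(">i", data, pos)
        pos += 4
        if size == 0:
            # ASSUMPTION: a zero length marks the end of the chunk sequence.
            break
        if size < 0:
            raise ValueError("unexpected negative chunk size")
        pieces.append(data[pos:pos + size])
        pos += size
    return b"".join(pieces)
```

For example, a payload shredded into a 3-byte chunk `abc` and a 2-byte chunk `de` would be reassembled into `abcde`.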
Reading the header
stage 1
|| 'HG20' || Compression Chunk || rest of file ||
The Compression Chunk is either null or contains the ASCII string 'Compression=XX', where XX is a code indicating which decompression to apply to the rest of the file.
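Stage 1 can be sketched in Python as follows. This is a minimal illustration, not the Mercurial implementation. It assumes that the Compression Chunk is preceded by a 4-byte big-endian length (zero in the "null" case) and that several key=value pairs, if present, are separated by spaces. Neither detail is stated on this page, and `read_stage1` is a hypothetical name.

```python
import struct

def read_stage1(data: bytes):
    """Split a v2 bundle into (compression_code, rest_of_file)."""
    if data[:4] != b"HG20":
        raise ValueError("not an HG20 bundle")
    # ASSUMPTION: a 4-byte big-endian length precedes the Compression Chunk;
    # a length of zero corresponds to the "null" case.
    (size,) = struct.unpack_from(">i", data, 4)
    chunk = data[8:8 + size].decode("ascii")
    rest = data[8 + size:]
    compression = None
    # ASSUMPTION: multiple key=value pairs are space-separated.
    for pair in chunk.split():
        key, _, value = pair.partition("=")
        if key == "Compression":
            compression = value
    return compression, rest
```

The returned code (or `None` when the chunk is null) tells the caller which decompressor, if any, to apply to `rest` before moving on to stage 2.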
stage 2
|||| rest of file from stage 1 ||
|| Parameters Chunk || shredded changegroup (and possibly other sections?) ||
The Parameters Chunk records (among possibly other things?) that the file contains a changegroup ('\x0bCHANGEGROUP', where the leading byte \x0b is 11, the length of the string 'CHANGEGROUP'), followed by a null chunk and then a nested sequence describing two categories of parameters. The nested sequence gives, first, the number of key:value pairs in the first category, then the number of pairs in the second category, then, for each pair, the length of its ASCII key followed by the length of its ASCII value. The keys and values themselves follow, concatenated in the same order.
Example Parameters Chunk:
|| chunk length |||| description of contents || #section1 parameters || #section2 parameters || len(key1),len(value1) || len(key2),len(value2) || key1 || value1 || key2 || value2 ||
|| 4 bytes || \x0bCHANGEGROUP || 4 bytes null || \x01 || \x01 || \x07\x02 || \t\x01 || version || 02 || nbchanges || 7 ||
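The example layout above can be decoded with a short Python sketch. It follows the table column by column and is not taken from the Mercurial source. Two points are assumptions: that the 4-byte chunk length is big-endian, and that it counts only the bytes following it. `parse_parameters_chunk` is a hypothetical name.

```python
import struct

def parse_parameters_chunk(data: bytes):
    """Return (name, first_category, second_category, remaining_bytes)."""
    # ASSUMPTION: the chunk length is a 4-byte big-endian integer that
    # counts only the bytes following it.
    (length,) = struct.unpack_from(">i", data, 0)
    body = data[4:4 + length]
    pos = 0
    # One length byte (\x0b = 11), then the name 'CHANGEGROUP'.
    name_len = body[pos]
    pos += 1
    name = body[pos:pos + name_len].decode("ascii")
    pos += name_len
    pos += 4  # the four null bytes shown in the table
    # Number of key:value pairs in each of the two categories.
    n_first, n_second = body[pos], body[pos + 1]
    pos += 2
    total = n_first + n_second
    # One (len(key), len(value)) byte pair per parameter.
    sizes = [(body[pos + 2 * i], body[pos + 2 * i + 1]) for i in range(total)]
    pos += 2 * total
    # The keys and values themselves, concatenated in the same order.
    pairs = []
    for key_len, value_len in sizes:
        key = body[pos:pos + key_len].decode("ascii")
        pos += key_len
        value = body[pos:pos + value_len].decode("ascii")
        pos += value_len
        pairs.append((key, value))
    return name, pairs[:n_first], pairs[n_first:], data[4 + length:]
```

Fed the bytes from the example row, this yields the name `CHANGEGROUP`, a first category holding `version` = `02`, and a second category holding `nbchanges` = `7`. Whatever follows the chunk is returned untouched as the shredded changegroup.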