Note:
This page is no longer relevant but is kept for historical purposes.
Manifest v2 Plan
Support for manifest v2 has been removed in Mercurial 4.6 (released May 2018), as the format failed to meet expectations.
Use of the tree manifests described in TreeManifestPlan will result in a new manifest hash, so now is a good time to introduce a new manifest format. Much of the below comes from ImprovingManifestCompressionPlan.
Format
Header
To identify the manifest as being version 2, the first line will start with a null byte (an empty path, which is disallowed in v1). What follows it is a header:
\0{metadata}\n
The metadata field stores key/value pairs, with each pair separated by a null byte. The value is separated from the key by a colon (:).
For example:
\0treemanifest:\n
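As an illustration, the header layout described above could be built and parsed roughly as follows. This is a minimal sketch, not the code Mercurial shipped: the function names `encode_header` and `parse_header` are hypothetical, and it assumes keys contain neither a colon nor a null byte.

```python
def encode_header(pairs):
    """Build a v2 header line from (key, value) byte-string pairs."""
    metadata = b"\0".join(key + b":" + value for key, value in pairs)
    # The leading null byte is the "empty path" that marks the manifest as v2.
    return b"\0" + metadata + b"\n"


def parse_header(line):
    """Split a v2 header line back into (key, value) pairs."""
    if not line.startswith(b"\0") or not line.endswith(b"\n"):
        raise ValueError("not a v2 manifest header")
    body = line[1:-1]
    pairs = []
    if body:  # an empty body means there is no metadata at all
        for item in body.split(b"\0"):
            # Assumption: only the first colon separates key from value.
            key, _sep, value = item.partition(b":")
            pairs.append((key, value))
    return pairs
```

With these helpers, `encode_header([(b"treemanifest", b"")])` yields the example header shown above, a key with an empty value.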
File entries
The current format is:
{path}\0{40-byte hex nodeid}{flags}\n
The proposal is to change it to:
{path}\0{flags[\0{metadata}]}\n{20-byte binary nodeid}\n
The hash is binary, which shrinks it from 40 bytes to 20, and is on a separate line to make deltas smaller. The extra newline costs one byte, so the net saving is 19 bytes per entry.
If the flags field ends with a null byte, what follows it (until end of line) is metadata. The format is the same as the header metadata. There are no current plans for what will be stored in this field.
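To make the two entry layouts concrete, here is a small sketch that encodes an entry in both formats and parses the proposed one. The helper names are hypothetical, and the sketch assumes paths never contain a newline or a null byte. Because a binary nodeid may itself contain a newline byte, the parser reads the nodeid by its fixed length of 20 bytes instead of searching for the next newline.

```python
import binascii


def encode_v1_entry(path, node, flags=b""):
    """{path}\\0{40-byte hex nodeid}{flags}\\n"""
    return path + b"\0" + binascii.hexlify(node) + flags + b"\n"


def encode_v2_entry(path, node, flags=b"", metadata=None):
    """{path}\\0{flags[\\0{metadata}]}\\n{20-byte binary nodeid}\\n"""
    line = path + b"\0" + flags
    if metadata is not None:
        line += b"\0" + metadata
    return line + b"\n" + node + b"\n"


def parse_v2_entry(data, pos=0):
    """Parse one v2 entry starting at pos; return the fields and the next offset."""
    end = data.index(b"\n", pos)  # end of the path/flags line
    fields = data[pos:end].split(b"\0")
    path, flags = fields[0], fields[1]
    # Anything after the flags is metadata, which may itself contain null bytes.
    metadata = b"\0".join(fields[2:]) if len(fields) > 2 else None
    node = data[end + 1:end + 21]  # fixed length: the nodeid may contain b"\n"
    if len(node) != 20 or data[end + 21:end + 22] != b"\n":
        raise ValueError("truncated v2 entry")
    return path, node, flags, metadata, end + 22
```

For the same path, nodeid and flags, the v2 encoding comes out 19 bytes shorter than the v1 encoding: 20 bytes saved on the hash, minus one byte for the additional newline.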
With stem compression, we would simply replace the first byte of the path with the number of bytes to copy from the previous path.
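The sketch below shows one possible reading of stem compression: each path is stored as a single byte giving the number of leading bytes shared with the previous path, followed by the remaining suffix. The exact byte layout is an assumption, since the sentence above does not pin it down, and the function names are hypothetical. The shared length is capped at 255 so that it fits in one byte.

```python
def common_prefix_len(a, b, limit=255):
    """Length of the shared prefix of a and b, capped so it fits in one byte."""
    n = 0
    for x, y in zip(a[:limit], b[:limit]):
        if x != y:
            break
        n += 1
    return n


def stem_compress(paths):
    """Replace each path's shared prefix with a one-byte stem length."""
    prev = b""
    out = []
    for path in paths:
        stem = common_prefix_len(prev, path)
        out.append(bytes([stem]) + path[stem:])
        prev = path
    return out


def stem_expand(entries):
    """Inverse of stem_compress: rebuild full paths from stem-compressed ones."""
    prev = b""
    out = []
    for entry in entries:
        stem = entry[0]  # number of bytes to copy from the previous path
        path = prev[:stem] + entry[1:]
        out.append(path)
        prev = path
    return out
```

Since manifest paths are sorted, neighbouring entries tend to share long directory prefixes. Note that the stem byte can take any value, including that of a newline or a null byte, so a parser for such a format would have to skip it before searching for separators.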
Space savings (or NOT)
Storage sizes (post gzip compression) in bytes when run on the first revisions of the Mozilla repo:
| | v1 | v2 without stem compression | v2 with stem compression |
| Full revision (rev 1 of the repo) | 769307 | 674620 (-12% vs v1) | 634897 (-17% vs v1, -6% vs no stem compression) |
| Next 4999 revisions | 1583141 | 1000899 (-37% vs v1) | 974380 (-38% vs v1, -3% vs no stem compression) |
| First 5000 revisions | 2352448 | 1675519 (-29% vs v1) | 1609277 (-32% vs v1, -4% vs no stem compression) |
HOWEVER, when run on the entire history of mozilla-unified, the space usage increases. With 336202 revisions in the repo, 00manifest.d went from 163M to 277M. The hg debugrevlog -m output follows. As you can see, the average uncompressed size goes down to about 40% of its v1 value, but the average chain length also goes down to about 20%, meaning we emit more full manifests. The compression ratio (I'm not sure how that's measured) also goes down to about 20%.
Manifest v1:
format : 1
flags  : generaldelta
revisions     : 335351
    merges    :  15769 ( 4.70%)
    normal    : 319582 (95.30%)
revisions     : 335351
    full      :    202 ( 0.06%)
    deltas    : 335149 (99.94%)
revision size : 170718580
    full      :  22434375 (13.14%)
    deltas    : 148284205 (86.86%)
avg chain length  : 14234
max chain length  : 36788
compression ratio : 15829
uncompressed data size (min/max/avg) : 51 / 14689628 / 8058636
full revision size (min/max/avg)     : 52 / 3783061 / 111061
delta size (min/max/avg)             : 0 / 1585383 / 442
deltas against prev  : 252069 (75.21%)
    where prev = p1  : 250597 (99.42%)
    where prev = p2  :   1393 ( 0.55%)
    other            :     79 ( 0.03%)
deltas against p1    :  76206 (22.74%)
deltas against p2    :   6874 ( 2.05%)
deltas against other :      0 ( 0.00%)
Manifest v2:
format : 1
flags  : generaldelta
revisions     : 335195
    merges    :  15733 ( 4.69%)
    normal    : 319462 (95.31%)
revisions     : 335195
    full      :    227 ( 0.07%)
    deltas    : 334968 (99.93%)
revision size : 290313187
    full      :  56582578 (19.49%)
    deltas    : 233730609 (80.51%)
avg chain length  : 2381
max chain length  : 11123
compression ratio : 3524
uncompressed data size (min/max/avg) : 35 / 5337933 / 3052630
full revision size (min/max/avg)     : 35 / 3105270 / 249262
delta size (min/max/avg)             : 0 / 1598061 / 697
deltas against prev  : 257137 (76.76%)
    where prev = p1  : 252886 (98.35%)
    where prev = p2  :   4102 ( 1.60%)
    other            :    149 ( 0.06%)
deltas against p1    :  74190 (22.15%)
deltas against p2    :   3641 ( 1.09%)
deltas against other :      0 ( 0.00%)
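The ratios quoted before the two listings can be rechecked from the numbers themselves. The sketch below copies a few values from the debugrevlog output into dictionaries (the key names are made up for this example) and compares v2 against v1. It also tests one guess about the compression ratio: the figures are consistent with it being the total uncompressed size divided by the stored revision size. That is an inference from these numbers, not something this page states.

```python
# Values copied from the two debugrevlog listings; key names are illustrative.
V1 = {
    "revisions": 335351,
    "revision_size": 170718580,
    "avg_chain_length": 14234,
    "compression_ratio": 15829,
    "avg_uncompressed": 8058636,
}
V2 = {
    "revisions": 335195,
    "revision_size": 290313187,
    "avg_chain_length": 2381,
    "compression_ratio": 3524,
    "avg_uncompressed": 3052630,
}


def percent_of(new, old):
    """Express new as a percentage of old."""
    return 100.0 * new / old


def estimated_ratio(stats):
    """Guess: compression ratio = total uncompressed bytes / stored bytes."""
    total_uncompressed = stats["avg_uncompressed"] * stats["revisions"]
    return total_uncompressed / stats["revision_size"]


for key in ("avg_uncompressed", "avg_chain_length", "compression_ratio"):
    print(key, "v2 as % of v1:", round(percent_of(V2[key], V1[key])))

for name, stats in (("v1", V1), ("v2", V2)):
    print(name, "stored MiB:", round(stats["revision_size"] / 2**20),
          "estimated ratio:", round(estimated_ratio(stats)))
```

The estimate is only approximate because the average uncompressed size in the listing is already rounded to a whole number of bytes.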
Backwards compatibility
When the user has set the config (experimental.manifestv2 for now), any new commit will be written using the new format, and we'll add an entry to requires at that point. Cloning from a v1 repo results in a v1 repo. Cloning from a v2 repo results in a v2 repo (meaning requires contains manifestv2). Pulling from a v2 repo into a v1 repo will be allowed only if the experimental.manifestv2 config is set. Similarly, pushing from a v2 repo into a v1 repo will be allowed only if the config is set on the destination repo.
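The rules in the paragraph above can be restated as a few small predicates. This is only a sketch of the stated policy with hypothetical function names, not Mercurial code. It models a repo's requires file as a set of strings, and it assumes that receiving v1 data is never restricted, which the paragraph does not say explicitly.

```python
V2 = "manifestv2"


def requires_after_commit(requires, config_enabled):
    """A commit made with experimental.manifestv2 set adds the requirement."""
    if config_enabled:
        return set(requires) | {V2}
    return set(requires)


def clone_is_v2(source_requires):
    """A clone keeps the manifest format of its source."""
    return V2 in source_requires


def may_receive(source_requires, dest_requires, dest_config_enabled):
    """May data flow from source into dest (a pull into dest, or a push to it)?"""
    if V2 not in source_requires:
        return True  # assumption: v1 data is always acceptable
    if V2 in dest_requires:
        return True  # destination is already a v2 repo
    # v2 into v1: allowed only if the config is set on the receiving repo.
    return dest_config_enabled
```

Whether the receiving repo's requires file should change after such an exchange is not decided here.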
TBD: Will we convert to old format on the fly for exchange?
- What would having "manifestv2" in the requires mean? Does it imply that the repo definitely has mv2 commits, or just that it might have them?
- What do we do if the local repo is mv1 and pulls an unrelated repo that's mv2? Do we add mv2 to the requires file automatically, throw an error, ask the user, etc.?
- What do we do if the remote repo is mv1, but we clone and upgrade ours to mv2, and try to push? Do we fail, or do we upgrade the server to mv2? Do we have a server-side setting for whether its repo is "upgradable"?
Readdelta
In a few places, we use manifest deltas without resolving the entire manifest. One example is hg verify, which slows down a lot when not taking advantage of reading deltas (a naive test shows >5x on the Mozilla repo). Since the new format splits file entries across two lines, deltas for modifications will not include the file path, which means they will not be useful. Therefore, we will not be reading deltas for v2 manifests. However, the problem is that it's not obvious whether a delta that has been read is in the old or the new format. There seem to be a few options here:
- Never read deltas when using the new format (as indicated by the requires entry) and accept that some operations will be slower.
- Look closer at the delta content. It seems like a delta of the old format could not be mistaken for a delta of the new format, or vice versa.
- Modify the format so it's trivial to determine whether it's a v1 or v2 manifest by reading a delta. This can be done by prepending every line by a null byte (empty path).
Note that the latter two options involve reading the delta only to have to read the full content anyway if the delta is for the new format.
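The "look closer at the delta content" option could be sketched as below. The sketch assumes the usual revlog binary delta layout: a sequence of hunks, each a 12-byte header of three big-endian 32-bit integers (start, end, length of new data) followed by the new data. It then checks whether every inserted line has the shape of a v1 entry. The function names and the regular expression are illustrative, and v1 flags are assumed to be lowercase letters.

```python
import re
import struct

# v1 entry without its newline: {path}\0{40 hex digits}{flags}
V1_LINE = re.compile(rb"[^\0\n]+\0[0-9a-f]{40}[a-z]*")


def delta_hunks(delta):
    """Yield (start, end, data) for each hunk of a binary delta."""
    pos = 0
    while pos < len(delta):
        start, end, length = struct.unpack(">lll", delta[pos:pos + 12])
        pos += 12
        yield start, end, delta[pos:pos + length]
        pos += length


def looks_like_v1(delta):
    """True or False when the inserted text decides it, None when it cannot."""
    decided = None
    for _start, _end, data in delta_hunks(delta):
        for line in data.split(b"\n"):
            if not line:
                continue
            if V1_LINE.fullmatch(line) is None:
                return False  # some inserted text is not a v1 entry
            decided = True
    return decided
```

A delta that only removes lines inserts no text, so this check returns None for it and the caller would still need the requires entry or the full manifest to decide.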