Manifest v2 Plan
Use of the tree manifests described in ManifestShardingPlan will result in a new manifest hash, so now is a good time to introduce a new manifest format. Much of the below comes from ImprovingManifestCompressionPlan.
Format
Header
To identify the manifest as being version 2, the first line will start with a null byte (an empty path, which is disallowed in v1). What follows it is a header:
\0{metadata}\n
The metadata field stores key/value pairs, with each pair separated by a null byte. The value is separated from the key by a colon (:).
For example:
\0treemanifest:\n
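As a minimal sketch of how a reader might detect and parse this header, assuming the whole manifest is available as a bytes object. The function name parse_v2_header is hypothetical and not part of Mercurial:

```python
def parse_v2_header(data: bytes) -> dict:
    """Return the header metadata of a v2 manifest as a {key: value} dict.

    Hypothetical helper for illustration only. Raises ValueError when the
    first line does not start with a null byte (i.e. the data is not v2).
    """
    first_line, newline, _rest = data.partition(b"\n")
    if not newline or not first_line.startswith(b"\0"):
        raise ValueError("not a v2 manifest: first line lacks the null-byte marker")

    metadata = {}
    body = first_line[1:]  # drop the leading null byte (the empty path)
    if body:
        # Pairs are separated by null bytes; key and value by a colon.
        for pair in body.split(b"\0"):
            key, _colon, value = pair.partition(b":")
            metadata[key] = value
    return metadata


print(parse_v2_header(b"\0treemanifest:\n"))
```

For the example header above this yields a single key, treemanifest, with an empty value.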
File entries
The current format is:
{path}\0{40-byte hex nodeid}{flags}\n
The proposal is to change it to:
{path}\0{flags[\0{metadata}]}\n{20-byte binary nodeid}\n
The hash is binary, saving 20 bytes per line, and is on a separate line to make deltas smaller.
If the flags field ends with a null byte, what follows it (until end of line) is metadata. The format is the same as the header metadata. There are no current plans for what will be stored in this field.
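One consequence of a binary nodeid is that it may itself contain newline bytes, so a reader cannot simply split the manifest on \n. The following is a rough sketch of an entry parser under that constraint, assuming the header line has already been removed and stem compression is not in use. The name parse_v2_entries is hypothetical:

```python
def parse_v2_entries(data: bytes):
    """Yield (path, flags, metadata, node) for each v2 file entry.

    Hypothetical helper for illustration only. `data` is the manifest
    body after the header line. The nodeid is read as a fixed 20 bytes
    because it is binary and may contain b"\\n".
    """
    pos = 0
    while pos < len(data):
        eol = data.index(b"\n", pos)            # end of the path/flags line
        path, _sep, rest = data[pos:eol].partition(b"\0")
        flags, _sep, metadata = rest.partition(b"\0")

        node_start = eol + 1
        node = data[node_start:node_start + 20]
        terminator = data[node_start + 20:node_start + 21]
        if len(node) != 20 or terminator != b"\n":
            raise ValueError("truncated or malformed entry")

        yield path, flags, metadata, node
        pos = node_start + 21                    # skip nodeid and its newline


node = bytes(range(20))                          # contains 0x0a on purpose
body = b"a/b.txt\0x\n" + node + b"\n"
print(list(parse_v2_entries(body)))
```

The metadata is returned as raw bytes here; it could be split into key/value pairs the same way as the header metadata.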
With stem compression, we would simply replace the first byte of the path with the number of bytes to copy from the previous path.
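The sentence above leaves the exact encoding open. The sketch below shows one possible reading, which is an assumption and not a confirmed part of the proposal: each path is stored as a single count byte (the number of leading bytes shared with the previous path, capped at 255) followed by the remaining suffix. It works on a list of paths and does not address how the count byte interacts with the null-byte and newline delimiters of the entry line. The function names are hypothetical:

```python
def stem_compress(paths):
    """Encode each path as <shared-prefix length byte> + <suffix>.

    Assumed interpretation for illustration only.
    """
    encoded = []
    prev = b""
    for path in paths:
        limit = min(len(prev), len(path), 255)   # count must fit in one byte
        n = 0
        while n < limit and prev[n] == path[n]:
            n += 1
        encoded.append(bytes([n]) + path[n:])
        prev = path
    return encoded


def stem_decompress(encoded):
    """Inverse of stem_compress: rebuild full paths from the previous path."""
    paths = []
    prev = b""
    for item in encoded:
        n = item[0]                              # bytes to copy from prev
        path = prev[:n] + item[1:]
        paths.append(path)
        prev = path
    return paths


paths = [b"dir/a.c", b"dir/a.h", b"dir/b.c", b"other"]
packed = stem_compress(paths)
print(packed)
print(stem_decompress(packed) == paths)
```

Because manifest paths are sorted, neighbouring entries tend to share long directory prefixes, which is where the savings would come from.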
Space savings
Storage sizes (post gzip compression) in bytes when run on the first 5000 revisions of the Mozilla repo:
| | v1 | v2 without stem compression | v2 with stem compression |
| Full revision (rev 1 of the repo) | 769307 | 674620 (-12% vs v1) | 634897 (-17% vs v1, -6% vs no stem compression) |
| Next 4999 revisions | 1583141 | 1000899 (-37% vs v1) | 974380 (-38% vs v1, -3% vs no stem compression) |
| First 5000 revisions | 2352448 | 1675519 (-29% vs v1) | 1609277 (-32% vs v1, -4% vs no stem compression) |
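The percentages in the table can be reproduced from the byte counts. A small sketch, using only the figures above (the helper name pct_change is made up for this example):

```python
# Byte counts from the table: (v1, v2 without stem compression, v2 with it).
sizes = {
    "Full revision (rev 1 of the repo)": (769307, 674620, 634897),
    "Next 4999 revisions": (1583141, 1000899, 974380),
    "First 5000 revisions": (2352448, 1675519, 1609277),
}


def pct_change(new, old):
    """Relative size change of `new` versus `old`, rounded to a whole percent."""
    return round(100 * (new - old) / old)


for label, (v1, v2, v2_stem) in sizes.items():
    print(
        f"{label}: v2 {pct_change(v2, v1)}% vs v1, "
        f"v2+stem {pct_change(v2_stem, v1)}% vs v1, "
        f"{pct_change(v2_stem, v2)}% vs no stem compression"
    )
```

The "First 5000 revisions" row is the sum of the two rows above it in each column.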
Backwards compatibility
When the first manifest is written using the new format, we'll add an entry to requires.
TBD: Will we convert to old format on the fly for exchange?
- What would having "manifestv2" in the requires mean? Does it imply that the repo definitely has mv2 commits, or just that it might have them?
- What do we do if the local repo is mv1 and pulls an unrelated repo that's mv2? Do we add mv2 to the requires file automatically, throw an error, ask the user, etc.?
- What do we do if the remote repo is mv1, but we clone and upgrade ours to mv2, and try to push? Do we fail, or do we upgrade the server to mv2? Do we have a server-side setting for whether its repo is "upgradable"?
Readdelta
In a few places, we use manifest deltas without resolving the entire manifest. One example is hg verify, which slows down a lot when it cannot take advantage of reading deltas (a naive test shows >5x on the Mozilla repo). Since the new format splits each file entry across two lines, a delta for a modification will not include the file path, which means the delta will not be useful on its own. Therefore, we will not be reading deltas for v2 manifests. The problem is that it is not obvious whether a delta being read is in the old or the new format. There seem to be a few options here:
- Never read deltas when using the new format (as indicated by the requires entry) and accept that some operations will be slower.
- Look closer at the delta content. It seems like a delta of the old format could not be mistaken for a delta of the new format, or vice versa.
- Modify the format so it's trivial to determine whether it's a v1 or v2 manifest by reading a delta. This can be done by prepending every line by a null byte (empty path).
Note that the latter two options involve reading the delta only to have to read the full content anyway if the delta is for the new format.
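To illustrate the third option, here is a rough sketch of how a reader could classify a delta if every v2 line were prefixed with a null byte. It assumes the delta has already been decoded into the chunks of data it inserts and that those chunks start on line boundaries. The function name and the decoded-chunk input are assumptions made for this example:

```python
def classify_delta_chunks(inserted_chunks):
    """Guess the manifest version from the data a delta inserts.

    Hypothetical helper for illustration only. Returns "v2" if the first
    non-empty chunk starts with a null byte, "v1" if it starts with
    anything else (v1 lines start with a non-empty path), and None when
    the delta inserts no data at all.
    """
    for chunk in inserted_chunks:
        if chunk:
            return "v2" if chunk.startswith(b"\0") else "v1"
    return None


v1_chunk = b"dir/file.c\0" + b"a" * 40 + b"\n"
v2_chunk = b"\0dir/file.c\0\n\0" + b"\x01" * 20 + b"\n"
print(classify_delta_chunks([v1_chunk]))   # v1
print(classify_delta_chunks([v2_chunk]))   # v2
print(classify_delta_chunks([]))           # None
```

A delta that only removes lines gives this check nothing to inspect, so a caller would still need a fallback for that case, such as reading the full manifest.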