Manifest v2 Plan
Use of the tree manifests described in ManifestShardingPlan will result in a new manifest hash, so now is a good time to introduce a new manifest format. Much of the below comes from ImprovingManifestCompressionPlan.
Format
Header
To identify the manifest as being version 2, the first line will start with a null byte (an empty path, which is disallowed in v1). What follows it is a header:
\0{metadata}\n
The metadata field stores key/value pairs, with each pair separated by a null byte. The value is separated from the key by a colon (:).
For example:
\0treemanifest:\n
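A minimal sketch of writing and reading the header line. It assumes keys and values are byte strings, that keys contain no colon, and that neither contains a null byte or newline, since the plan does not describe any escaping. The function names are hypothetical and are not Mercurial's API.

```python
def encode_header(metadata):
    r"""Build the v2 header line: \0, then key:value pairs joined by \0, then \n."""
    pairs = [key + b":" + value for key, value in metadata.items()]
    return b"\0" + b"\0".join(pairs) + b"\n"


def decode_header(line):
    """Parse a v2 header line back into a dict of bytes keys and values."""
    if not (line.startswith(b"\0") and line.endswith(b"\n")):
        raise ValueError("not a v2 manifest header")
    body = line[1:-1]
    metadata = {}
    for pair in (body.split(b"\0") if body else []):
        # Split on the first colon only, so values may themselves contain colons
        key, sep, value = pair.partition(b":")
        if not sep:
            raise ValueError("metadata pair without a colon: %r" % pair)
        metadata[key] = value
    return metadata


def is_v2_manifest(text):
    # v1 entries never start with a null byte, because empty paths are disallowed
    return text.startswith(b"\0")


assert encode_header({b"treemanifest": b""}) == b"\0treemanifest:\n"
```

The example header above corresponds to a single key, `treemanifest`, with an empty value.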
File entries
The current format is:
{path}\0{40-byte hex nodeid}{flags}\n
The proposal is to change it to:
{path}\0{flags}[\0{metadata}]\n{20-byte binary nodeid}\n
The hash is binary, saving 20 bytes per line, and is on a separate line to make deltas smaller.
If the flags field ends with a null byte, what follows it (until end of line) is metadata. The format is the same as the header metadata. There are no current plans for what will be stored in this field.
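The entry layout can be sketched as follows: a path line of the form `{path}\0{flags}`, optionally followed by `\0{metadata}`, then a newline, then the 20-byte binary node ID and a newline. Because a binary node ID can itself contain a `\n` byte, this sketch reads exactly 20 bytes after the path line instead of splitting the text on newlines. The helper names are hypothetical, and metadata reuses the header's key:value encoding as the plan states.

```python
def _encode_pairs(metadata):
    # key:value pairs separated by null bytes, same format as the header
    return b"\0".join(key + b":" + value for key, value in metadata.items())


def _decode_pairs(blob):
    pairs = {}
    for pair in (blob.split(b"\0") if blob else []):
        key, _, value = pair.partition(b":")
        pairs[key] = value
    return pairs


def encode_entry(path, node, flags=b"", metadata=None):
    r"""Serialize one entry as {path}\0{flags}[\0{metadata}]\n{node}\n."""
    if len(node) != 20:
        raise ValueError("node ID must be 20 binary bytes")
    line = path + b"\0" + flags
    if metadata is not None:
        line += b"\0" + _encode_pairs(metadata)
    return line + b"\n" + node + b"\n"


def decode_entries(data, pos=0):
    """Yield (path, flags, metadata, node) tuples, starting at offset pos."""
    while pos < len(data):
        eol = data.index(b"\n", pos)  # the path line itself never contains a newline
        path, _, rest = data[pos:eol].partition(b"\0")
        flags, has_meta, meta = rest.partition(b"\0")
        metadata = _decode_pairs(meta) if has_meta else None
        # The node ID is binary and may contain b"\n", so read exactly 20 bytes
        node = data[eol + 1 : eol + 21]
        if len(node) != 20 or data[eol + 21 : eol + 22] != b"\n":
            raise ValueError("malformed entry at offset %d" % pos)
        yield path, flags, metadata, node
        pos = eol + 22
```

For a manifest that starts with a header, `pos` would be the offset just past the header line.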
With stem compression, we would simply replace the first byte of the path with the number of bytes to copy from the previous path.
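The plan describes stem compression in one sentence. The sketch below reads it as: each stored path begins with a single byte N, the number of leading bytes shared with the previous path, followed by the remainder of the path (`path[N:]`). That reading, the resulting 255-byte cap on N, and the function names are assumptions rather than part of the plan.

```python
MAX_STEM = 255  # assumption: the shared-prefix count is stored in one byte


def compress_paths(paths):
    """Replace each path's shared prefix with a one-byte count."""
    out = []
    prev = b""
    for path in paths:
        n = 0
        limit = min(len(prev), len(path), MAX_STEM)
        while n < limit and prev[n] == path[n]:
            n += 1
        out.append(bytes([n]) + path[n:])
        prev = path
    return out


def decompress_paths(stored):
    """Rebuild full paths by copying N bytes from the previous path."""
    paths = []
    prev = b""
    for item in stored:
        n = item[0]  # indexing bytes yields an int
        path = prev[:n] + item[1:]
        paths.append(path)
        prev = path
    return paths


paths = [b"dir/a.txt", b"dir/b.txt", b"dir/sub/c.txt", b"other"]
assert compress_paths(paths)[1] == b"\x04b.txt"
assert decompress_paths(compress_paths(paths)) == paths
```

Because the count byte can take any value, including 0x00 and 0x0A, a parser would have to consume it by position rather than scanning for the `\0` or `\n` separators.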
Space savings
Storage sizes in bytes (after gzip compression) for the first 5000 revisions of the Mozilla repo:
| | v1 | v2 without stem compression | v2 with stem compression |
|---|---|---|---|
| Full revision (rev 1 of the repo) | 769307 | 674620 (-12% vs v1) | 634897 (-17% vs v1, -6% vs no stem compression) |
| Next 4999 revisions | 1583141 | 1000899 (-37% vs v1) | 974380 (-38% vs v1, -3% vs no stem compression) |
| First 5000 revisions | 2352448 | 1675519 (-29% vs v1) | 1609277 (-32% vs v1, -4% vs no stem compression) |
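As a quick arithmetic check, the percentages in the table follow from the raw byte counts as (new - old) / old, rounded to the nearest percent, and the "First 5000 revisions" row is the sum of the two rows above it. The script below only uses the numbers already in the table.

```python
# Raw sizes in bytes from the table: (v1, v2 without stem, v2 with stem)
SIZES = {
    "Full revision (rev 1 of the repo)": (769307, 674620, 634897),
    "Next 4999 revisions": (1583141, 1000899, 974380),
    "First 5000 revisions": (2352448, 1675519, 1609277),
}


def pct_change(new, old):
    """Relative change from old to new, in whole percent."""
    return round(100 * (new - old) / old)


for row, (v1, v2, v2_stem) in SIZES.items():
    print(f"{row}: {pct_change(v2, v1)}% vs v1 | "
          f"{pct_change(v2_stem, v1)}% vs v1, {pct_change(v2_stem, v2)}% vs no stem compression")

# The cumulative row equals the full revision plus the following 4999 revisions
full, rest, total = SIZES.values()
assert all(a + b == c for a, b, c in zip(full, rest, total))
```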
Backwards compatibility
When the first manifest is written using the new format, we'll add an entry to the repository's requires file (.hg/requires).
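A minimal sketch of how code could check for that entry before choosing how to read manifests. Mercurial's .hg/requires file lists one requirement per line; the entry name `manifestv2` and the helper functions here are hypothetical, since the plan does not name the entry. `should_read_deltas` corresponds to the first Readdelta option below.

```python
from pathlib import Path

# Hypothetical requirement name; the plan does not say what the entry is called.
MANIFEST_V2_REQUIREMENT = "manifestv2"


def parse_requirements(text):
    """Return the set of non-empty lines of a requires file."""
    return {line.strip() for line in text.splitlines() if line.strip()}


def uses_manifest_v2(repo_root):
    """True if the repository's .hg/requires lists the v2 manifest entry."""
    requires = Path(repo_root) / ".hg" / "requires"
    if not requires.exists():
        return False
    return MANIFEST_V2_REQUIREMENT in parse_requirements(requires.read_text())


def should_read_deltas(requirements):
    # Only v1 manifests have deltas that carry the file path
    return MANIFEST_V2_REQUIREMENT not in requirements
```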
TBD: Will we convert to old format on the fly for exchange?
Readdelta
In a few places, we use manifest deltas without resolving the entire manifest. One example is hg verify, which slows down considerably when it does not take advantage of reading deltas (a naive test shows a >5x slowdown on the Mozilla repo). Since the new format splits each file entry across two lines, deltas for modifications will not include the file path, which makes them useless for this purpose. Therefore, we will not read deltas for v2 manifests. The problem, however, is that it is not obvious whether a delta is in the old or the new format. There seem to be a few options here:
- Never read deltas when using the new format (as indicated by the requires entry) and accept that some operations will be slower.
- Look closer at the delta content. It seems like a delta of the old format could not be mistaken for a delta of the new format, or vice versa.
- Modify the format so it's trivial to determine whether it's a v1 or v2 manifest by reading a delta. This can be done by prefixing every line with a null byte (an empty path).
Note that the latter two options involve reading the delta only to have to read the full content anyway if the delta is for the new format.
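To illustrate why v2 deltas lose the path, the sketch below diffs two tiny manifests in both layouts, using Python's difflib as a stand-in for Mercurial's own line-based delta code. The manifests, node IDs, and helper names are made up for the example; the node IDs are chosen to contain no newline byte so that each one stays on a single line.

```python
import difflib


def v1_lines(manifest):
    # {path}\0{40-byte hex nodeid}{flags}\n
    return [path + b"\0" + node.hex().encode() + flags + b"\n"
            for path, node, flags in manifest]


def v2_lines(manifest):
    # {path}\0{flags}\n{20-byte binary nodeid}\n
    lines = []
    for path, node, flags in manifest:
        lines.append(path + b"\0" + flags + b"\n")
        lines.append(node + b"\n")
    return lines


def changed_lines(old, new):
    """Lines of new that a line-based delta from old would have to carry."""
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    return [line
            for tag, _, _, j1, j2 in matcher.get_opcodes() if tag != "equal"
            for line in new[j1:j2]]


old = [(b"a.txt", b"\x11" * 20, b""), (b"b.txt", b"\x22" * 20, b"")]
new = [(b"a.txt", b"\x11" * 20, b""), (b"b.txt", b"\x33" * 20, b"")]

print(changed_lines(v1_lines(old), v1_lines(new)))  # includes b"b.txt"
print(changed_lines(v2_lines(old), v2_lines(new)))  # only the new node ID
```

In the v1 layout the changed line still carries `b.txt`, which is what a delta reader needs; in the v2 layout the only changed line is the new binary node ID, so the path has to come from the full manifest text.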