Diff for "RevlogNG"

Differences between revisions 5 and 21 (spanning 16 versions)

Note:

This page is primarily intended for developers of Mercurial.

RevlogNG was introduced with Mercurial 0.9 (see also Revlog).

Deficiencies in original revlog format:

no uncompressed revision size stored
SHA1 hash is potentially too weak
compression context for deltas is often too small
offset range is limited to 4GB
some metadata is indicated by escaping in the data

The original index format was:

4 bytes: offset
4 bytes: compressed length
4 bytes: base revision
4 bytes: link revision
20 bytes: nodeid
20 bytes: parent 1 nodeid
20 bytes: parent 2 nodeid
76 bytes total
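
As a quick check of the field sizes listed above, the original entry can be described with Python's `struct` module. This is only a minimal sketch: the names `INDEX_V0` and `unpack_v0` are hypothetical, and the use of big-endian unsigned integers for the four leading fields is an assumption, not something this page states for the original format.

```python
import struct

# Assumed layout: four big-endian 4-byte integers followed by three
# 20-byte node IDs (nodeid, parent 1 nodeid, parent 2 nodeid).
INDEX_V0 = struct.Struct(">4I20s20s20s")


def unpack_v0(entry):
    """Split one original-format index entry into named fields."""
    offset, comp_len, base_rev, link_rev, node, p1, p2 = INDEX_V0.unpack(entry)
    return {
        "offset": offset,
        "compressed_length": comp_len,
        "base_revision": base_rev,
        "link_revision": link_rev,
        "nodeid": node,
        "parent1_nodeid": p1,
        "parent2_nodeid": p2,
    }


# Synthetic entry with made-up values, for illustration only.
entry = INDEX_V0.pack(0, 42, 0, 7, b"\x11" * 20, b"\x00" * 20, b"\x00" * 20)
print(INDEX_V0.size)
print(unpack_v0(entry)["compressed_length"])
```

The struct size matches the 76-byte total because the `>` prefix disables alignment padding.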

RevlogNG format:

6 bytes: offset -- This is how far into the data file we need to go to find the appropriate delta
2 bytes: flags
4 bytes: compressed length -- Once we are offset into the data file, this is how much we read to get the delta
4 bytes: uncompressed length -- This is just an optimization. It's the size of the file at this revision
4 bytes: base revision -- The last revision where the entire file is stored.
4 bytes: link revision -- Another optimization. Which revision is this? Which commit is this?
4 bytes: parent 1 revision -- Revision of parent 1 (e.g., 12, 122)
4 bytes: parent 2 revision -- Revision of parent 2
32 bytes: nodeid -- A unique identifier, also used in verification (hash of content + parent IDs)
64 bytes total
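
The RevlogNG entry layout above can be sketched with Python's `struct` module. Since `struct` has no 6-byte integer code, this sketch stores the offset and flags together in one 8-byte value, with the offset in the upper 48 bits and the flags in the lower 16. The names `INDEX_NG`, `pack_ng` and `unpack_ng` are hypothetical, and treating the 4-byte fields as signed integers is an assumption.

```python
import struct

# 8 bytes (6-byte offset + 2-byte flags), six 4-byte integers,
# and a 32-byte nodeid field. Signedness of the integers is assumed.
INDEX_NG = struct.Struct(">Q6i32s")


def pack_ng(offset, flags, comp_len, uncomp_len, base, link, p1, p2, node):
    """Pack one entry; offset occupies the high 48 bits, flags the low 16."""
    offset_flags = (offset << 16) | flags
    return INDEX_NG.pack(offset_flags, comp_len, uncomp_len,
                         base, link, p1, p2, node)


def unpack_ng(entry):
    """Unpack one 64-byte entry into a dict of named fields."""
    (offset_flags, comp_len, uncomp_len,
     base, link, p1, p2, node) = INDEX_NG.unpack(entry)
    return {
        "offset": offset_flags >> 16,
        "flags": offset_flags & 0xFFFF,
        "compressed_length": comp_len,
        "uncompressed_length": uncomp_len,
        "base_revision": base,
        "link_revision": link,
        "parent1_revision": p1,
        "parent2_revision": p2,
        "nodeid": node,
    }


# Synthetic entry with made-up values, for illustration only.
entry = pack_ng(5000, 0, 120, 300, 10, 57, 12, 122, bytes(range(32)))
print(INDEX_NG.size)
print(unpack_ng(entry)["offset"])
```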

RevlogNG header:

As the offset of the first data chunk is always zero, the first 4 bytes (part of the offset) are used to indicate the revlog version number and flags. All values are in big-endian format.
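
A rough sketch of reading such a header follows. This page does not say how the 4 bytes are divided, so the split used here (version number in the low 16 bits, flags in the high 16 bits) is an assumption, as are the helper names `make_header` and `read_header`.

```python
import struct


def make_header(version, flags):
    """Build a 4-byte big-endian header (assumed: flags high, version low)."""
    return struct.pack(">I", (flags << 16) | version)


def read_header(data):
    """Return (version, flags) decoded from the first 4 bytes of an index."""
    (word,) = struct.unpack(">I", data[:4])
    return word & 0xFFFF, word >> 16


# Made-up values for illustration only.
header = make_header(version=1, flags=1)
print(read_header(header))
```

Because the first entry's offset is always zero, overwriting these bytes loses no information; a reader only needs to treat the first entry's offset as zero.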

RevlogNG also supports interleaving of index and data. This can greatly reduce storage overhead for smaller revlogs. In this format, the data chunk immediately follows its index entry. The position of the next index entry is calculated by adding the compressed length to the offset.
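
The interleaved layout can be illustrated by walking a synthetic buffer. This is a minimal sketch under stated assumptions: the 64-byte entry layout `INDEX_NG` and the helpers `build_inline` and `walk_inline` are hypothetical, "offset" is taken to mean the position in the file where an entry's data chunk starts, and the chunks are stored as-is rather than compressed.

```python
import struct

# Assumed 64-byte entry: 8 bytes of offset+flags, six 4-byte integers,
# and a 32-byte nodeid field.
INDEX_NG = struct.Struct(">Q6i32s")


def build_inline(chunks):
    """Build a synthetic interleaved revlog: each entry precedes its data."""
    out = bytearray()
    offset = 0
    for rev, chunk in enumerate(chunks):
        # Only the compressed length matters for the walk below; the
        # remaining fields are placeholder values.
        entry = INDEX_NG.pack(offset << 16, len(chunk), len(chunk),
                              rev, rev, 0, 0, b"\x00" * 32)
        out += entry + chunk
        offset += len(chunk)
    return bytes(out)


def walk_inline(buf):
    """Yield (compressed_length, data_chunk) for each entry in the buffer."""
    pos = 0
    while pos < len(buf):
        comp_len = INDEX_NG.unpack_from(buf, pos)[1]
        data_start = pos + INDEX_NG.size
        yield comp_len, buf[data_start:data_start + comp_len]
        # The next index entry begins right after this entry's data chunk.
        pos = data_start + comp_len


chunks = [b"abc", b"", b"hello world"]
for comp_len, data in walk_inline(build_inline(chunks)):
    print(comp_len, data)
```

The walk never reads the offset field, so it is unaffected by the first 4 bytes of the first entry being reused for the version number and flags.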

For how renames are stored see "Problems extracting renames", a reply by mpm posted on Feb 12, 2008, on the mercurial mailing list.

- Revision 5 as of 2007-03-20 10:22:24
  Size: 1420
  Editor: dial-213-168-64-121
  Comment:
+ Revision 21 as of 2015-01-12 16:09:46
  Size: 2282
  Editor: nicolaslegland
  Comment:
-Deletions are marked like this.
+Additions are marked like this.
 Line 1:
-RevlogNG was introduced with Mercurial 0.9.
+<<Include(A:style)>>
<<Include(A:dev)>>
-Line 3:
+Line 4:
-Deficiencies in original revlog format:
+RevlogNG was introduced with Mercurial 0.9 (see also [[Revlog]]).

'''Deficiencies in original revlog format:'''
-Line 11:
+Line 14:
-The original index format was:
+'''The original index format was:'''
-Line 20:
+Line 23:
- * '''72 bytes total'''
+ * '''76 bytes total'''
-Line 22:
+Line 25:
-RevlogNG format:
+'''RevlogNG format:'''
-Line 24:
+Line 27:
- * 6 bytes: offset (allows for 256TB of compressed history per file)
+ * 6 bytes: offset -- This is how far into the data file we need to go to find the appropriate delta
-Line 26:
+Line 29:
- * 4 bytes: compressed length
 * 4 bytes: uncompressed length
 * 4 bytes: base revision
 * 4 bytes: link revision
 * 4 bytes: parent 1 revision
 * 4 bytes: parent 2 revision
 * 32 bytes: nodeid
+ * 4 bytes: compressed length -- Once we are offset into the data file, this is how much we read to get the delta
 * 4 bytes: uncompressed length -- This is just an optimization. It's the size of the file at this revision
 * 4 bytes: base revision -- The last revision where the entire file is stored.
 * 4 bytes: link revision -- Another optimization. Which revision is this? Which commit is this?
 * 4 bytes: parent 1 revision -- Revision of parent 1 (e.g., 12, 122)
 * 4 bytes: parent 2 revision -- Revision of parent 2
 * 32 bytes: nodeid -- A unique identifier, also used in verification (hash of content + parent IDs)
-Line 35:
+Line 38:
-RevlogNG header:
+'''RevlogNG header:'''
-Line 39:
+Line 42:
-RevlogNG also supports interleaving of index and data. This can greatly reduce storage overhead for smaller revlogs. In this format, the data chunk immediately follows its index entry. The position of the next index entry is calculated by adding the compressed length to the offset.
+RevlogNG also supports interleaving of index and data. This can greatly reduce storage overhead for smaller revlogs. In this format, the data chunk immediately follows its index entry. The position of the next index entry is calculated by adding the compressed length to the offset.

For how renames are stored see "[[http://selenic.com/pipermail/mercurial/2008-February/017139.html|Problems extracting renames]]", a reply by [[mpm]] posted on Feb 12, 2008, on the [[http://selenic.com/pipermail/mercurial/|mercurial mailing list]].

See also: ParentDeltaPlan
-Line 42:
+Line 49:
-CategoryNewFeatures
+CategoryInternals

[[FrenchRevlogNG|Français]]