This document represents a proposal only.
See also AtomicRepositoryLayoutPlan and ReadLockPlan for related topics.
Deficiencies in Existing Store Format
Mercurial's existing store model is based on separate file(s) per logical tracked path. If you have 100,000 files under version control, there will be at least 100,000 files (plus directory entries) in the .hg/store directory. The unbounded growth of the number of files in the .hg directory poses a number of problems:
- higher probability of stat cache eviction -> lowered performance
- various filesystems don't efficiently create thousands of files (most notably NTFS - see issue4889)
- various filesystems don't efficiently delete thousands of files
- In server environments where dozens, hundreds, or even thousands of repos are stored, extreme inode count can be problematic
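To make the scaling concrete, here is a rough Python sketch of how store file count grows with the number of tracked paths. It deliberately ignores Mercurial's actual filename encoding, the fncache, inlined revlogs, and other store metadata; it only shows the shape of the problem.

```python
def store_files_for(tracked_paths):
    """Approximate the files under .hg/store for a set of tracked paths."""
    files = ["00changelog.i", "00changelog.d", "00manifest.i", "00manifest.d"]
    for path in tracked_paths:
        files.append("data/%s.i" % path)   # per-path revlog index
        files.append("data/%s.d" % path)   # per-path revlog data
    return files

# 100,000 tracked paths -> on the order of 200,000 files in .hg/store.
print(len(store_files_for("path%d" % i for i in range(100000))))
```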
It's worth explicitly calling out performance problems on Windows. On that platform, unbundling mozilla-central is several minutes slower than on other platforms because closing file handles that have been appended to is slow on NTFS. This can be mitigated by doing I/O on separate threads, but the underlying problem is still there.
There are also related problems stemming from the separate file per logical tracked path storage model and the (append-only) store format of today:
- redundancy in storage for copied/moved files
- causes store size to bloat if there are excessive copies or moves
- may restrict people from performing copies/moves because of scaling concerns
- breaks an important rule of developer tools, which is that their deficiencies shouldn't prevent people from doing what they want to do
- obtaining a consistent view/snapshot of the store (e.g. for stream bundle generation) requires walking and stat()ing the entire store while the write lock is held. More files -> longer lock (see the sketch after this list)
- this is related to read locks
- this is related to how the changelog files are updated last as part of the transaction
- data from obsolete/hidden changesets lingers forever
- wastes space
- provides little to no value (value reaches 0 over time)
- adds overhead because repository filter needs to track more and more omitted revisions
- requires linkrev adjustment, which adds overhead
- transaction close overhead is proportional to number of files being modified
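As an illustration of the snapshot problem above, a stream clone/bundle effectively has to do something like the following while the lock is held. This is a simplified sketch, not Mercurial's actual implementation:

```python
import os

def snapshot_store(store_root):
    """Record every store file and its current size under the lock.

    Lock hold time is proportional to the number of store files, since each
    one must be walked and stat()ed to capture a consistent set of sizes.
    """
    entries = []
    for dirpath, _dirnames, filenames in os.walk(store_root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            entries.append((full, os.stat(full).st_size))  # one stat() per file
    return entries
```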
The goal of this plan is to establish a new repository store format that relies on fewer files and doesn't suffer from some of the inefficiencies and scaling concerns of the existing model.
Advantages of Existing Store Format
Before we design a new store format, it's worth calling out what the existing store model gets right:
- Constant time lookup
- We know which file(s) data is in because the tracked path under version control maps to an explicit filename in the store. There is little to no cost to finding where data is located.
- The cost today is: resolve the file name (essentially free), stat for the presence of a data vs. index file (cheapish), read the index (moderate), seek (cheap), read (moderate). (A simplified sketch of this lookup appears after this list.)
- Bounded lookup time has performance implications for nearly every repository operation. This is especially true for operations that read the entire repository, such as clones or heavy pulls.
- Data from the changelog and manifests is generally more important to performance than files. But files are important for operations like diff generation and bundling (which is required server side for servicing pulls).
- Git's storage model (loose object files + multiple packfiles) has scaling problems because data for a specific object could be in an unbounded number of locations. As the number of storage locations grows, performance worsens. Periodic repacks consolidate the storage into packfiles or even a single packfile. But on large repositories, repacks are time and resource prohibitive. They can be especially painful for server operators.
- History for tracked paths is independent (related to the above point about lookup time)
- Loading history for a particular tracked path doesn't require loading data for unrelated tracked paths
- Contrast this with Git's generic key-value based storage model and complicated pack heuristics, where data for a tracked path can be scattered all over the place
- Append only files and sequential I/O
- Append only writes are good for performance
- Sequential reads are good for performance (especially on magnetic disks) and are good for caching
- Append only writes make transaction rollback simpler (just truncate the file)
- Append only writes make read locks easier (establish read barrier that you can't read past)
- Append only files make incremental backups much easier since backup time/cost is proportional to new repository data since last backup
- No garbage collection
- Scaling garbage collection is hard and resource intensive. See Git.
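To illustrate the constant-time lookup described above, here is a simplified sketch of index-based revision access. The entry layout below is invented for illustration; it is not the actual revlog format.

```python
import os
import struct

# Hypothetical fixed-size index entry: data offset, data length, base revision.
ENTRY_FMT = ">QII"
ENTRY_SIZE = struct.calcsize(ENTRY_FMT)

def read_revision_chunk(store, path, rev):
    """Fetch the raw chunk for one revision of one tracked path.

    Cost: resolve the file name (free), open/stat (cheap), one index read at
    a computed offset (cheap), one seek + read in the data file. Nothing here
    depends on how many *other* paths the repository tracks.
    """
    index_file = os.path.join(store, "data", path + ".i")
    data_file = os.path.join(store, "data", path + ".d")
    with open(index_file, "rb") as idx:
        idx.seek(rev * ENTRY_SIZE)          # fixed-size entries -> direct seek
        offset, length, _base = struct.unpack(ENTRY_FMT, idx.read(ENTRY_SIZE))
    with open(data_file, "rb") as data:
        data.seek(offset)
        return data.read(length)            # delta or fulltext chunk
```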
Desirable Properties
Now that we've outlined the advantages and disadvantages of the existing store format, let's piece together a list of desirable properties for the new store format:
- Requires as few files as possible and/or allows for compacting the store into fewer files
- Bounded lookup time for finding data
- Only store data for copied and moved files once
- Consistent read-only views/snapshots of repositories can be obtained efficiently
- Obsoleting/hiding changesets doesn't introduce unbounded overhead
- Transaction rollback is as efficient as possible
- Append-only files and sequential I/O are used heavily
- Per-file log/blame is fast
- No garbage collection operations randomly performed during commands
Proposed Store Format
Before we dive into the new proposed store, it is important to confront some realities.
First, obsolete/hidden changesets will grow unbounded and add unbounded overhead unless there is some form of garbage collection. There is overhead for reading the changelog and performing linkrev adjustment. Solving garbage collection almost certainly means reading all data and rewriting it out again. For very large repositories, these operations will require gigabytes of I/O and likely a few minutes of wall time to perform (roughly equivalent to producing a bundle of the entire repo). These can certainly be performed out-of-band and periodically. But performance will degrade over time until the effects of obsolete/hidden changesets are rectified.
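A hedged sketch of what such an out-of-band garbage collection pass might look like (the helper names are hypothetical, not an existing API): copy only the revisions still reachable from visible changesets into a fresh store, then swap it in.

```python
def gc_store(old_store, new_store, visible_revs):
    """Rewrite the store, dropping data only reachable from hidden changesets.

    Hypothetical helpers: store_paths(), revisions(), linkrev(), read(),
    append(). The point is the cost profile: every surviving revision is read
    and rewritten, so I/O is proportional to total live store size.
    """
    for path in old_store.store_paths():
        for rev in old_store.revisions(path):
            if old_store.linkrev(path, rev) in visible_revs:
                new_store.append(path, old_store.read(path, rev))
    # The old store can then be atomically replaced by the new one.
```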
Second, preventing storage redundancy for file copies and moves requires reading from multiple store paths / revlogs.
Third, we have to trade off among reducing total file count, bounded data lookup times, and simple/append-only storage/indexing; as we'll see, no format achieves all three. Let's elaborate.
Our smallest fundamental unit of storage is a delta. It is applied against a parent, which is applied against its parent, and so on, until we obtain the fulltext of something. Deltas are grouped by store path and collected into revlogs. To facilitate rapid lookup of specific deltas and rapid revision resolution, revlogs have an index containing offsets to individual deltas. An index into deltas must exist for performance reasons and is non-negotiable. The revlog, as a collection of deltas for a single logical entity, should almost certainly continue to exist because it provides compelling advantages (notably performance) over a more generic key-value store (such as Git's model). You should at least agree that spatial locality for a store path is important (data for a particular store path can be written and read using mostly sequential I/O).
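For concreteness, here is a minimal sketch of fulltext reconstruction from a delta chain. The delta representation and the revlog helpers (is_snapshot, delta, delta_parent, snapshot) are invented for illustration; real revlog deltas are binary patches.

```python
def apply_delta(base, delta):
    """Apply a toy delta: a sorted list of (start, end, replacement_bytes) edits."""
    out, last = [], 0
    for start, end, replacement in delta:
        out.append(base[last:start])
        out.append(replacement)
        last = end
    out.append(base[last:])
    return b"".join(out)

def fulltext(revlog, rev):
    """Walk back to the nearest stored fulltext, then apply deltas forward."""
    chain = []
    while not revlog.is_snapshot(rev):       # hypothetical helper
        chain.append(revlog.delta(rev))      # delta against its delta parent
        rev = revlog.delta_parent(rev)
    text = revlog.snapshot(rev)              # a stored fulltext ("base")
    for delta in reversed(chain):
        text = apply_delta(text, delta)
    return text
```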
An unbounded number of filesystem files is bad, and preventing this is a major goal of the new store format. This necessitates writing data for multiple store paths (revlogs) into the same filesystem file.
Let's imagine a new entity called a "packed revlog." It is a single file holding the data of multiple store paths / revlogs. There has to be an index containing the offsets of individual entries inside the "packed revlog" so readers can find data without full scans. Just as indexes into deltas in revlogs are necessary for performance reasons, so are these indexes. For exploratory purposes, let's assume "packed revlogs" may or may not be immutable (but are always append only) and that the index can exist in a separate location or at the beginning of the file.
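An illustrative sketch of such a file, assuming (for simplicity) an immutable pack whose index maps each store path to the byte range holding that path's revlog data. This is not a proposed on-disk format, just the shape of the idea.

```python
def write_packed_revlog(out, revlogs):
    """Concatenate per-path revlog data into one file; return its index.

    revlogs: dict mapping store path -> raw revlog bytes.
    The returned index (store path -> (offset, length)) would be written at
    the start of the file or kept in a separate location.
    """
    index, offset = {}, 0
    for path, data in sorted(revlogs.items()):
        out.write(data)
        index[path] = (offset, len(data))
        offset += len(data)
    return index

def read_store_path(packfile, index, path):
    """One index lookup, one seek, one (mostly sequential) read."""
    offset, length = index[path]
    packfile.seek(offset)
    return packfile.read(length)
```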
With this new "packed revlog" concept in mind, let's examine potential store formats and their properties:
1. The existing store format
- Append-only files (good)
- Unbounded file count (bad) (plus the other deficiencies above)
2. A store format based on creating multiple immutable "packed revlog" files. Each "packed revlog" file has its own (also immutable) index. A new "packed revlog" is produced for every store write.
- Append-only files (good)
- Far fewer files than a separate file per store path. Still technically unbounded, but the total file count is likely 1-2 orders of magnitude lower and thus manageable. (good)
- Unbounded lookup time because you have to read N indexes to find data for a store path (bad; see the sketch after this list)
- This is Git's problem with regard to multiple packfiles and store alternates
3. A store format based on creating multiple immutable "packed revlog" files, but with a single, mutable index
- (Variation on #2)
- Append-only files for delta/revlog data (good)
- File count remains in check (good)
- Lookup time remains in check because there is a single, magical index (good)
- Multiple files may need to be opened to read data for a particular store path (not great, but likely tolerable)
- Requires a magical unified index. This is a difficult problem because building indexes with good performance and consistent views that respect transactions is hard. We'd likely be reinventing the internals of <insert popular database>. (bad)
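The sketch below contrasts lookup under formats #2 and #3. The objects and methods are illustrative only, not a real API: with per-pack indexes a reader may consult every pack before finding a store path, while a unified index answers in a single bounded lookup.

```python
def find_in_immutable_packs(packs, path):
    """Format #2: each pack has its own index, so a lookup may read all N
    of them (each .get() potentially hitting disk) before giving up."""
    for pack in packs:                      # O(number of packs)
        entry = pack.index.get(path)
        if entry is not None:
            return pack, entry
    return None

def find_with_unified_index(unified_index, path):
    """Format #3: one mutable, transaction-aware index maps a store path
    directly to the pack file and offset holding its data."""
    return unified_index.get(path)          # bounded: a single lookup
```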
Earlier we said that we had to choose among reducing total file count (#2 and #3), bounded data lookup times (#1 and #3), and simple/append-only storage/indexing (#1 and #2). No pure solution achieves everything.