Shared Repositories
Current Situation
Shared repositories are currently implemented using the ShareExtension.
Many other extensions do not work well with the share extension. A big part of the problem is that there is no easy way to designate whether a given file should be shared. Furthermore, even in core, files that could or should be shared are intermixed with files that should not be. For example, the bookmarks file (.hg/bookmarks) and the active bookmark file (.hg/bookmarks.current) live in the same directory, but the active bookmark applies to the working copy state, while the bookmarks file describes repository state (and is indeed updated on pull). This problem shows up again and again in a large number of extensions, inside and outside of core, as well as in features within core itself, including: shelve, history rewriting operations (eg strip, rebase and histedit, with their strip-backups directory), remotenames, hgsubversion, etc.
Note that cache files also fall into this problem: many cache files can and should be shared between shared repos, but today none are. Only a handful of caches depend on the working copy parent (eg, visibility of obsolete changesets).
Currently, there are three vfs types:
- wvfs (for writing to the working copy, unshared)
- svfs (for writing to the /.hg/store/ directory, shared)
- vfs (for writing to the /.hg/ directory, unshared)
atomictemp is a parameter to vfs that makes writing a single file atomic, but it makes no guarantees about multiple files (that is handled today, rather imperfectly, by transactions).
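To make the single-file guarantee concrete, here is a toy sketch of the usual temp-file-and-rename technique that an atomictemp-style write relies on, and of why it does not extend to several files. `simplevfs` and its `join`/`write`/`read` methods are hypothetical stand-ins, not Mercurial's vfs class, and the rename is only atomic on POSIX filesystems.

```python
import os
import tempfile


class simplevfs:
    """Hypothetical stand-in for a vfs rooted at one directory."""

    def __init__(self, base):
        self.base = base

    def join(self, path):
        return os.path.join(self.base, path)

    def write(self, path, data, atomictemp=False):
        target = self.join(path)
        os.makedirs(os.path.dirname(target), exist_ok=True)
        if not atomictemp:
            # Plain write: a crash can leave a truncated file behind.
            with open(target, "wb") as fp:
                fp.write(data)
            return
        # Write a temp file in the same directory, then rename it over the
        # target, so readers see either the old or the new content.
        fd, tmp = tempfile.mkstemp(dir=os.path.dirname(target), prefix=".tmp-")
        try:
            with os.fdopen(fd, "wb") as fp:
                fp.write(data)
            os.replace(tmp, target)
        except BaseException:
            os.unlink(tmp)
            raise

    def read(self, path):
        with open(self.join(path), "rb") as fp:
            return fp.read()


if __name__ == "__main__":
    v = simplevfs(tempfile.mkdtemp())
    # Each call is atomic on its own, but a crash between the two calls
    # still leaves the pair of files inconsistent with each other.
    v.write("bookmarks", b"0123abcd feature\n", atomictemp=True)
    v.write("bookmarks.current", b"feature", atomictemp=True)
```

Coordinating several such writes is exactly what the transaction layer, and the proposals below, have to handle.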
Problem Statement
In order to solve this problem, we need a generic, safe, and easy way for both core and extensions to designate a file as "shareable". Essentially, there are three categories of shared state:
1. should not be shared (eg, dirstate)
2. may be shared optionally (eg, bookmarks)
3. always shared (eg, store/*)
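One way to make the three categories explicit is a registry that core and extensions add entries to. The sketch below is hypothetical: `ShareCategory`, `REGISTRY`, `categorize` and `is_shared` are illustrative names, the paths are relative to .hg/, and treating unregistered files as unshared is an assumed conservative default.

```python
import enum
import fnmatch


class ShareCategory(enum.Enum):
    NEVER = "should not be shared"
    OPTIONAL = "may be shared optionally"
    ALWAYS = "always shared"


# Hypothetical registry of .hg/-relative patterns; extensions would add
# their own entries instead of guessing where their files belong.
REGISTRY = {
    "dirstate": ShareCategory.NEVER,
    "bookmarks.current": ShareCategory.NEVER,
    "bookmarks": ShareCategory.OPTIONAL,
    "store/*": ShareCategory.ALWAYS,
}


def categorize(path):
    for pattern, category in REGISTRY.items():
        if fnmatch.fnmatchcase(path, pattern):
            return category
    # Assumption: files nobody registered stay local.
    return ShareCategory.NEVER


def is_shared(path, enabled_optional=frozenset()):
    """Optional files are shared only if the user enabled them."""
    category = categorize(path)
    if category is ShareCategory.ALWAYS:
        return True
    if category is ShareCategory.OPTIONAL:
        return path in enabled_optional
    return False
```

For example, `is_shared("bookmarks")` is false until the caller passes `{"bookmarks"}` as the enabled set, while `store/` files are always shared.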
Atomic transactions add another layer of complexity (see AtomicRepositoryLayoutPlan). Some files need to be updated together, atomically. Shared files that need to be updated atomically include bookmarks when a commit is made, remotenames during a pull, etc.
The plan (based on a chat with Pierre-Yves) is to have a file that points to a directory name (or multiple directory names) holding the current versions of the files. This pointer file will then be updated atomically (via a tempfile rename). This atomicity will need to work with both shared and unshared files: to be totally correct, shared files will need to be updated in the same transactions as unshared files (eg, a pull that updates both local unshared bookmarks and shared remote names).
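The pointer-file plan could look roughly like the following single-pointer sketch. The `current` file, the `states/` directory and `PointerStore` are hypothetical; fsync for durability, cleanup of old generations, and coordinating a shared pointer with an unshared one in one transaction are all left out.

```python
import os
import shutil
import tempfile
import uuid


class PointerStore:
    """Hypothetical layout: <root>/current names the live <root>/states/<gen>."""

    def __init__(self, root):
        self.root = root
        os.makedirs(os.path.join(root, "states"), exist_ok=True)

    def _current_dir(self):
        try:
            with open(os.path.join(self.root, "current")) as fp:
                name = fp.read().strip()
        except FileNotFoundError:
            return None
        return os.path.join(self.root, "states", name)

    def read(self, name):
        current = self._current_dir()
        if current is None:
            return None
        path = os.path.join(current, name)
        if not os.path.exists(path):
            return None
        with open(path, "rb") as fp:
            return fp.read()

    def commit(self, updates):
        """Publish every file in `updates` (flat name -> bytes) at once."""
        newname = uuid.uuid4().hex
        newdir = os.path.join(self.root, "states", newname)
        current = self._current_dir()
        if current is not None:
            # Carry unchanged files forward into the new generation.
            shutil.copytree(current, newdir)
        else:
            os.makedirs(newdir)
        for name, data in updates.items():
            with open(os.path.join(newdir, name), "wb") as fp:
                fp.write(data)
        # The only step readers can observe: swap the pointer by renaming
        # a temp file over it.
        fd, tmp = tempfile.mkstemp(dir=self.root, prefix=".current-")
        with os.fdopen(fd, "w") as fp:
            fp.write(newname)
        os.replace(tmp, os.path.join(self.root, "current"))
        return newname
```

Readers that resolve `current` once and then read from that directory see either all of a commit's files or none of them, which is the multi-file guarantee a bare atomictemp write lacks.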
Locking
Write operations on shared repositories are fundamentally multi-repository operations, so a locking scheme must be designed. Today, we take wlock (which protects /.hg/* except /.hg/store/*) and then lock (which protects /.hg/store/*) (see LockingDesign). Operations on shared repositories will need to take locks in more than one repository, so locking order is important. Since we acquire wlock first, it makes sense to acquire the shared file lock next and the storage lock last.
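A fixed global order avoids deadlocks: if every writer acquires locks in the same order, no two writers can each hold a lock the other is waiting for. The sketch below encodes the order proposed above. `threading.Lock` objects stand in for Mercurial's on-disk lock files, and `sharedlock`, `LOCK_ORDER` and `locked` are hypothetical names.

```python
import contextlib
import threading

# Stand-ins for the real on-disk locks.
wlock = threading.Lock()       # protects .hg/* except .hg/store/*
sharedlock = threading.Lock()  # hypothetical: protects shared non-store files
storelock = threading.Lock()   # "lock": protects .hg/store/*

# Every writer acquires in this order; ExitStack releases in reverse.
LOCK_ORDER = (("wlock", wlock), ("sharedlock", sharedlock), ("lock", storelock))


@contextlib.contextmanager
def locked(*names):
    """Acquire the requested locks, always following LOCK_ORDER."""
    wanted = set(names)
    unknown = wanted - {name for name, _ in LOCK_ORDER}
    if unknown:
        raise ValueError("unknown locks: %s" % sorted(unknown))
    acquired = []
    with contextlib.ExitStack() as stack:
        for name, lk in LOCK_ORDER:
            if name in wanted:
                stack.enter_context(lk)
                acquired.append(name)
        yield acquired
```

A caller asking for `locked("lock", "wlock")` still gets wlock before lock, so the order is enforced in one place rather than at every call site.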
Solution proposals
Solution A
Add a vfs for files that can be shared. Files opened with this vfs will be shared by default. Add a mechanism for excluding files from sharing by configuration (eg, for when bookmarks are not shared).
Over time, we can work on migrating use cases of vfs to this new shared vfs (unfortunately, svfs is already in use; it will need another name). When this transition is done, we will need to make sure to acquire the shared lock at the same time. Most extensions will need to be modified to use this new vfs.
pros: safe, easy first step (just new APIs); gradual change
cons: many more places to update, long tail of work that, realistically, never completes
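Under Solution A, the existing vfs keeps writing locally and callers opt in by switching to a separate shared vfs. A minimal sketch of that routing follows; `sharedvfs` and its `excluded` argument are hypothetical, and reading the exclusions from configuration is left out.

```python
import os


class sharedvfs:
    """Hypothetical opt-in vfs: paths go to the share source unless excluded."""

    def __init__(self, localbase, sharedbase, excluded=()):
        self.localbase = localbase    # this repository's .hg/
        self.sharedbase = sharedbase  # the share source's .hg/
        # Filled from configuration, eg when bookmarks are not shared.
        self.excluded = frozenset(excluded)

    def join(self, path):
        base = self.localbase if path in self.excluded else self.sharedbase
        return os.path.join(base, path)
```

With `excluded={"bookmarks"}`, `join("remotenames")` points into the source repository while `join("bookmarks")` stays local. Code that still uses the plain vfs is unaffected, which is why this is the safe first step and also why the migration has a long tail.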
Solution B
Make the standard vfs share files by default. Make lock acquire both wdir locks by default (and introduce new more granular locks for the future). Opt out files that should not be shared (eg, dirstate, active bookmark, etc).
pros: fixes most share problems immediately; can potentially be implemented by an extension
cons: big backwards compatibility break; unlikely it can be part of the share extension
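Solution B flips the default: everything the standard vfs writes is shared unless it appears on an opt-out list. In the sketch below, `LOCAL_ONLY` and `resolve` are hypothetical, and matching on the top-level directory as well as the exact path is an assumption that would let a whole directory opt out.

```python
import os

# Hypothetical opt-out list; anything not listed would be shared.
LOCAL_ONLY = frozenset(["dirstate", "bookmarks.current"])


def resolve(path, localbase, sharedbase, localonly=LOCAL_ONLY):
    """Where the standard vfs would put `path` if it shared by default."""
    top = path.split("/", 1)[0]
    base = localbase if path in localonly or top in localonly else sharedbase
    return os.path.join(base, path)
```

An extension file nobody listed, such as a shelve patch, lands in the shared directory under this rule. That fixes most share problems at once, and it is also the backwards compatibility break noted above.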
Dangers of shared repositories
- strip would have to be careful not to remove data belonging to other repositories (it is currently not careful; see https://selenic.com/hg/file/648323f41a89/hgext/share.py#l32)
  - Changeset Evolution and its derivatives (such as the inhibit and directaccess extensions) solve this problem by not removing obsolete commits