Working Copy Sync Plan
1. The Overall Problem
There should be a lightweight way to transparently sync working copies across repositories in the background. This document covers one part of a proposed implementation.
The initial implementation's goals are:
- Enable instantaneous one-way sync of a Mercurial working directory from one machine (the client) to another (the server).
- Do not require explicit commits or pushes to make the sync happen.
- Assume that network flakiness is rare. (This makes the initial implementation unsuitable for many laptop-based use cases, which will likely be addressed in future iterations.)
Outside this document's scope are:
- How to detect whether a working copy has changed (that will likely be handled by HgwatchmanExtension).
1.1. Why would you even want to sync working copies?
Imagine that you're working in a local repository and have local changes. At many organizations, testing those changes means syncing them over to a remote server, for example to try them against a small amount of production traffic.
The usual way to do that is to make an explicit commit and push, then have a hook on the server that updates the test repo to that commit.
hg commit && hg push test-server -r .
This sucks. Making a commit is a heavyweight operation that requires a mental context switch. It adds unnecessary friction to what should be a seamless process -- for a web application, if the server were running locally you would just need to save the file you're editing, run any build steps, then hit refresh in your browser.
The overall plan allows developers to get the same workflow as with a local server.
1.2. Why not use rsync/Unison/Dropbox/<insert favorite file sync tool here>?
For small repositories, this works great! For larger ones, not so much. Consider a large update from one public commit to another that changes more than 10,000 files. Any of the above tools will try to sync every changed file, even though the only thing the remote needs to know is that the public commit you're on has changed -- at that point the server can update to that commit itself (possibly fetching those changes over the network, with RemotefilelogExtension).
1.3. Why not just make real commits on disk and strip them when we're done?
That is not atomic -- if hg log is run while a commit is being made and pushed, it's going to result in weird temporary commits being visible. Even if that is taken care of with e.g. a pre-secret phase, the fact that the lock is taken by a background process is still going to be visible to other Mercurial processes.
1.4. What else are in-memory commits good for?
- Operations like shelve become easier and cleaner to implement.
- In-memory commits can be sent over (via bundles, for example) to a persistent blob store. This enables live backups of working copy changes.
- hg bundle -r 'wdir()' can now work.
2. The Plan
2.1. On the client
- In Mercurial, introduce a class called memrepo that represents a localrepository with additional ephemeral commits. An ephemeral commit is a commit that is serialized entirely in memory. It does not have an on-disk representation. A memrepo will be backed by a readonlyvfs, preventing writes to disk entirely.
- The repository needs to be persistent in memory -- a background CommandServer could be used for that.
- Interesting actions:
  - When a set of files is modified, added or removed, we create a new commit in memory containing files changed in this set.
  - On commit, the in-memory state is discarded and we recompute changed files.
  - On update and rebase, we wait for the update to finish, then discard in-memory state and recompute changed files.
- Once a commit is created, it is pushed to the server with the standard hg push mechanism.
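The client-side lifecycle above can be sketched as a small toy model. This is only an illustration under stated assumptions: MemRepoSketch, EphemeralCommit, files_changed, reset and the push callback are hypothetical names, none of this is Mercurial's API, and the hash below is a stand-in rather than how Mercurial computes node IDs.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# path -> new contents; None marks a removed file (assumed convention)
Changes = Dict[str, Optional[bytes]]


@dataclass
class EphemeralCommit:
    """A commit that lives only in memory (hypothetical structure)."""
    node: str      # toy identifier, not a real Mercurial node
    parent: str    # on-disk working parent or the previous ephemeral commit
    files: Changes


class MemRepoSketch:
    """Toy stand-in for the proposed memrepo; it never writes to disk."""

    def __init__(self, working_parent: str,
                 push: Callable[[EphemeralCommit], None]) -> None:
        self.working_parent = working_parent
        self.push = push  # stands in for "hg push" to the server
        self.ephemeral: List[EphemeralCommit] = []

    def tip(self) -> str:
        """Newest ephemeral commit, or the on-disk parent if there is none."""
        return self.ephemeral[-1].node if self.ephemeral else self.working_parent

    def files_changed(self, changes: Changes) -> EphemeralCommit:
        """Create an in-memory commit for a set of changed files and push it."""
        parent = self.tip()
        digest = hashlib.sha1(parent.encode("utf-8"))
        for path in sorted(changes):
            data = changes[path]
            digest.update(path.encode("utf-8") + b"\0")
            digest.update(b"<removed>" if data is None else data)
        commit = EphemeralCommit(digest.hexdigest(), parent, dict(changes))
        self.ephemeral.append(commit)
        self.push(commit)
        return commit

    def reset(self, new_working_parent: str) -> None:
        """Discard in-memory state after a real commit, update or rebase.

        The caller is expected to recompute changed files afterwards.
        """
        self.working_parent = new_working_parent
        self.ephemeral.clear()
```

In this sketch each ephemeral commit is stacked on the previous one, so the server only ever receives the files changed since the last push. Whether the real implementation stacks commits or replaces them is not specified here.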
2.2. On the server
- The server will have a copy of the same Mercurial repository, kept constantly up to date.
- The server will also have an in-memory daemon containing the same Mercurial repository, and a working copy on disk.
- When a commit is pushed to the server, it will be applied in memory, and then the working copy will be updated to the pushed commit.
- The server or build can then read the working copy from disk.
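The server-side steps above might look roughly like the following toy daemon. It is a sketch under assumptions: SyncServerSketch and receive_push are hypothetical names, a pushed commit is modelled as a plain mapping from relative path to contents (None meaning removed), and real Mercurial push handling is not shown.

```python
from pathlib import Path
from typing import Dict, Optional

# path -> new contents; None marks a removed file (assumed convention)
Changes = Dict[str, Optional[bytes]]


class SyncServerSketch:
    """Toy in-memory daemon that mirrors pushed commits into a working copy."""

    def __init__(self, workdir: Path) -> None:
        self.workdir = workdir
        self.commits: Dict[str, Changes] = {}  # in-memory store, never persisted
        self.current: Optional[str] = None     # commit the working copy is at

    def receive_push(self, node: str, files: Changes) -> None:
        # Step 1: apply the pushed commit in memory.
        self.commits[node] = dict(files)
        # Step 2: update the on-disk working copy to the pushed commit.
        for relpath, data in files.items():
            target = self.workdir / relpath
            if data is None:
                target.unlink(missing_ok=True)  # requires Python 3.8+
            else:
                target.parent.mkdir(parents=True, exist_ok=True)
                target.write_bytes(data)
        self.current = node
```

After receive_push returns, a web server or build process can read the files under workdir as usual. A real implementation would also need to validate paths and make the update safe against concurrent readers, which this sketch ignores.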
2.3. Performance tradeoffs and optimizations
- For non-tree manifests, instead of constructing a new manifest every time a new ephemeral commit is created, we could store changed entries as part of the changelog. This is in effect a new manifest format, optimized for ephemeral commits.
- Assume that all commits that are public on the client are present on the server. This allows us to skip the push discovery step and directly push relevant non-public commits.
- Instead of discarding in-memory state on commit or update, can we keep some of that state around to use in the future?
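To make the first optimization above more concrete, here is one possible shape for a manifest that records only changed entries on top of an unmodified base manifest. It is a sketch, not a proposed format: OverlayManifest is a hypothetical name, manifests are modelled as plain path-to-filenode mappings, and None is an assumed marker for a removed file.

```python
from typing import Dict, List, Mapping, Optional


class OverlayManifest:
    """Base manifest plus a small set of changed entries (hypothetical)."""

    def __init__(self, base: Mapping[str, str],
                 changed: Dict[str, Optional[str]]) -> None:
        self.base = base        # shared with the parent commit, never copied
        self.changed = changed  # path -> new filenode, or None if removed

    def __contains__(self, path: str) -> bool:
        if path in self.changed:
            return self.changed[path] is not None
        return path in self.base

    def __getitem__(self, path: str) -> str:
        if path in self.changed:
            node = self.changed[path]
            if node is None:
                raise KeyError(path)  # removed in the ephemeral commit
            return node
        return self.base[path]

    def paths(self) -> List[str]:
        """All live paths, computed lazily from base and overlay."""
        live = set(self.base)
        for path, node in self.changed.items():
            if node is None:
                live.discard(path)
            else:
                live.add(path)
        return sorted(live)
```

Creating an ephemeral commit then costs work proportional to the number of changed files rather than the size of the full manifest, at the price of a slightly slower lookup.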