Problem Statement
Cloning and pulling (large) repositories can consume a large amount of CPU on servers. In the face of high client volume, this can lead to resource exhaustion and service unavailability.
These operations can consume large amounts of CPU because every clone or pull that transfers changeset data results in the server creating a changegroup bundle of the data to be transferred. This operation is expensive because the producer has to read revlogs and construct new delta chains from the content. It is essentially re-encoding the revlog on the fly. For revlogs with large entries (such as manifests with 100,000 files) or large diffs, this can take a lot of CPU (and even I/O).
Solution: Pre-Generated Bundles
The inherent problem is that servers are "rebundling" repository data for every clone or pull operation. What if, instead of generating bundles at request time, the server pre-generated them and saved them somewhere? When a client connects, it could obtain the contents of that bundle, apply it, then pull the changes made since the bundle was created.
This solution works because repository data is generally append-only and immutable. This means that clones and subsequent pulls can effectively be modeled as replays of a linear log of data. Data is strictly additive, so applying a bundled snapshot of the repository (hg unbundle) and then transferring the data added since that bundle was created (hg pull) is effectively equivalent to a full clone.
This solution saves a significant amount of CPU on the server because reading a static file off disk (or redirecting elsewhere) is almost certainly much cheaper than rebundling.
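The append-only argument above can be illustrated with a toy model. This is only a sketch under a strong simplifying assumption: the repository is treated as a flat list of changeset IDs rather than a DAG of revlogs, and the function names are hypothetical stand-ins for bundle creation, hg unbundle, and hg pull.

```python
# Toy model: a repository is an append-only list of changeset IDs.
# (Assumption for illustration; real repositories are DAGs stored in revlogs.)

def make_bundle(repo_log):
    """Snapshot the whole log at bundle-creation time."""
    return list(repo_log)

def unbundle(client_log, bundle):
    """Apply a bundle: append every entry the client lacks."""
    client_log.extend(bundle[len(client_log):])

def pull(client_log, server_log):
    """Incremental pull: fetch only entries after the client's tip."""
    client_log.extend(server_log[len(client_log):])

server = ["c0", "c1", "c2"]
bundle = make_bundle(server)      # pre-generated once, served many times
server += ["c3", "c4"]            # pushes that arrive after bundling

client = []
unbundle(client, bundle)          # cheap for the server: static data
pull(client, server)              # server only has to bundle c3 and c4
```

After both steps the client holds the same log as the server, even though the server only generated data for the two changesets pushed after the bundle was made.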
Methods of Serving Pre-Generated Bundles
Inline bundle2 Part
In this solution, when a clone or pull is requested, the server takes inventory of what bundles are available. If an appropriate one is present, it reads its data and inserts it directly into the bundle2 reply. This is simply streaming bits off disk or elsewhere. Then, the server calculates what changesets aren't in the bundle and constructs a new bundle2 part containing those. From the client's perspective, it (likely) receives multiple changegroup bundles.
Pros:
- Data goes directly from server to client
- Little CPU used on server
- Server is flexible about how and where it stores bundles
Cons / Oddities:
- Server must store what is in each pre-generated bundle so it knows what additional changesets to add to the response. This is effectively reimplementing discovery and is somewhat more complicated to implement. It could be made simpler by introducing bundles that efficiently encode the heads (and bases) therein.
- Server incurs lots of bandwidth to transfer files, leading to long-running TCP connections (no different than before)
- Streaming of pre-generated bundles kind of turns Mercurial into an FTP server of sorts
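To make the first con concrete: if each pre-generated bundle records its heads, the server can derive the changesets for the follow-up part as "ancestors of the repository heads minus ancestors of the bundle heads". The sketch below assumes a plain dictionary mapping each changeset to its parents; the function names and the tiny history are hypothetical, not Mercurial's discovery code.

```python
def ancestors(parents, heads):
    """Return every node reachable from `heads`, heads included."""
    seen = set()
    stack = list(heads)
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(parents.get(node, ()))
    return seen

def missing_from_bundle(parents, repo_heads, bundle_heads):
    """Changesets the follow-up changegroup part would have to carry."""
    return ancestors(parents, repo_heads) - ancestors(parents, bundle_heads)

# Hypothetical history: a <- b <- c, with d also branching off b.
parents = {"a": [], "b": ["a"], "c": ["b"], "d": ["b"]}
extra = missing_from_bundle(parents, repo_heads=["c", "d"], bundle_heads=["b"])
```

Here the bundle was created when b was the only head, so the follow-up part needs c and d.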
External Bundle Download
In this solution, instead of sending the pre-generated bundle data inline with the bundle2 reply, the server advertises a URL (and likely metadata) of a bundle to fetch. The client sees the URL, fetches and applies the bundle, then gets the incremental data from the server.
There are a few variants of this, which will be explained shortly. However, there are some common concerns with this approach:
- Security concerns with remote-hosted content
- Firewall issues
- Content negotiation (URLs in multiple data centers, bundle type preference, etc)
- More TCP sockets could lead to higher latency
Inline Followup Variant
The server sends the bundle URL in a bundle2 part and then sends a changegroup part with data since the bundle was generated.
Support for this exists in Mercurial today with the remote-changegroup bundle2 part. Server-side code for generating these parts and subsequent changegroup parts is not implemented in core.
Pros:
- Single exchange of data between server and client
Cons:
- Same complexity concerns as inline data transfer. Notably, the server must have knowledge of exactly what is in the bundle so it may construct an appropriate follow-up changegroup bundle.
- If client is synchronously fetching and applying the remote bundle, this could result in an idle TCP socket. Or, if we start using background threads for network I/O, it would result in the buffering of large follow-up data.
Disconnect and Return Variant
Server sends URL. Client detaches, fetches and applies bundle. Then, the client reconnects to the server and does the equivalent of an hg pull (if necessary).
This could be implemented in a few different ways:
- Client issues getbundle with capabilities saying it can apply remote hosted bundles. Bundle URL part received. Client disconnects. Applies bundle. Starts over.
- Server advertises that it hosts bundles. Client requests a bundle, disconnects, applies bundle, and then reconnects for the pull.
These are very similar. In the first, the bundles are integrated into the "getbundle" wire protocol command. In the second, there is likely a separate server command or "listkeys" namespace advertising bundles, which the client can query directly for bundle info.
Pros:
- No additional complexity for discovery-like operations to generate a follow-up bundle: the client does an hg pull equivalent and existing discovery semantics apply. However, the server could need additional complexity to inform the client that a follow-up request isn't necessary when the bundle is complete.
Cons:
- Additional client-server TCP connection
- Potential for redundant discovery and cost associated with that
Clone Bundle Proposal
Serving bundles with partial repository content is more complicated than serving a snapshot of an entire repository. So, the initial proposal for serving from static, pre-generated bundles will focus on bootstrapping clones (not subsequent pulls).
Mozilla has implemented support for static bundle serving using this strategy and they typically serve >1TB/day using this model, saving hundreds of hours of CPU time on servers. It is implemented as a Mercurial extension - called bundleclone - that is installed on both the client and server. The proposal that follows is inspired by and very similar to Mozilla's solution.
The server advertises a "clonebundles" capability indicating that it has the potential to serve snapshots of entire repository data suitable for bootstrapping clones.
When clients call the "clonebundles" wire protocol command, they receive a manifest of available "clone bundles." Each entry contains a URL and optional key-value metadata. Manifests look something like this:
https://hg.cdn.mozilla.net/mozilla-central/d6ea652c579992daa9041cc9718bb7c6abefbc91.gzip.hg REQUIRESNI=true TYPE=HG10GZ
https://hg.cdn.mozilla.net/mozilla-central/d6ea652c579992daa9041cc9718bb7c6abefbc91.bzip2.hg REQUIRESNI=true TYPE=HG10BZ
https://s3-us-west-2.amazonaws.com/moz-hg-bundles-us-west-2/mozilla-central/d6ea652c579992daa9041cc9718bb7c6abefbc91.gzip.hg TYPE=HG10GZ ec2region=us-west-2
https://s3-external-1.amazonaws.com/moz-hg-bundles-us-east-1/mozilla-central/d6ea652c579992daa9041cc9718bb7c6abefbc91.gzip.hg TYPE=HG10GZ ec2region=us-east-1
Metadata with UPPERCASE keys is reserved for Mercurial's usage. These will describe officially supported attributes, such as bundle type, compression, required bundle2 part support, etc. The above example includes the use of REQUIRESNI, which tells clients that Server Name Indication (SNI) is required (this isn't supported on Python < 2.7.9).
Lowercase keys can be used by site deployments for custom operation. In the above example, our site operator has indicated which EC2 region a file is hosted in.
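A manifest in this shape is simple to parse. The following is a minimal sketch, assuming one entry per line with whitespace-separated fields and no escaping of special characters; the function names and the example.com URLs are hypothetical, and this is not the parser used by Mercurial or bundleclone.

```python
def parse_manifest(text):
    """Parse manifest text into a list of (url, attributes) tuples."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        url, attrs = fields[0], {}
        for field in fields[1:]:
            key, _, value = field.partition("=")
            attrs[key] = value
        entries.append((url, attrs))
    return entries

def split_attributes(attrs):
    """Separate reserved (UPPERCASE) keys from site-defined ones."""
    reserved = {k: v for k, v in attrs.items() if k.isupper()}
    custom = {k: v for k, v in attrs.items() if not k.isupper()}
    return reserved, custom

# Hypothetical manifest with one reserved-only and one mixed entry.
manifest = (
    "https://example.com/repo.gzip.hg REQUIRESNI=true TYPE=HG10GZ\n"
    "https://example.com/repo-west.gzip.hg TYPE=HG10GZ ec2region=us-west-2\n"
)
entries = parse_manifest(manifest)
```

Splitting reserved keys from custom ones lets a client reject entries with reserved attributes it does not understand while ignoring site-specific keys it has no use for.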
Cloning from pre-generated bundles is transparently supported as part of hg clone operations (actually part of the generic exchange.pull() code path). If a server is advertising bundles, the client will fetch the manifest automatically and choose an appropriate bundle to seed the clone from. If a bundle is available, it will fetch the URL, apply the content, then perform an incremental pull to retrieve content missing from the bundle (bookmarks, phases, etc in the case of non-bundle2) and data produced since the bundle was created (new pushes may not yet be in the advertised bundles).
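The client-side control flow described above might be sketched as follows. The four callables are hypothetical hooks injected for illustration (they are not Mercurial APIs), and the sketch assumes the first compatible manifest entry is the one to use.

```python
def clone(fetch_manifest, is_compatible, apply_bundle, pull):
    """Seed a clone from a bundle when possible, then always pull."""
    entries = fetch_manifest()            # [] when nothing is advertised
    compatible = [e for e in entries if is_compatible(e)]
    used = None
    if compatible:
        used = compatible[0]
        apply_bundle(used)                # download and apply static bundle
    pull()                                # incremental pull covers the rest
    return used

# Record the order of operations with stand-in callables.
steps = []
used = clone(
    fetch_manifest=lambda: ["https://example.com/full.hg"],
    is_compatible=lambda url: url.endswith(".hg"),
    apply_bundle=lambda url: steps.append(("apply", url)),
    pull=lambda: steps.append(("pull",)),
)
```

The final pull runs whether or not a bundle was applied, so a server with no usable bundle degrades to an ordinary clone.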
The aforementioned key-value metadata is used by the client not only to filter out incompatible entries but also as a primitive form of content negotiation. Clients can express preferences for which attributes and values to prefer over others. For example, a client that knows it is on a fast network could request a "streaming clone" bundle instead of a gzipped one, trading more network utilization for lower CPU (and presumably yielding a faster clone in the process).
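Filtering and preference ordering could look roughly like the sketch below. It assumes manifest entries are already parsed into (url, attributes) tuples, that an entry without a supported TYPE is unusable, and that preferences are given as an ordered list of (key, value) pairs; all of these, along with the function names and URLs, are illustrative assumptions.

```python
def filter_compatible(entries, supported_types, has_sni):
    """Drop entries this client cannot use."""
    usable = []
    for url, attrs in entries:
        if attrs.get("TYPE") not in supported_types:
            continue
        if attrs.get("REQUIRESNI") == "true" and not has_sni:
            continue
        usable.append((url, attrs))
    return usable

def sort_by_preference(entries, prefers):
    """Order entries so those matching earlier preferences come first.

    `prefers` is a list of (key, value) pairs, most important first.
    """
    def rank(entry):
        _, attrs = entry
        # False sorts before True, so a match on a preference ranks higher.
        return tuple(attrs.get(key) != value for key, value in prefers)
    # sorted() is stable, so ties keep their manifest order.
    return sorted(entries, key=rank)

entries = [
    ("https://example.com/a.bzip2.hg", {"TYPE": "HG10BZ"}),
    ("https://example.com/b.gzip.hg", {"TYPE": "HG10GZ", "REQUIRESNI": "true"}),
    ("https://example.com/c.gzip.hg", {"TYPE": "HG10GZ", "ec2region": "us-west-2"}),
]
usable = filter_compatible(entries, {"HG10GZ", "HG10BZ"}, has_sni=False)
ranked = sort_by_preference(usable, [("ec2region", "us-west-2"), ("TYPE", "HG10GZ")])
```

In this example the client lacks SNI support, so the second entry is filtered out, and the us-west-2 entry is ranked ahead of the bzip2 one.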
The default implementation of the server-side extension will literally be serving the manifest from a file on disk. However, more advanced implementations could do something more dynamic. In our example, clients were the ones performing content negotiation. However, there is nothing stopping a server operator from dynamically emitting a manifest based on the client. For example, IP detection could be used to advertise the URL in closest physical proximity to the client.
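As a rough illustration of a dynamic manifest, a server could list the mirror nearest to the client first and fall back to a default URL. The network ranges, mirror URLs, and function name below are all hypothetical; a real deployment would need its own source of IP-to-location data.

```python
import ipaddress

# Hypothetical mapping from client networks to the nearest mirror.
MIRRORS = [
    (ipaddress.ip_network("10.1.0.0/16"), "https://west.example.com/repo.gzip.hg"),
    (ipaddress.ip_network("10.2.0.0/16"), "https://east.example.com/repo.gzip.hg"),
]
DEFAULT_URL = "https://cdn.example.com/repo.gzip.hg"

def manifest_for(client_ip):
    """Build manifest text with the closest mirror listed first."""
    addr = ipaddress.ip_address(client_ip)
    lines = []
    for network, url in MIRRORS:
        if addr in network:
            lines.append(url + " TYPE=HG10GZ")
    # Always include a default entry so every client has a usable URL.
    lines.append(DEFAULT_URL + " TYPE=HG10GZ")
    return "\n".join(lines) + "\n"

east_manifest = manifest_for("10.2.3.4")
other_manifest = manifest_for("192.0.2.1")
```

This keeps negotiation on the server side while still emitting the same manifest format that static deployments serve from disk.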
Non-Experimental Blockers
clonebundles is currently marked as experimental and needs to be enabled in order for clients to use it. The following need to be addressed before it is enabled by default.
- Kill switch on client to disable feature
  - If the server is misconfigured and clonebundles are failing, the client needs an easy way to disable the feature so it can clone.
- Hooks run twice
  - Since we're effectively performing two hg pull operations, the post incoming hooks fire twice during clone.
  - [indygreg] I'm inclined to solve this problem later as part of a larger refactor to implement more robust / incremental / checkpointed cloning
- Transaction rollback issues
  - A clone abort might result in a partial rollback and incomplete clone due to two apply operations being performed under the hood
  - [indygreg] I'm inclined to punt this and solve with the aforementioned incremental clone feature
- Better tools and documentation for server operators to configure the feature
  - This is likely not a hard blocker