Mercurial can scale from single-developer projects up to massive codebases and huge developer teams.
For an example of a large-scale deployment, see the writeup by Durham Goode of Facebook, "Scaling Mercurial at Facebook".
Scaling is not a problem with a single root cause. Instead, there are various patterns that can lead to separate scaling issues.
Concern: Many commits
In an active repository, the number of commits (changesets) grows without bound over time, and this poses some scaling problems.
A standard Mercurial install and clone maintain a full copy of the repository and all of its history. This is similar to how other distributed version control systems (like Git) work.
1. Impact
- Repositories take longer to clone and pull (because they have more data)
- Iterating over all commits takes longer (because there are more)
2. When to expect problems
The number of commits alone is unlikely to become a significant scaling issue. You will most likely run into problems with manifests or file data size first.
Mercurial repositories with a few hundred thousand commits exist. As of April 2014, Mozilla's mozilla-central repository had close to 200,000 commits with no apparent scaling problems due to commit volume alone. Private repositories at other companies are known to have over 100,000 additional commits, also without scaling problems.
3. Solutions
- Use the remotefilelog extension on the server and client to help reduce clone and pull times.
- Use a modern Mercurial version. Scaling issues are always being addressed and running the newest stable release is a good bet to have the best performance.
Concern: Many files
Repositories with tens or hundreds of thousands of files pose different scaling challenges compared to repositories with tens, hundreds, or even thousands of files.
1. Impact
- Clone and pull times will increase (more data to transfer)
- The filesystem has to manage more files (because Mercurial uses a separate file for each file under version control)
- hg status takes longer (more files to check)
- Repository operations become filesystem and I/O bound
- Manifest revision resolution becomes slow (this affects operations that need to iterate over the set of files in each commit)
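Why hg status slows down as the file count grows can be illustrated with a deliberately naive Python sketch. It records each tracked file's size and modification time (loosely modelled on what Mercurial's dirstate keeps) and then calls stat() on every tracked file to detect changes. This is not Mercurial's actual implementation, which also deals with cases such as ambiguous timestamps; the names Snapshot, take_snapshot and naive_status are hypothetical. The point is the loop: one stat() per tracked file, whether or not anything changed. That per-file walk is what the hgwatchman extension avoids by asking the Watchman service which files have changed.

```python
import os
import tempfile
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """Size and modification time recorded for one tracked file (hypothetical)."""
    size: int
    mtime_ns: int


def take_snapshot(root, paths):
    """Record size and mtime for each tracked path (relative to root)."""
    snapshot = {}
    for rel in paths:
        st = os.stat(os.path.join(root, rel))
        snapshot[rel] = Snapshot(st.st_size, st.st_mtime_ns)
    return snapshot


def naive_status(root, snapshot):
    """Return (modified, removed) by calling stat() on every tracked file.

    There is one stat() call per tracked file, so the cost grows linearly
    with the number of files even when nothing has changed.
    """
    modified, removed = [], []
    for rel, old in snapshot.items():
        try:
            st = os.stat(os.path.join(root, rel))
        except FileNotFoundError:
            removed.append(rel)
            continue
        if (st.st_size, st.st_mtime_ns) != (old.size, old.mtime_ns):
            modified.append(rel)
    return sorted(modified), sorted(removed)


# Usage: track two files, then change one and delete the other.
with tempfile.TemporaryDirectory() as root:
    for name in ("a.txt", "b.txt"):
        with open(os.path.join(root, name), "w") as f:
            f.write("v1\n")
    snapshot = take_snapshot(root, ["a.txt", "b.txt"])
    with open(os.path.join(root, "a.txt"), "w") as f:
        f.write("version 2\n")  # different size, so the change is detected
    os.remove(os.path.join(root, "b.txt"))
    result = naive_status(root, snapshot)

print(result)  # (['a.txt'], ['b.txt'])
```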
2. When to expect problems
Mercurial should scale to tens of thousands of files without any modification (provided your system has a decent filesystem and I/O performance).
Mozilla's mozilla-central repository has close to 94,000 files as of April 2014 and is growing steadily.
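To see where a repository sits relative to the tens of thousands of files mentioned above, counting the lines printed by hg manifest (which lists the files tracked in the working directory's first parent revision) gives the exact number. For a rough filesystem-level estimate without Mercurial, the hypothetical Python helper count_working_files below counts files on disk while skipping the .hg directory. Because it also counts untracked and ignored files, treat its result as an upper bound.

```python
import os
import tempfile


def count_working_files(root, skip_dirs=(".hg",)):
    """Count files under root, skipping Mercurial's own .hg directory.

    Untracked and ignored files are counted too, so this is only an
    upper-bound estimate of the number of files under version control.
    """
    total = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune skipped directories in place so os.walk does not descend into them.
        dirnames[:] = [d for d in dirnames if d not in skip_dirs]
        total += len(filenames)
    return total


# Usage: a tiny fake working directory with one file inside .hg.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, ".hg", "store"))
    os.makedirs(os.path.join(root, "src"))
    for rel in (".hg/store/00changelog.i", "src/main.py", "README"):
        open(os.path.join(root, rel), "w").close()
    count = count_working_files(root)

print(count)  # 2
```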
3. Solutions
- Use the remotefilelog extension on the server and client to help reduce clone and pull times.
- Use the hgwatchman extension on clients to make hg status faster.
Concern: Large files
Repositories with large files (measured in the megabytes, tens of megabytes, or even hundreds of megabytes) may pose scaling challenges.
With a standard Mercurial configuration, the entire content of the repository is cloned to every client. That means if you check in an incompressible 100 MB binary file, each client must transfer that 100 MB file on every clone. If you then check in a new version of that file that differs completely from the previous one (so delta compression can't help), clients must pull both the original version and the new version, a total of 200 MB.
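The arithmetic in this example can be written out as a simplified Python model. It assumes the file is incompressible, ignores protocol and storage overhead, and treats each later version either as a full copy (deltas don't help) or as a delta of a given size. The function clone_transfer_bytes is hypothetical, and the 1 MB delta in the second call is an invented number used only for contrast.

```python
MB = 1_000_000  # decimal megabyte, as in "100 MB"


def clone_transfer_bytes(version_sizes, delta_sizes=None):
    """Estimate the bytes a full clone transfers for one file's history.

    version_sizes: full size of each version, oldest first.
    delta_sizes:   optional delta size for each version after the first;
                   None means deltas don't help, so every version is sent in full.
    """
    if not version_sizes:
        return 0
    if delta_sizes is None:
        # No usable deltas: every version travels as a full copy.
        return sum(version_sizes)
    if len(delta_sizes) != len(version_sizes) - 1:
        raise ValueError("need one delta per version after the first")
    # First version in full, then one delta per later version.
    return version_sizes[0] + sum(delta_sizes)


# The example from the text: two unrelated 100 MB versions of a binary file.
print(clone_transfer_bytes([100 * MB, 100 * MB]) // MB)  # 200
# Hypothetical contrast: the second version stored as a 1 MB delta.
print(clone_transfer_bytes([100 * MB, 100 * MB], [1 * MB]) // MB)  # 101
```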
1. Impact
- Repository size on disk can grow significantly
- Clone and pull times take longer (more data to transfer)
- hg update takes longer (due to managing more bytes)
- I/O becomes more of a bottleneck
2. When to expect problems
This depends on your environment. If all the clients of a repository are on fast Ethernet (100 Mbps or faster) and have ample, fast storage, they will feel the impact of large files much less than clients on dial-up Internet connections would.
Either way, old versions of binary files are often unneeded, and carrying them in every clone is wasteful.
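To put the network speeds above into numbers, the following Python sketch computes an ideal transfer time as payload bits divided by link speed, ignoring latency, protocol overhead and compression. The function name transfer_seconds is hypothetical. With the 200 MB from the large-file example, a 100 Mbps link needs about 16 seconds, while a nominal 56 kbps dial-up connection needs close to 8 hours.

```python
MB = 1_000_000  # decimal megabyte, as in "100 MB"


def transfer_seconds(size_bytes, link_bits_per_second):
    """Ideal transfer time: payload bits divided by link speed (no overhead)."""
    return size_bytes * 8 / link_bits_per_second


links = [("100 Mbps Ethernet", 100_000_000), ("56 kbps dial-up", 56_000)]
for label, bps in links:
    seconds = transfer_seconds(200 * MB, bps)
    print(f"{label}: {seconds:,.0f} s ({seconds / 3600:.1f} h)")
```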
3. Solutions
- Use the LargeFiles extension on the server and client to minimize data transfer during clones and pulls.
- Use the remotefilelog extension on the server and client to help reduce clone and pull times.
- Move all clients to faster networks (if possible)
- Consider not storing large files in Mercurial at all (the option of last resort)
For many consumers, the largefiles or remotefilelog extensions should suffice.
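Before choosing one of these options, it helps to know which files are large enough to matter. The Python sketch below walks a working directory (skipping .hg) and lists files at or above a size threshold, largest first. The function name find_large_files and its 10 MB default threshold are illustrative assumptions, not something Mercurial or the largefiles extension defines. With the largefiles extension enabled, files it reports are candidates for hg add --large.

```python
import os
import tempfile

MB = 1_000_000  # decimal megabyte


def find_large_files(root, threshold_bytes=10 * MB, skip_dirs=(".hg",)):
    """Return (relative_path, size) for files at or above threshold_bytes.

    Results are sorted largest first. lstat() is used so that a broken
    symlink does not raise an error.
    """
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune skipped directories in place so os.walk does not descend into them.
        dirnames[:] = [d for d in dirnames if d not in skip_dirs]
        for name in filenames:
            path = os.path.join(dirpath, name)
            size = os.lstat(path).st_size
            if size >= threshold_bytes:
                found.append((os.path.relpath(path, root), size))
    found.sort(key=lambda item: item[1], reverse=True)
    return found


# Usage: one 2 MB file and one small file, with a 1 MB threshold.
with tempfile.TemporaryDirectory() as root:
    with open(os.path.join(root, "big.bin"), "wb") as f:
        f.truncate(2 * MB)  # extends the empty file to 2 MB
    with open(os.path.join(root, "notes.txt"), "w") as f:
        f.write("small\n")
    result = find_large_files(root, threshold_bytes=1 * MB)

print(result)  # [('big.bin', 2000000)]
```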