Bigfiles Extension
This extension is not distributed with Mercurial.
Author: Andrei Vermel
Repository: http://bitbucket.org/avermel/bigfiles/
Overview
Supports versioning of big files whose contents are stored outside the hg repo.
This is useful for several reasons.
- Due to memory and performance limitations, big files shouldn't be stored in an hg repo. Hg warns when files bigger than 10 MB are added, and it runs out of memory checking in a 170 MB file on my 2 GB box. This extension lets you keep such files versioned without checking them in.
- Often you find out that big files are about to be checked in only when 'hg add' complains and suggests reverting them. Doing so by hand can be tedious; this extension takes care of it.
- Keeping multiple versions of big files in an hg repo may cause it to grow significantly. This takes up space in local clones, while the history of big file contents is fairly useless, since such files are usually not diffed. With this extension, it is possible to fetch only the necessary files from the upstream repo.
- Hg is rather slow to pull a big update over the network, and it restarts the download from scratch if the transfer is interrupted. Since a repo of big files is just a directory tree holding versions of big files, the local copy can easily be rsync-ed with the upstream one.
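As a rough illustration of the last point, the sketch below builds an rsync command line that copies only a given list of version files from an upstream versions directory. It is not part of the extension: the function names, host and paths are hypothetical, while `-av`, `--partial` and `--files-from` are standard rsync options.

```python
import os
import tempfile


def write_file_list(names):
    """Write one version file name per line to a temporary file; return its path."""
    handle, path = tempfile.mkstemp(suffix=".txt", text=True)
    with os.fdopen(handle, "w") as out:
        for name in names:
            out.write(name + "\n")
    return path


def build_rsync_command(remote_dir, local_dir, list_path):
    """Return an rsync argument list that copies only the files named in list_path.

    --partial keeps partially transferred files, so an interrupted
    download can resume instead of starting from scratch.
    """
    # A trailing slash makes rsync copy the directory contents, not the directory itself.
    source = remote_dir.rstrip("/") + "/"
    target = local_dir.rstrip("/") + "/"
    return ["rsync", "-av", "--partial", "--files-from=" + list_path, source, target]
```

The returned list could be passed to `subprocess.run`; the list file would hold the names of the missing versions, one per line.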
Implementation
Big files are not put into the hg repo. They are listed in a file called '.bigfiles', which also serves as an ignore file similar to .hgignore, so they do not clutter the output of hg commands. The file also stores the checksums of the big files in the form of comments. '.bigfiles' is versioned by hg, so each changeset knows from the names and checksums which big files it uses. The file can be diffed and merged, which is nice.
The versions of big files are stored in a versions directory, with the checksum appended to each filename.
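The layout described above can be sketched in a few lines of Python. This is a minimal illustration rather than the extension's actual code: the function names are hypothetical, the `name#checksum` line format and the `name.checksum` version filename are inferred from the sample session on this page, and the use of SHA-1 is an assumption based only on the 40-hex-digit checksums shown there.

```python
import hashlib
import os


def file_checksum(path, chunk_size=1 << 20):
    """Hash a file in chunks so a big file is never loaded into memory at once.

    SHA-1 is an assumption; the page does not say which hash is used.
    """
    digest = hashlib.sha1()
    with open(path, "rb") as source:
        for chunk in iter(lambda: source.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_bigfiles(text):
    """Map each big file name to its checksum, given the content of '.bigfiles'."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the 'syntax: glob' header.
        if not line or line.startswith("syntax:"):
            continue
        # The checksum follows the last '#', where an ignore file has a comment.
        name, sep, checksum = line.rpartition("#")
        if sep and name:
            entries[name] = checksum
    return entries


def version_path(versions_dir, name, checksum):
    """Build the path of one stored version: '<name>.<checksum>'."""
    return os.path.join(versions_dir, name + "." + checksum)
```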
Usage
- The extension overrides 'hg update' so that it can compare the contents of '.bigfiles' before and after the update and then remove and fetch the appropriate big files.
- The directory storing versions of big files can be synced with a remote one (the extension doesn't do this itself, but it reports the list of necessary files). Versions corresponding to old changesets can be removed to save space.
- To add a new big file, use normal 'hg add', ignoring the size warning.
- To remove a tracked big file, just delete it.
- 'hg bstatus' - to examine state of big files in working directory.
- 'hg brefresh' - to refresh '.bigfiles' and versions directory with added, removed and modified big files.
- 'hg bupdate' - to fetch files from versions directory as recorded in '.bigfiles', and get a list of necessary files missing in the version directory.
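The comparison performed on 'hg update' can be pictured as a difference between two name-to-checksum mappings. The sketch below is an assumption about the logic, not the extension's code: `plan_update` and its arguments are hypothetical names, and each mapping stands for the parsed content of '.bigfiles' in one changeset.

```python
def plan_update(old_entries, new_entries, available_versions):
    """Decide which big files to remove, fetch, or report as missing.

    old_entries / new_entries: {file name: checksum} before and after the update.
    available_versions: set of '<name>.<checksum>' names present in the versions directory.
    """
    # Files no longer listed after the update are removed from the working directory.
    to_remove = sorted(name for name in old_entries if name not in new_entries)

    to_fetch, missing = [], []
    for name, checksum in sorted(new_entries.items()):
        if old_entries.get(name) == checksum:
            continue  # Same version on both sides; nothing to do.
        stored = name + "." + checksum
        # A version can be fetched only if the versions directory has it;
        # otherwise it goes on the list of files to sync from upstream.
        if stored in available_versions:
            to_fetch.append(stored)
        else:
            missing.append(stored)
    return to_remove, to_fetch, missing
```

With the two changesets from the sample session on this page, going from the modified file back to revision 0 would put the original version of the zip file on the fetch list, provided that version is still present in the versions directory.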
Configuration
To enable the extension, add the following lines to your .hgrc:
[extensions]
bigfiles = path/to/bigfiles.py

[bigfiles]
repo = path/to/versions/dir
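To show how the two settings fit together, here is a small sketch that reads the versions directory back with Python's standard configparser module. Mercurial has its own config parser, so this is only an approximation that works for simple files like the one above. `read_bigfiles_repo` is a hypothetical helper, and its error text mirrors the abort message seen in the sample session.

```python
import configparser


def read_bigfiles_repo(hgrc_text):
    """Return the versions directory configured as 'repo' in the [bigfiles] section."""
    # Interpolation is disabled so that '%' in paths is taken literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(hgrc_text)
    # fallback=None covers both a missing section and a missing option.
    repo = parser.get("bigfiles", "repo", fallback=None)
    if not repo:
        raise ValueError("bigfiles.repo path not configured")
    return repo
```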
Sample Usage Session
# Checking in a big file to a new repo:
C:\tmp>ls -sh
total 9.8M
9.8M windows-essential-20071007.zip

C:\tmp>hg init

C:\tmp>hg stat
? windows-essential-20071007.zip

C:\tmp>hg add
adding windows-essential-20071007.zip
windows-essential-20071007.zip: files over 10MB may cause memory and performance problems
(use 'hg revert windows-essential-20071007.zip' to unadd the file)

# The warning from hg means that the file can be controlled by 'bigfiles'

# Get a reminder that we need to specify which dir to use to store the big files.
C:\tmp>hg bstat
abort: bigfiles.repo path not configured

# Let's create a repo for the big files in a new dir somewhere.
C:\tmp>mkdir ..\tmp_bigrepo
C:\tmp>echo [bigfiles] > .hg\hgrc
C:\tmp>echo repo =c:/tmp_bigrepo >> .hg\hgrc
C:\tmp>cat .hg\hgrc
[bigfiles]
repo =c:/tmp_bigrepo

# Bstat shows that a big file is about to be added.
C:\tmp>hg bstat
A windows-essential-20071007.zip

# Put it under control of 'bigfiles'
C:\tmp>hg bref
forgetting windows-essential-20071007.zip

# now a file .bigfiles is created
C:\tmp>ls .bigfiles
.bigfiles
C:\tmp>cat .bigfiles
syntax: glob
windows-essential-20071007.zip#fc8fb93abeb53fe301594fe6463c0ac2436c59f8

# We want to keep '.bigfiles' revisioned by hg
C:\tmp>hg add
adding .bigfiles

# Complete checking in of the big file - hg only stores '.bigfiles', not
# the windows-essential-20071007.zip
C:\tmp>hg ci -m "Added a big file"

# The big file is actually versioned in the big files repo
C:\tmp>ls ../tmp_bigrepo
windows-essential-20071007.zip.fc8fb93abeb53fe301594fe6463c0ac2436c59f8

# The big file gets some modification
C:\tmp>echo "qqq" >> windows-essential-20071007.zip
C:\tmp>hg bstat
M windows-essential-20071007.zip

# Put modified file under control of 'bigfiles'
C:\tmp>hg bref
C:\tmp>hg stat
M .bigfiles
C:\tmp>hg diff
diff --git a/.bigfiles b/.bigfiles
--- a/.bigfiles
+++ b/.bigfiles
@@ -1,3 +1,3 @@
 syntax: glob
-windows-essential-20071007.zip#fc8fb93abeb53fe301594fe6463c0ac2436c59f8
+windows-essential-20071007.zip#4187f81bc4fe5fe6c9586a8481cec4179ac63aa0

C:\tmp>hg ci -m "Modified a big file"
C:\tmp>ls ../tmp_bigrepo
windows-essential-20071007.zip.4187f81bc4fe5fe6c9586a8481cec4179ac63aa0
windows-essential-20071007.zip.fc8fb93abeb53fe301594fe6463c0ac2436c59f8
C:\tmp>hg log
changeset:   1:4e312e37f18c
tag:         tip
user:        Andrei Vermel <avermel@mail.ru>
date:        Thu Sep 24 23:49:23 2009 +0400
summary:     Modified a big file

changeset:   0:f6eafba99057
user:        Andrei Vermel <avermel@mail.ru>
date:        Thu Sep 24 23:47:37 2009 +0400
summary:     Added a big file

# Check out the previous revision - the big file gets fetched from the repo
C:\tmp>hg co -r 0
1 files updated, 0 files merged, 0 files removed, 0 files unresolved
fetching windows-essential-20071007.zip.fc8fb93abeb53fe301594fe6463c0ac2436c59f8
You may also be interested in LargefilesExtension and SnapExtension.