Bigfiles Extension

This extension is not distributed with Mercurial.

Author: Andrei Vermel

Download site: http://bitbucket.org/avermel/bigfiles/

Overview

Supports versioning of big files whose contents are stored outside the hg repo.

This is useful for several reasons.

  • Because of memory and performance limitations, big files shouldn't be stored in an hg repo. Hg warns when files bigger than 10 MB are added, and it runs out of memory checking in a 170 MB file on my 2 GB box. This extension keeps such files versioned without checking them in.
  • Often you find out that some big files are about to get checked in only after 'hg add' complains and suggests reverting them. Doing so by hand can be tedious; this extension takes care of it.
  • Keeping multiple versions of big files in an hg repo may cause it to grow significantly. This takes up space in local clones, while the history of big file contents is fairly useless, since such files usually are not diffed. With this extension it's possible to fetch only the necessary files from the upstream repo.
  • Hg is rather slow to pull a big update over the network, and restarts the download from scratch if the transfer is interrupted. Since a repo of big files is just a directory tree with versions of big files, it can easily be rsync-ed with the local repo.

Implementation

Big files are not put into the hg repo. They are listed in a file called '.bigfiles', which also serves as an ignore file similar to .hgignore, so they do not clutter the output of hg commands. The file also stores the checksums of the big files in the form of comments. '.bigfiles' is versioned by hg, so each changeset knows from the names and checksums which big files it uses. The file can be diffed and merged, which is nice.

The versions of big files are stored in a versions directory, with checksums attached to filenames.
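
The layout described above can be illustrated with a short Python sketch. It is not taken from the extension's source: the function names parse_bigfiles and version_name are hypothetical, and the line format ('name#checksum' in '.bigfiles', 'name.checksum' in the versions directory) is inferred only from the sample session on this page.

```python
def parse_bigfiles(text):
    """Map each big file name to the checksum recorded after '#'.

    Hypothetical helper: the 'name#checksum' format is inferred from the
    sample '.bigfiles' on this page, not from the extension's code.
    """
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the 'syntax: glob' header line.
        if not line or line.startswith("syntax:"):
            continue
        # Split on the last '#', so the checksum is whatever follows it.
        name, sep, checksum = line.rpartition("#")
        if sep and name and checksum:
            entries[name] = checksum
    return entries


def version_name(name, checksum):
    """Name of the stored copy in the versions directory (assumed layout)."""
    return name + "." + checksum


sample = """syntax: glob

windows-essential-20071007.zip#fc8fb93abeb53fe301594fe6463c0ac2436c59f8
"""
entries = parse_bigfiles(sample)
stored = version_name(
    "windows-essential-20071007.zip",
    entries["windows-essential-20071007.zip"],
)
```

Because the checksum follows a '#', an ignore-file parser would treat it as a comment, which is presumably why the same file can double as an ignore list.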

Usage

  • The extension overrides 'hg update', so that it can compare the contents of '.bigfiles' before and after the update and then remove and fetch the appropriate big files.
  • The directory storing versions of big files can be synced with the remote one (the extension doesn't do this, but it reports the list of necessary files). The versions corresponding to old changesets can be removed to save space.
  • To add a new big file, use the normal 'hg add', ignoring the size warning.
  • To remove a tracked big file, just delete it.
  • 'hg bstatus' - to examine the state of big files in the working directory.
  • 'hg brefresh' - to refresh '.bigfiles' and the versions directory with added, removed and modified big files.
  • 'hg bupdate' - to fetch files from the versions directory as recorded in '.bigfiles', and get a list of necessary files missing from the versions directory.
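
The before/after comparison performed on update can be sketched in Python as follows. This is only an illustration under stated assumptions, not the extension's actual code: plan_update and missing_versions are hypothetical names, the inputs are name-to-checksum mappings read from the two versions of '.bigfiles', and stored copies are assumed to be named 'name.checksum' inside the versions directory.

```python
import os
import tempfile


def plan_update(before, after):
    """Compare two name->checksum mappings taken from '.bigfiles'.

    Returns (to_remove, to_fetch): files no longer listed, and files
    that are new or whose checksum changed. Hypothetical helper.
    """
    to_remove = sorted(name for name in before if name not in after)
    to_fetch = sorted(
        name for name, checksum in after.items()
        if before.get(name) != checksum
    )
    return to_remove, to_fetch


def missing_versions(after, names, versions_dir):
    """List stored-version names that are absent from versions_dir.

    Assumes each version is stored as '<name>.<checksum>'.
    """
    missing = []
    for name in names:
        stored = name + "." + after[name]
        if not os.path.exists(os.path.join(versions_dir, stored)):
            missing.append(stored)
    return missing


# Made-up names and checksums, for illustration only.
before = {"a.zip": "111", "b.iso": "222"}
after = {"a.zip": "333", "c.bin": "444"}
to_remove, to_fetch = plan_update(before, after)

with tempfile.TemporaryDirectory() as versions_dir:
    # Pretend only the new version of a.zip has been synced so far.
    open(os.path.join(versions_dir, "a.zip.333"), "w").close()
    missing = missing_versions(after, to_fetch, versions_dir)
```

Here to_remove holds 'b.iso', to_fetch holds 'a.zip' and 'c.bin', and missing holds 'c.bin.444', which is the kind of list that could then be fetched from the remote versions directory.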

Configuration

Enable the extension by adding the following lines to your .hgrc:

[extensions]
bigfiles = path/to/bigfiles.py

[bigfiles]
repo = path/to/versions/dir 

Sample Usage Session

# Checking in a big file to a new repo:
C:\tmp>ls -sh
total 9.8M
9.8M windows-essential-20071007.zip

C:\tmp>hg init

C:\tmp>hg stat
? windows-essential-20071007.zip

C:\tmp>hg add
adding windows-essential-20071007.zip
windows-essential-20071007.zip: files over 10MB may cause memory and performance problems
(use 'hg revert windows-essential-20071007.zip' to unadd the file)
# The warning from hg means that the file can be controlled by 'bigfiles'

# Get a reminder that we need to specify which dir to use to store the
# big files.
C:\tmp>hg bstat
abort: bigfiles.repo path not configured

# Let's create a repo for the big files in a new dir somewhere.
C:\tmp>mkdir ..\tmp_bigrepo

C:\tmp>echo [bigfiles] > .hg\hgrc
C:\tmp>echo repo =c:/tmp_bigrepo >> .hg\hgrc
C:\tmp>cat .hg\hgrc
[bigfiles]
repo =c:/tmp_bigrepo

# Bstat shows that a big file is about to be added.
C:\tmp>hg bstat
A windows-essential-20071007.zip

# Put it under control of 'bigfiles'
C:\tmp>hg bref
forgetting windows-essential-20071007.zip

# now a file .bigfiles is created
C:\tmp>ls .bigfiles
.bigfiles

C:\tmp>cat .bigfiles
syntax: glob

windows-essential-20071007.zip#fc8fb93abeb53fe301594fe6463c0ac2436c59f8

# We want to keep '.bigfiles' revisioned by hg
C:\tmp>hg add
adding .bigfiles

# Finish checking in the big file - hg stores only '.bigfiles', not
# windows-essential-20071007.zip itself
C:\tmp>hg ci -m "Added a big file"

# The big file is actually versioned in the big files repo
C:\tmp>ls ../tmp_bigrepo
windows-essential-20071007.zip.fc8fb93abeb53fe301594fe6463c0ac2436c59f8

# The big file gets some modification
C:\tmp>echo "qqq" >> windows-essential-20071007.zip

C:\tmp>hg bstat
M windows-essential-20071007.zip

# Put modified file under control of 'bigfiles'
C:\tmp>hg bref

C:\tmp>hg stat
M .bigfiles

C:\tmp>hg diff
diff --git a/.bigfiles b/.bigfiles
--- a/.bigfiles
+++ b/.bigfiles
@@ -1,3 +1,3 @@
 syntax: glob

-windows-essential-20071007.zip#fc8fb93abeb53fe301594fe6463c0ac2436c59f8
+windows-essential-20071007.zip#4187f81bc4fe5fe6c9586a8481cec4179ac63aa0

C:\tmp>hg ci -m "Modified a big file"

C:\tmp>ls ../tmp_bigrepo
windows-essential-20071007.zip.4187f81bc4fe5fe6c9586a8481cec4179ac63aa0
windows-essential-20071007.zip.fc8fb93abeb53fe301594fe6463c0ac2436c59f8

C:\tmp>hg log
changeset:   1:4e312e37f18c
tag:         tip
user:        Andrei Vermel <avermel@mail.ru>
date:        Thu Sep 24 23:49:23 2009 +0400
summary:     Modified a big file

changeset:   0:f6eafba99057
user:        Andrei Vermel <avermel@mail.ru>
date:        Thu Sep 24 23:47:37 2009 +0400
summary:     Added a big file

# Check out the previous revision - the big file gets fetched from the repo
C:\tmp>hg co -r 0
1 files updated, 0 files merged, 0 files removed, 0 files unresolved
fetching windows-essential-20071007.zip.fc8fb93abeb53fe301594fe6463c0ac2436c59f8

You may also be interested in LargefilesExtension and SnapExtension.


CategoryExtensionsByOthers

BigfilesExtension (last edited 2012-02-15 19:21:08 by ks3095497)