Revision 7 as of 2009-11-06 14:53:06
Size: 3708
Editor: JesseGlick
Comment: Promoted from a contrib script to an extension.
Revision 8 as of 2009-11-06 14:59:10
Size: 1186
Editor: JesseGlick
Comment:

=== Relink Extension ===

When a repository is cloned locally, its data files are hardlinked rather than copied, so both repositories together use only the space of a single repository.
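
A hardlink is a second directory entry pointing at the same inode, so no file data is duplicated. The sketch below shows this with Python's standard library in a throwaway temporary directory; the file names are hypothetical stand-ins for the same revlog in two repositories, and it assumes a filesystem that supports hardlinks.

```python
import os
import tempfile


def demo_hardlink():
    """Hardlink one file to another name and report what the two share."""
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical stand-ins for the same revlog in two repositories.
        original = os.path.join(tmp, "repo-a-00changelog.i")
        clone = os.path.join(tmp, "repo-b-00changelog.i")
        with open(original, "wb") as fp:
            fp.write(b"revlog data")
        # A second directory entry for the same inode; no data is copied.
        os.link(original, clone)
        st_orig, st_clone = os.stat(original), os.stat(clone)
        return st_orig.st_ino == st_clone.st_ino, st_orig.st_nlink


same_inode, nlink = demo_hardlink()
print(same_inode, nlink)  # True 2 on a filesystem with hardlink support
```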

Unfortunately, subsequent pulls into either repository will break hardlinks for any files touched by the new changesets, even if both repositories end up pulling the same changes. Also, pulling with --rev never uses hardlinks.
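
Pulls break the links because a write to one repository must not change the other: before appending to a store file that has more than one link, Mercurial first replaces its own link with a private copy. The following is a minimal sketch of that copy-then-rename idea using only the standard library, not Mercurial's own code; `append_breaking_links` and `demo_break` are hypothetical names.

```python
import os
import shutil
import tempfile


def append_breaking_links(path, data):
    """Append data to path without changing other hardlinks to the same inode."""
    if os.stat(path).st_nlink > 1:
        # Make a private copy (new inode) next to the file, then rename it
        # over this name; other links keep pointing at the old inode.
        fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path))
        os.close(fd)
        shutil.copy2(path, tmp)
        os.replace(tmp, path)
    with open(path, "ab") as fp:
        fp.write(data)


def demo_break():
    with tempfile.TemporaryDirectory() as tmp:
        a = os.path.join(tmp, "a.i")  # file in the first repository
        b = os.path.join(tmp, "b.i")  # hardlinked copy in the second one
        with open(a, "wb") as fp:
            fp.write(b"base")
        os.link(a, b)
        append_breaking_links(b, b"+new")  # simulated pull into the second repository
        with open(a, "rb") as fa, open(b, "rb") as fb:
            return fa.read(), fb.read(), os.stat(a).st_nlink


print(demo_break())  # (b'base', b'base+new', 1)
```

After the write, the two files no longer share an inode, which is exactly the wasted space the relink extension later reclaims once both repositories hold identical data again.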

You can recreate those hardlinks and reclaim that wasted space using the Relink Extension. Enable it:

[extensions]
relink =
# optionally specify the origin, if you do this a lot:
[paths]
default-relink = ../incoming

and run it:

$ hg relink
relinking /home/hacker/src/incoming to /home/hacker/src/work
collected 9999 candidate storage files
pruned down to 5555 probably relinkable files
relinked 4444 files (12345678 bytes reclaimed)
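
The three lines of output correspond to three phases: collect candidate storage files in the origin's store (assumed here to be the revlog `.i` and `.d` files), prune those that cannot or need not be relinked, then compare contents and replace identical files with hardlinks. The sketch below mirrors those phases under simplifying assumptions (both stores on one filesystem, no concurrent writers, no repository locking). It is an illustration, not the extension's implementation, and every function name in it is hypothetical.

```python
import filecmp
import os
import tempfile


def collect(src_store):
    """Phase 1: relative paths of revlog files (.i/.d) under the source store."""
    candidates = []
    for dirpath, _dirnames, filenames in os.walk(src_store):
        for name in filenames:
            if name.endswith((".i", ".d")):
                full = os.path.join(dirpath, name)
                candidates.append(os.path.relpath(full, src_store))
    return candidates


def prune(candidates, src_store, dst_store):
    """Phase 2: keep files present in both stores, on the same device,
    not already linked, and of equal size."""
    targets = []
    for rel in candidates:
        try:
            s = os.stat(os.path.join(src_store, rel))
            d = os.stat(os.path.join(dst_store, rel))
        except OSError:
            continue  # missing from one of the stores
        if s.st_dev != d.st_dev or s.st_ino == d.st_ino or s.st_size != d.st_size:
            continue
        targets.append((rel, d.st_size))
    return targets


def relink(src_store, dst_store, targets):
    """Phase 3: replace byte-identical destination files with hardlinks."""
    relinked = reclaimed = 0
    for rel, size in targets:
        src = os.path.join(src_store, rel)
        dst = os.path.join(dst_store, rel)
        if not filecmp.cmp(src, dst, shallow=False):
            continue  # same size, different contents
        tmp = dst + ".relink-tmp"  # hypothetical temporary name
        os.link(src, tmp)     # new link to the source inode
        os.replace(tmp, dst)  # atomically swap it in for the old copy
        relinked += 1
        reclaimed += size
    return relinked, reclaimed


def demo():
    with tempfile.TemporaryDirectory() as root:
        src, dst = os.path.join(root, "src"), os.path.join(root, "dst")
        for store in (src, dst):
            os.makedirs(os.path.join(store, "data"))

        def write(path, data):
            with open(path, "wb") as fp:
                fp.write(data)

        write(os.path.join(src, "data", "a.i"), b"same bytes")  # relinkable
        write(os.path.join(dst, "data", "a.i"), b"same bytes")
        write(os.path.join(src, "data", "b.d"), b"left")        # same size, differs
        write(os.path.join(dst, "data", "b.d"), b"rght")
        write(os.path.join(src, "data", "c.i"), b"only here")   # missing in dst
        write(os.path.join(src, "00changelog.i"), b"linked")    # already linked
        os.link(os.path.join(src, "00changelog.i"), os.path.join(dst, "00changelog.i"))
        write(os.path.join(src, "fncache"), b"not a revlog")    # ignored

        candidates = collect(src)
        targets = prune(candidates, src, dst)
        relinked, reclaimed = relink(src, dst, targets)
        shared = (os.stat(os.path.join(src, "data", "a.i")).st_ino
                  == os.stat(os.path.join(dst, "data", "a.i")).st_ino)
        return len(candidates), len(targets), relinked, reclaimed, shared


print(demo())  # (4, 2, 1, 10, True)
```

Checking device, inode and size first keeps the byte-by-byte comparison for files that could plausibly be identical, which is why the "pruned down to" count is smaller than the "collected" count. Linking to a temporary name and renaming it over the target means the destination path always refers to a complete file.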

In Mercurial 1.3.1 and older, which predate the resolution of Issue919 (http://mercurial.selenic.com/bts/issue919), the contrib/hg-relink script in the source tarball can be used for the same purpose.


CategoryExtension

RelinkExtension (last edited 2020-05-30 04:17:27 by aayjaychan)