Differences between revisions 6 and 9 (spanning 3 versions)
Revision 6 as of 2009-10-29 18:38:16
Size: 3645
Editor: JesseGlick
Comment:
Revision 9 as of 2010-08-28 15:17:07
Size: 1191
Editor: abuehl
Comment:
Line 1: Line 1:
=== Recreate hardlinks between two Mercurial repositories ===
## page was renamed from RecreateHardlinksBetweenRepositories
=== Relink Extension ===
Line 6: Line 7:
Also, pulling with {{{--rev}}} never uses hardlinks.
Line 7: Line 9:
Here's a quick and dirty way to recreate those hardlinks and reclaim that wasted space (this script is also available as {{{contrib/hg-relink}}} in the source tarball, and see [[http://mercurial.selenic.com/bts/issue919|Issue919]] for a proposed {{{hg relink}}} command):
You can recreate those hardlinks and reclaim that wasted space using the Relink Extension. Enable it:
Line 10: Line 12:
#!/usr/bin/env python
[extensions]
relink =
# optionally specify the origin, if you do this a lot:
[paths]
default-relink = ../incoming
}}}
Line 12: Line 19:
import os, sys
and run it:
Line 14: Line 21:
class ConfigError(Exception): pass
{{{
$ hg relink
relinking /home/hacker/src/incoming to /home/hacker/src/work
collected 9999 candidate storage files
pruned down to 5555 probably relinkable files
relinked 4444 files (12345678 bytes reclaimed)
}}}
Line 16: Line 29:
def usage():
    print """relink <source> <destination>
    Hard-link files from source to destination"""

class Config:
    def __init__(self, args):
        if len(args) != 3:
            raise ConfigError("wrong number of arguments")
        self.src = os.path.abspath(args[1])
        self.dst = os.path.abspath(args[2])
        for d in (self.src, self.dst):
            if not os.path.exists(os.path.join(d, '.hg')):
                raise ConfigError("%s: not a mercurial repository" % d)

try:
    cfg = Config(sys.argv)
except ConfigError, inst:
    print str(inst)
    usage()
    sys.exit(1)

relinked = 0
savedbytes = 0
CHUNKLEN = 4096

def collect(src):
    seplen = len(os.path.sep)
    candidates = []
    for dirpath, dirnames, filenames in os.walk(src):
        relpath = dirpath[len(src) + seplen:]
        for filename in filenames:
            if not (filename.endswith('.i') or filename.endswith('.d')):
                continue
            st = os.stat(os.path.join(dirpath, filename))
            candidates.append((os.path.join(relpath, filename), st))

    return candidates

def prune(candidates, dst):
    targets = []
    for fn, st in candidates:
        tgt = os.path.join(dst, fn)
        try:
            ts = os.stat(tgt)
        except OSError:
            # Destination doesn't have this file?
            continue
        if st.st_dev != ts.st_dev:
            raise Exception('Source and destination are on different devices')
        if st.st_ino == ts.st_ino:
            # Already hardlinked to the same file
            continue
        if st.st_size != ts.st_size:
            continue
        targets.append((fn, ts.st_size))

    return targets

def relink(src, dst, files):
    CHUNKLEN = 65536
    relinked = 0
    savedbytes = 0

    for f, sz in files:
        source = os.path.join(src, f)
        tgt = os.path.join(dst, f)
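        # Compare contents in binary mode and always close both files
        sfp = open(source, 'rb')
        dfp = open(tgt, 'rb')
        try:
            sin = sfp.read(CHUNKLEN)
            while sin:
                din = dfp.read(CHUNKLEN)
                if sin != din:
                    break
                sin = sfp.read(CHUNKLEN)
        finally:
            sfp.close()
            dfp.close()
        if sin:
            # Contents differ, so leave this file alone
            continue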
        try:
            os.rename(tgt, tgt + '.bak')
            try:
                os.link(source, tgt)
            except OSError:
                os.rename(tgt + '.bak', tgt)
                raise
            print 'Relinked %s' % f
            relinked += 1
            savedbytes += sz
            os.remove(tgt + '.bak')
        except OSError, inst:
            print '%s: %s' % (tgt, str(inst))

    print 'Relinked %d files (%d bytes reclaimed)' % (relinked, savedbytes)

src = os.path.join(cfg.src, '.hg')
dst = os.path.join(cfg.dst, '.hg')
candidates = collect(src)
targets = prune(candidates, dst)
relink(src, dst, targets)
}}}
In Mercurial 1.3.1 and older (prior to [[http://mercurial.selenic.com/bts/issue919|Issue919]]), {{{contrib/hg-relink}}} in the source tarball can be used for the same purpose.
Line 114: Line 31:
CategoryTipsAndTricks
CategoryBundledExtension

When repositories are cloned locally, their data files will be hardlinked so that they only use the space of a single repository.

Unfortunately, subsequent pulls into either repository will break hardlinks for any files touched by the new changesets, even if both repositories end up pulling the same changes. Also, pulling with --rev never uses hardlinks.
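
To see whether a given storage file is still shared between two repositories, you can compare device and inode numbers, which is the same test the older hg-relink script uses to skip files that are already linked. A minimal Python sketch, assuming a POSIX-style filesystem; the helper name is_hardlinked is hypothetical and not part of Mercurial:

```python
import os


def is_hardlinked(path_a, path_b):
    """Return True if both paths refer to the same file on disk."""
    st_a = os.stat(path_a)
    st_b = os.stat(path_b)
    # Inode numbers are only unique within one device, so compare both.
    return (st_a.st_dev, st_a.st_ino) == (st_b.st_dev, st_b.st_ino)
```

The standard library's os.path.samefile performs an equivalent check; the explicit version is shown only to make the comparison visible.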

You can recreate those hardlinks and reclaim that wasted space using the Relink Extension. Enable it:

[extensions]
relink =
# optionally specify the origin, if you do this a lot:
[paths]
default-relink = ../incoming

and run it:

$ hg relink
relinking /home/hacker/src/incoming to /home/hacker/src/work
collected 9999 candidate storage files
pruned down to 5555 probably relinkable files
relinked 4444 files (12345678 bytes reclaimed)
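
The three counts in that output appear to correspond to three passes: collect, prune, relink. The sketch below is a Python 3 port modeled on the older contrib/hg-relink script, not on the extension's actual code. The function names and the use of filecmp are assumptions made for illustration, and both arguments are expected to be the .hg directories of repositories on the same device:

```python
import filecmp
import os


def collect(src):
    """Gather revlog files (*.i, *.d) under src with their stat results."""
    candidates = []
    for dirpath, _dirnames, filenames in os.walk(src):
        for name in filenames:
            if name.endswith(('.i', '.d')):
                full = os.path.join(dirpath, name)
                candidates.append((os.path.relpath(full, src), os.stat(full)))
    return candidates


def prune(candidates, dst):
    """Keep files that exist in dst, are not yet linked, and match in size."""
    targets = []
    for rel, st in candidates:
        try:
            ts = os.stat(os.path.join(dst, rel))
        except OSError:
            continue  # dst does not have this file
        if st.st_dev != ts.st_dev:
            raise RuntimeError('source and destination are on different devices')
        if st.st_ino == ts.st_ino or st.st_size != ts.st_size:
            continue  # already linked, or cannot be identical
        targets.append((rel, ts.st_size))
    return targets


def relink(src, dst, targets):
    """Replace byte-identical files in dst with hardlinks to src."""
    relinked = saved = 0
    for rel, size in targets:
        source = os.path.join(src, rel)
        target = os.path.join(dst, rel)
        if not filecmp.cmp(source, target, shallow=False):
            continue  # same size but different contents
        backup = target + '.bak'
        os.rename(target, backup)
        try:
            os.link(source, target)
        except OSError:
            os.rename(backup, target)  # restore the original on failure
            raise
        os.remove(backup)
        relinked += 1
        saved += size
    return relinked, saved
```

Calling relink(src, dst, prune(collect(src), dst)) returns the number of files relinked and the bytes reclaimed. Unlike the extension, this sketch takes no repository lock, so it should only be tried on repositories that nothing else is writing to.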

In Mercurial 1.3.1 and older (prior to Issue919), contrib/hg-relink in the source tarball can be used for the same purpose.


CategoryBundledExtension

RelinkExtension (last edited 2020-05-30 04:17:27 by aayjaychan)