Differences between revisions 2 and 11 (spanning 9 versions)
Revision 2 as of 2010-04-10 17:26:14
Size: 6158
Comment:
Revision 11 as of 2010-07-06 08:13:55
Size: 1735
Comment:
Line 8: Line 8:
IRC : vsh Irc : vsh
Line 10: Line 10:
GSoC application: GSoC proposal:
The updated version is available at http://bitbucket.org/vsh/shallow-proposal/src
Line 12: Line 13:
== Shallow Cloning in Mercurial [GSoC proposal] ==
:Author: Vishakh Harikumar <vsh426@gmail.com>
:Description: Google Summer of Code proposal to work on Mercurial Shallow Clone feature
mq: http://bitbucket.org/vsh/hg-shallow-clone/
Line 16: Line 15:
=== Abstract === = Journal =
Line 18: Line 17:
The Shallow Cloning proposal adds support for shallow cloning in Mercurial.
This feature will allow cloning the most recent parts of [large] repositories
in situations where limits on resources such as storage space, network
bandwidth and reliability prevent the creation of a full clone.
100616 write script to get size stats of revlog in a repo, look into discovery.py
Line 23: Line 19:
=== Introduction === 100617 investigate consequences of pruning revisions.
Line 25: Line 21:
Mercurial is widely used by people and organizations as their version control
tool, and many large repositories are managed with it. The drawback is that
anybody who wants to work with a repository has to clone it in its entirety.
The use cases in such situations boil down to cloning a limited subset of the
complete repository starting from a particular revision, i.e. a shallow clone.
100618 look into performance issue.
Line 31: Line 23:
The shallow clone should work seamlessly with other clones, which may be
full or shallow, when performing push or pull operations. When earlier history is
required, it should be possible to deepen the clone by retrieving earlier revisions.
The implementation will follow the guidelines in the Shallow Clone Plan[1], and I
will discuss the details with the rest of the community to flesh them out.
100619-100621 fix punching in pull and changegroupsubset in localrepo
Line 37: Line 25:
=== Goals === 100621 fix revlog to create punched groups
Line 39: Line 27:
The goals I see for the project are:
    * Implementing Trimming History
    * Creation of local Shallow Clones
    * Push, Pull for local Shallow Clones
    * Tests to define Shallow Clones
    * Support deepening of Shallow Clones
    * Update bundle format and wire protocol
    * Additional tests for network clones
100622 working on making revlog accept punched revisions
Line 48: Line 29:
===== Trimming History ===== ----
Line 50: Line 31:
Trimming of history will allow removing unwanted history from the repository, from
individual revisions and ranges up to entire branches. I plan to implement this using
the punch approach described in the wiki[2]. This involves removing deltas from the
datafile and setting the length of each removed delta in the indexfile to -1. Problems
to solve in this approach include situations where deltas might not patch correctly,
and making sure hg itself is aware of the trimmed history. Trimming will allow the size
of the repository to be reduced by keeping only the parts of the history that are needed.
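The punch idea above can be illustrated with a toy model. This is only a sketch under stated assumptions: `IndexEntry`, `punch` and `read` are hypothetical names, and the flat offset/length index below is a simplification, not Mercurial's actual revlog format. It also ignores the delta-chain problem mentioned above, where a later revision may be stored as a delta against a punched one.

```python
from dataclasses import dataclass

PUNCHED = -1  # marker length for a revision whose data has been removed


@dataclass
class IndexEntry:
    offset: int  # start of this revision's chunk in the data buffer
    length: int  # chunk length in bytes, or PUNCHED once removed


def punch(index, data, rev):
    """Remove rev's chunk from data and mark its index entry as punched.

    Returns the shortened data buffer; index entries are updated in place.
    """
    entry = index[rev]
    if entry.length == PUNCHED:
        return data  # nothing left to remove
    start, size = entry.offset, entry.length
    new_data = data[:start] + data[start + size:]
    entry.length = PUNCHED
    # Every later chunk now starts `size` bytes earlier.
    for later in index[rev + 1:]:
        later.offset -= size
    return new_data


def read(index, data, rev):
    """Return the stored chunk for rev, or None if it was punched."""
    entry = index[rev]
    if entry.length == PUNCHED:
        return None
    return data[entry.offset:entry.offset + entry.length]
```

Readers of the index can then treat a length of -1 as "history trimmed here" instead of trying to read the missing chunk.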
100623 add flags to record punched revisions
Line 58: Line 33:
===== Creation of local Shallow Clones ===== 100624 fix up revlog.addgroup to write punched revisions
Line 60: Line 35:
Local shallow cloning will work by keeping the complete changelog while truncating
the manifests and file logs, using the trimming command to remove all of their history
before that of the shallow root. This phase will also involve making decisions
about Mercurial's view of shallow clones, such as the storage of the full version
and the deltas of the text, and modifications to the revlog and bundle format to support
shallow clones. Tests at this stage will define the structure of the clone
and serve as regression tests as more goals are added.
100625 -
Line 68: Line 37:
===== Push, Pull and Bundle local Repos ===== 100626 get simple shallow clone running and investigate performance
Line 70: Line 39:
[TODO] 100627 look at some bugs in bts
Line 72: Line 41:
===== Tests to define Shallow Clones ===== 100628 fix naive performance issues in current code, patch for mq issue
Line 74: Line 43:
At this point shallow cloning of local repositories will be complete. I will write
additional tests to exercise all possible cases. A comprehensive test suite will
define all the functions of shallow clones and can later be used to test shallow
clones that have been created over the network.
100629
Line 79: Line 45:
===== Support Shallow Cloning over network ===== ----
Line 81: Line 47:
Cloning over networks is done with the wire protocol. It does not currently support
shallow cloning, since it cannot work with individual changesets, only with a stream of
changegroups encoded in the bundle format. First I will update the bundle format
to include enough information to create a shallow clone at a given revision. This will
also be useful in the wire protocol. There already exists a plan for updating the wire
protocol; I will coordinate with others working on it and add support
for shallow clones. This will enable shallow cloning over networks as well.
100630 -
Line 89: Line 49:
===== Additional tests for Network Shallow Clones ===== 100701 redo all patches by splitting them.
Line 91: Line 51:
Write tests for the wire protocol, bundle format and network clones. This should
complete the test suite for shallow clones. I will also update the wiki
and the help text to cover all aspects of shallow clones.
100702 stumped by error after redo, till i found i lost some code during splitting. used an mq on the mq to fix it (hopefully not be doing that anytime soon)
Line 95: Line 53:
=== Timeline === 100703 changegroupsubset is too slow for whole repo, while doing the collecting in _changegroup is rather redundant. move code to _changegroup for now to improve performance, while considering other options to improve situation including adding fastpath to changegroupsubset.
Line 97: Line 55:
I am working through the details of shallow clones and will probably start
coding it before the official start date of the program. I have my final exams
in the first 2 weeks of May. The rest of the time I should be able to concentrate
on Shallow cloning.
100704 looked into: wire protocol, bugs from bts, review patches from ml.
Line 102: Line 57:
    * Implementing Trimming History [ 2 weeks ]
    * Creation of local Shallow Clones [ 1 week ]
    * Push, Pull for local Shallow Clones [ 1.5 weeks ]
    * Tests for local Shallow Clones [ 0.5 weeks ]
    * Support deepening of Shallow Clones [ 1.5 weeks ]
    * Update bundle format and wire protocol [ 1 week ]
    * Shallow cloning over network [ 2 weeks ]
    * Additional tests and testing for network clones [ 1 week ]

=== About ===

I am a final year BTech student at MPSTME, India. I have written programs in C and
Basic, with short stints in Java and Visual Basic (they made me do it :). Currently
most of my programming is in Python. I discovered Mercurial over a year ago and
have been using it for all my projects since. When I found Mercurial I read through the
earliest commits in its repo, and in the process gained a better understanding
of its internals. I have since read through many modules in tip to better
understand shallow cloning as well. I intend to contribute to Mercurial
in the future, via GSoC or otherwise.

This document and all related work are available at http://bitbucket.org/vsh/hg-shallow/

Contact Information

    * Email: vsh426@gmail.com
    * IRC: vsh

=== References ===

[1] http://mercurial.selenic.com/wiki/ShallowClonePlan
[2] http://mercurial.selenic.com/wiki/TrimmingHistory
100705 spent the day [with attempts] fixing some bugs. sent a patch for issue1881 and some unsatisfying fixes for others.

Vishakh Harikumar

Email: <vsh426 AT SPAMFREE gmail DOT com>

Irc : vsh

GSoC proposal: The updated version is available at http://bitbucket.org/vsh/shallow-proposal/src

mq: http://bitbucket.org/vsh/hg-shallow-clone/

Journal

100616 write script to get size stats of revlog in a repo, look into discovery.py
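A rough idea of what such a size script might look like is sketched below. It is not the script from the journal; `revlog_sizes` is a hypothetical helper. It assumes the usual store layout where revlog index files end in `.i` and separate data files end in `.d` under `.hg/store`. Small revlogs keep their data inline in the `.i` file, so the "index" total also includes that inline data.

```python
import os


def revlog_sizes(repo_root):
    """Sum the on-disk size of revlog files under <repo_root>/.hg/store.

    Returns a dict with total bytes for index (.i) and data (.d) files.
    Other store files (for example fncache) are ignored.
    """
    store = os.path.join(repo_root, ".hg", "store")
    totals = {"index": 0, "data": 0}
    for dirpath, _dirnames, filenames in os.walk(store):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if name.endswith(".i"):
                totals["index"] += os.path.getsize(path)
            elif name.endswith(".d"):
                totals["data"] += os.path.getsize(path)
    return totals
```

Calling `revlog_sizes(".")` from the root of a working copy returns the two totals in bytes; a directory without a `.hg/store` simply yields zeros.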

100617 investigate consequences of pruning revisions.

100618 look into performance issue.

100619-100621 fix punching in pull and changegroupsubset in localrepo

100621 fix revlog to create punched groups

100622 working on making revlog accept punched revisions


100623 add flags to record punched revisions

100624 fix up revlog.addgroup to write punched revisions

100625 -

100626 get simple shallow clone running and investigate performance

100627 look at some bugs in bts

100628 fix naive performance issues in current code, patch for mq issue

100629


100630 -

100701 redo all patches by splitting them.

100702 stumped by error after redo, till i found i lost some code during splitting. used an mq on the mq to fix it (hopefully not be doing that anytime soon)

100703 changegroupsubset is too slow for whole repo, while doing the collecting in _changegroup is rather redundant. move code to _changegroup for now to improve performance, while considering other options to improve situation including adding fastpath to changegroupsubset.

100704 looked into: wire protocol, bugs from bts, review patches from ml.

100705 spent the day [with attempts] fixing some bugs. sent a patch for issue1881 and some unsatisfying fixes for others.


CategoryHomepage

VishakhHarikumar (last edited 2010-08-03 17:10:13 by VishakhHarikumar)