Line 8: | Line 8: |
IRC : vsh | Irc : vsh |
Line 10: | Line 10: |
GSoC application: | GSoC proposal: The updated version is available at http://bitbucket.org/vsh/shallow-proposal/src |
Line 12: | Line 13: |
== Shallow Cloning in Mercurial [GSoC proposal] ==

:Author: Vishakh Harikumar <vsh426@gmail.com>
:Description: Google Summer of Code proposal to work on Mercurial Shallow Clone feature |
mq: http://bitbucket.org/vsh/hg-shallow-clone/ |
Line 16: | Line 15: |
=== Abstract === | = Journal = |
Line 18: | Line 17: |
The Shallow Cloning proposal is about adding support for shallow cloning in Mercurial. This feature will allow cloning the most recent parts of [large] repositories in situations where limits on resources such as storage space, network bandwidth and network reliability prevent the creation of a full clone. |
100616 write script to get size stats of revlog in a repo, look into discovery.py |
Line 23: | Line 19: |
=== Introduction === | 100617 investigate consequences of pruning revisions. |
Line 25: | Line 21: |
Mercurial is widely used by people and organizations as their tool for version control. Many large repositories are managed by it. The drawback is that anybody who wants to work with a repository has to clone it in its entirety. The use cases for such situations boil down to cloning a limited subset of the complete repository starting from a particular revision, also known as a shallow clone. |
100618 look into performance issue. |
Line 31: | Line 23: |
The shallow clone should work seamlessly with other clones, which may be full or shallow, when performing push or pull operations. When earlier history is required it should be possible to deepen the clone by retrieving earlier revisions. Guidelines for the implementation are in the Shallow Clone Plan[1], and discussions with the rest of the community will flesh out the details. |
100619-100621 fix punching in pull and changegroupsubset in localrepo |
Line 37: | Line 25: |
=== Goals === | 100621 fix revlog to create punched groups |
Line 39: | Line 27: |
The goals I see for the project are:
 * Implementing Trimming History
 * Creation of local Shallow Clones
 * Push, Pull for local Shallow Clones
 * Tests to define Shallow Clones
 * Support deepening of Shallow Clones
 * Update bundle format and wire protocol
 * Additional tests for network clones |
100622 working on making revlog accept punched revisions |
Line 48: | Line 29: |
===== Trimming History ===== | ---- |
Line 50: | Line 31: |
Trimming of history will allow removing unwanted history from the repository, from individual revisions and ranges to entire branches. I plan to implement this using the punch approach as described in the wiki[2]. This involves removing deltas from the datafile and updating the corresponding length in the indexfile to -1. Problems to solve in this approach are situations where deltas might not patch correctly, and making sure hg itself is aware of the trimmed history. Trimming will allow the size of the repository to be reduced, keeping only the parts of the history that are needed. |
100623 add flags to record punched revisions |
Line 58: | Line 33: |
===== Creation of local Shallow Clones ===== | 100624 fix up revlog.addgroup to write punched revisions |
Line 60: | Line 35: |
Local Shallow Cloning will work by keeping the complete changelog while truncating and using the trimming command to remove all history from manifests and file logs before that of the shallow root. This phase will also involve making decisions about mercurial's view of shallow clones, such as the storage of the full version and the deltas of the text, and modification to revlog and bundle format to support shallow clones. Tests at this stage will be defining the structure of the clone and used for regression testing as more goals are added. |
100625 - |
Line 68: | Line 37: |
===== Push, Pull and Bundle local Repos ===== | 100626 get simple shallow clone running and investigate performance |
Line 70: | Line 39: |
[TODO] | 100627 look at some bugs in bts |
Line 72: | Line 41: |
===== Tests to define Shallow Clones ===== | 100628 fix naive performance issues in current code, patch for mq issue |
Line 74: | Line 43: |
At this point shallow cloning of local repository will be complete. I will write additional tests to exercise all possible cases. A comprehensive test suite will define all the functions of shallow clones and can further be used to test shallow clones that have been created over the network. |
100629 |
Line 79: | Line 45: |
===== Support Shallow Cloning over network ===== | ---- |
Line 81: | Line 47: |
Cloning over networks is done with the wire protocol. It does not currently support shallow cloning, since it cannot work with individual changesets, only a stream of changegroups encoded in the bundle format. First I will update the bundle format to include enough information to create a shallow clone at a given revision. This will be useful in the wire protocol. There already exists a plan for updating the wire protocol. I will be coordinating with others working on it, and add support for shallow clones. This will enable shallow cloning over networks as well. |
100630 - |
Line 89: | Line 49: |
===== Additional tests for Network Shallow Clones ===== | 100701 redo all patches by splitting them. |
Line 91: | Line 51: |
Write tests for wire protocol, bundle format and network clones. This should complete the test suite for Shallow clones. I will also be updating the wiki and help to cover all aspects of shallow clones. |
100702 stumped by error after redo, till i found i had lost some code during splitting. used an mq on the mq to fix it (hopefully won't be doing that again anytime soon) |
Line 95: | Line 53: |
=== Timeline === | 100703 changegroupsubset is too slow for the whole repo, while doing the collecting in _changegroup is rather redundant. move code to _changegroup for now to improve performance, while considering other options to improve the situation, including adding a fastpath to changegroupsubset. |
Line 97: | Line 55: |
I am working through the details of shallow clones and will probably start coding it before the official start date of the program. I have my final exams in the first 2 weeks of May. The rest of the time I should be able to concentrate on Shallow cloning. |
100704 looked into: wire protocol, bugs from bts, review patches from ml. |
Line 102: | Line 57: |
 * Implementing Trimming History [ 2 weeks ]
 * Creation of local Shallow Clones [ 1 week ]
 * Push, Pull for local Shallow Clones [ 1.5 weeks ]
 * Tests for local Shallow Clones [ 0.5 weeks ]
 * Support deepening of Shallow Clones [ 1.5 weeks ]
 * Update bundle format and wire protocol [ 1 week ]
 * Shallow cloning over network [ 2 weeks ]
 * Additional tests/testing for network clones [ 1 week ]

=== About ===

I am a final year BTech student at MPSTME, India. I have written programs in C and Basic, with short stints in Java and Visual Basic (they made me do it :). Currently most of my programming is in Python. I discovered Mercurial over a year ago and have been using it for all my projects since. I read through the earliest commits in the Mercurial repo when I found Mercurial and in the process gained a better understanding of its internals. I have since read through many modules in tip, for a better understanding of shallow cloning as well. I intend to make contributions to Mercurial in the future, via GSoC or otherwise.

This document and all related work are available at http://bitbucket.org/vsh/hg-shallow/

Contact Information
 * Email: vsh426@gmail.com
 * IRC: vsh

=== References ===

[1] http://mercurial.selenic.com/wiki/ShallowClonePlan
[2] http://mercurial.selenic.com/wiki/TrimmingHistory |
100705 spent the day [with attempts] fixing some bugs. sent a patch for issue1881 and some unsatisfying fixes for others. |
Vishakh Harikumar
Email: <vsh426 AT SPAMFREE gmail DOT com>
Irc : vsh
GSoC proposal: The updated version is available at http://bitbucket.org/vsh/shallow-proposal/src
mq: http://bitbucket.org/vsh/hg-shallow-clone/
Journal
100616 write script to get size stats of revlog in a repo, look into discovery.py
100617 investigate consequences of pruning revisions.
100618 look into performance issue.
100619-100621 fix punching in pull and changegroupsubset in localrepo
100621 fix revlog to create punched groups
100622 working on making revlog accept punched revisions
100623 add flags to record punched revisions
100624 fix up revlog.addgroup to write punched revisions
100625 -
100626 get simple shallow clone running and investigate performance
100627 look at some bugs in bts
100628 fix naive performance issues in current code, patch for mq issue
100629
100630 -
100701 redo all patches by splitting them.
100702 stumped by error after redo, till i found i had lost some code during splitting. used an mq on the mq to fix it (hopefully won't be doing that again anytime soon)
100703 changegroupsubset is too slow for the whole repo, while doing the collecting in _changegroup is rather redundant. move code to _changegroup for now to improve performance, while considering other options to improve the situation, including adding a fastpath to changegroupsubset.
100704 looked into: wire protocol, bugs from bts, review patches from ml.
100705 spent the day [with attempts] fixing some bugs. sent a patch for issue1881 and some unsatisfying fixes for others.
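
The 100616 entry mentions a script for getting size stats of the revlogs in a repo. The script itself is not on this page; what follows is only a rough sketch of what such a script might look like. It assumes the usual on-disk layout where revlogs live under `.hg/store` as `.i` index files with optional `.d` data files, and the function names `revlog_sizes` and `summarize` are made up for illustration. It does not use any Mercurial API and only looks at file sizes.

```python
import os


def revlog_sizes(repo_root):
    """Map each revlog under .hg/store to (index_bytes, data_bytes).

    Assumption: revlogs are stored as <name>.i (index) plus an optional
    <name>.d (data) file; any other file in the store is ignored.
    """
    store = os.path.join(repo_root, ".hg", "store")
    sizes = {}
    for dirpath, _dirs, files in os.walk(store):
        for name in files:
            base, ext = os.path.splitext(name)
            if ext not in (".i", ".d"):
                continue
            # Key each revlog by its path relative to the store, minus suffix.
            key = os.path.relpath(os.path.join(dirpath, base), store)
            index_bytes, data_bytes = sizes.get(key, (0, 0))
            size = os.path.getsize(os.path.join(dirpath, name))
            if ext == ".i":
                index_bytes += size
            else:
                data_bytes += size
            sizes[key] = (index_bytes, data_bytes)
    return sizes


def summarize(sizes):
    """Collapse per-revlog sizes into repo-wide totals."""
    return {
        "revlogs": len(sizes),
        "index_bytes": sum(i for i, _ in sizes.values()),
        "data_bytes": sum(d for _, d in sizes.values()),
    }


if __name__ == "__main__":
    import sys

    root = sys.argv[1] if len(sys.argv) > 1 else "."
    print(summarize(revlog_sizes(root)))
```

Run against a repository root, this prints the number of revlogs and the total bytes in index and data files, which gives a rough baseline to compare against after punching revisions.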