Differences between revisions 25 and 50 (spanning 25 versions)
Revision 25 as of 2010-04-18 15:20:52
Size: 5504
Editor: Pradeepkumar
Comment:
Revision 50 as of 2010-06-22 18:16:48
Size: 7194
Editor: Pradeepkumar
Comment:
Deletions are marked like this. Additions are marked like this.
Line 1: Line 1:
##master-page:HomepageTemplate
#format wiki
#language en
Email: <<MailTo(in3xes AT SPAMFREE gmail DOT com)>>

IRC Nick: inXs_

'''About Me:'''

- Student at [[http://www.iitr.ac.in|IIT]] in India. Mathematics is my major.

- Fan of C, Python, Haskell. You can find some of my small projects on [[http://www.github.com/in3Xes|Github]]. -

- Love solving mathematics problems by programming. [[http://www.github.com/in3Xes/projecteuler|Solutions]] for the problems in [[http://www.projecteuler.net|Project Euler]]. Solved in python.

- Experienced user of git. Comfortable with Bazaar.

- Find me on [[http://www.twitter.com/in3Xes|Twitter]]

'''Google Summer of Code-2010'''
## page was renamed from in3xes
## page was renamed from Pradeepkumar
== Google Summer of Code-2010 ==
Line 28: Line 10:
============================================================================

'''Proposal'''
----
=== Proposal ===
Line 39: Line 19:
 . Change revlog structure, now it appends the next delta to existing data.This helps in reconstrfucting a revision at O(1) seeks. If  . Change revlog structure, now it appends the next delta to existing data. This helps in reconstructing a revision at O(1) seeks. If
Line 44: Line 24:
  . --> O(1) seeks to create any revision.   . * O(1) seeks to create any revision.
Line 47: Line 28:
  . --> An intelligent algorithm to reduce number of seeks to reconstruct any revision.I think constructing revision is   . * An intelligent algorithm to reduce number of seeks to reconstruct any revision.I think constructing revision is not straight forward as it is now. It's not going to be linear.Right now constructing a revision is simply applying each block of data to a base till the revision required.
  . * Investigation about merge revision.At merge revision node has two parents, delta should be taken against first or
Line 49: Line 31:
 . not straight forward as it is now. It's not going to be linear.
  . --> Investigation about merge revision. -->
Optimization, and investigation for better implementation(If possible). --> Continuous parallel testing.
 . * Optimization, and investigation for better implementation(If possible).
Line 55: Line 36:
 * Implementation of changes according to old, new servers,client, repos.
  . --> Backward computability matrix (by tonfa): http://mercurial.selenic.com/wiki/ParentDeltaPlan
   . --> There are things to be discussed, and learned before implementation.
 * Implementation of changes according to old, new servers, client, repos.
  . *
 
Backward compatibility matrix (by tonfa): http://mercurial.selenic.com/wiki/ParentDeltaPlan
Line 63: Line 45:
 . I am student at Indian Institute of Technology, Roorkee. I am currently pursing third year of 5 years degree, Mathematics as major. I  . I am student at Indian Institute of Technology, Roorkee. I am currently pursing third year of 5 years degree, Mathematics as major. I am a Linux enthusiast. Interested in open source. I have been learning C, Python. Some of my projects can be found on github ( http://www.github.com/in3Xes ). I have never got opportunity to contribute to open source. GSoC would be stepping stone for that.
Line 65: Line 47:
am a Linux enthusiast. Interested in open source. I have been learning C,Python. Some of my projects can be found on github ( http://www.github.com/in3Xes ). I have never got opportunity to contribute to open source. GSoC would be stepping stone for that.

. I am project leader in IMG ( Information Management Group, IIT Roorkee, http://www.iitr.ac.in/IMG/ ), a group which manages intranet,

 .
student databases, institute website. I have worked on PHP, Java in web development.
 . I am project leader in IMG ( Information Management Group, IIT Roorkee, http://www.iitr.ac.in/IMG/ ), a group which manages intranet, student databases, institute website. I have worked on PHP, Java in web development.
Line 75: Line 53:
 . IRC Nick:  inXs_ (freenode.net)
. IRC Nick: in3xes (freenode.net)
Line 80: Line 59:
 . I would start coding well before the schedule given in the time line. My final exams will end by May-15th. I can start working  . I would start coding well before the schedule given in the time line. My final exams will end by May-15th. I can start working rigorously shortly after exams. Till May-15th(or 24th according to time line) I will be reading code, documentation, getting used to coding style, getting ready to start coding, little experimentation etc.
Line 82: Line 61:
 . rigorously shortly after exams.
  . Till May-15th(or 24th according to time line) I will be reading code, documentation, getting used to coding style, getting ready to
 start coding, little experimentation and stuff.
'''Timeline'''
Line 86: Line 63:
My plan of time allotment: Changes in revlog structure: [Total 3 weeks] ''Changes in revlog structure'': [Total 3 weeks]
Line 89: Line 66:
  . --> An intelligent algorithm to reduce number of seeks to reconstruct any revision.I think constructing revision is not   . * An intelligent algorithm to reduce number of seeks to reconstruct any revision.I think constructing revision is not straight forward as it is now. It's not going to be linear.
Line 91: Line 68:
 . straight forward as it is now. It's not going to be linear.
  . [(It can take 1 1/2 to 2 weeks to bring it to working stage, Roughly around 5th-Jun)]
  . --> Investigation about merge revision. --> Optimization, and investigation for better implementation(If possible).
   . --> Continuous parallel testing.
    . [( I would finish testing and other work nearly in 10 days, by 15th-Jun)]
 . [(It can take 1 1/2 to 2 weeks to bring it to working stage, Roughly around 5th-Jun)]
Line 97: Line 70:
Changes in wire protocol:[Total 3 to 3 1/2 weeks]  . * Investigation about merge revision.
 . * Optimization, and investigation for better implementation(If possible)
 . * Continuous parallel testing.
  . [( I would finish testing and other work nearly in 10 days, by 15th-Jun)]

''Changes in wire protocol'':[Total 3 to 3 1/2 weeks]
Line 101: Line 79:
  . --> Implementation of changes according to old, new servers,client, repos.
   . >> Backward compatibility matrix (by tonfa): http://mercurial.selenic.com/wiki/ParentDeltaPlan . >> There are things to be discussed, and learned before implementation.
  . * Implementation of changes according to old, new servers,client, repos.
   . *
   *
Backward compatibility matrix (by tonfa): http://mercurial.selenic.com/wiki/ParentDeltaPlan .
Line 105: Line 84:
  . [(I will start testing shortly after mid-term evaluation, i.e., around 18th-July, would take a week to complete)]
  * Integrating with other's work.
   .
[(Changeset discovery, Lightweight copies/renames). ( Remaining time, depending upon other students)]
  . * [(I will start testing shortly after mid-term evaluation, i.e., around 18th-July, would take a week to complete)]
  * * Integrating with other's work. [ ( Remaining time, depending upon other students)]
Line 111: Line 89:
## You can even more obfuscate your email address by adding more uppercase letters followed by a leading and trailing blank. ----
''' Work Progress:'''

''''''

''' '''Few things related revlog:

1) A [[http://markmail.org/download.xqy?id=ooqvjdaor4t3sxan&number=1|script]] by tonfa, proof for parent delta.

2) mpm [[http://markmail.org/thread/ooqvjdaor4t3sxan|explaining]] how space inefficiency is caused.

3) A [[http://mercurial.selenic.com/hg/hg/file/45aabc523c86/contrib/shrink-revlog.py|python script]] in mercurial repo to shrink revlog by sorting in topological order.

Earlier in this week I started working on revlog, most of the work is already done by tonfa. He shared patches. All I did was understand what he did, and rewrite those patches with minor changes.

'''strategy:'''

As mpm suggested, I have changed the way new revision is being added to existing revlog, and make corresponding changes in creating a given revision of file. The crucial step (performance affecting step) is creating a given version of file. previously to create a given version of a file first we take base revision , to that we keep adding the deltas till given revision. Earlier deltas are added in linear manner. So creating a given revision of file is simply seeking O(1) times.

''Defination of base in revlog is not given anywhere. From revlog, a base is the version where we add full text of file to the revlog instead of delta.''

example scenario:

We want to create a R1 revision of a file. The parents of R1 are P1 and P2. Base of R1 is B1.

Consider a situation where P2 is null, and P1 < B1. In this case I simply can't take B1 and create R1. I have first find base and parents of P1 and construct P1. The same case might repeat. where parent of P1 is < base of P1. So, first I have to find a base which is ancestor of all the parents, then create a chain of deltas from that base to construct a given revision.

Output of --profile says that, what ever the decrease in the performance is caused only by this function which used for constructing the delta chain. So, I need to find a better algorithm to construct the delta chain. The current algorithm for constructing the delta chain is.

{{{
    def deltachain(self, rev):
        chain = []
        while self.base(rev) != rev:
            chain.append(rev)
            rev = self.deltaparent(rev)
        chain.reverse()
        return rev, chain

    def deltaparent(self, rev):
        if self.base(rev) == rev:
            return nullrev
        elif self.flags(rev) & REVLOG_PARENTDELTA_FLAGS:
            return self.parentrevs(rev)[0]
        else:
            return rev-1
}}}

Google Summer of Code-2010

I am interested in "Parent delta". I can also work on conversion tools. But I am mainly focusing on Parent Delta Plan.

A word about project:

Mercurial calculates diffs against previous revision rather than parent. In some cases this implementation is space inefficient, and it is more sensible to store deltas against parent. This project is about implementing Parent Delta Plan. This implementation changes the structure of revlogs, so wire protocol has to be extended to allow backward compatibility.


Proposal

TO DO(Tentative): Major:

  • Changes in revlog
  • Changes in wire protocol

Changes in revlog:

  • Change revlog structure, now it appends the next delta to existing data. This helps in reconstructing a revision at O(1) seeks. If

parent delta is implemented better compression can be achieved but, may have to compromise with number of seeks.

  • Current:Linear delta model

    • * O(1) seeks to create any revision.
  • Parent delta model
    • ** I think this can be completed quickly, tonfa shared patches for this***
    • * An intelligent algorithm to reduce number of seeks to reconstruct any revision.I think constructing revision is not straight forward as it is now. It's not going to be linear.Right now constructing a revision is simply applying each block of data to a base till the revision required.
    • * Investigation about merge revision.At merge revision node has two parents, delta should be taken against first or
  • * Optimization, and investigation for better implementation(If possible).

Changes in wire protocol:

  • Changes in wire protocol to allow backward computability.
  • Implementation of changes according to old, new servers, client, repos.
  • Rigorous testing
  • Integrating with other's work.(Changeset discovery, Lightweight copies/renames)

About Me:

  • I am student at Indian Institute of Technology, Roorkee. I am currently pursing third year of 5 years degree, Mathematics as major. I am a Linux enthusiast. Interested in open source. I have been learning C, Python. Some of my projects can be found on github ( http://www.github.com/in3Xes ). I have never got opportunity to contribute to open source. GSoC would be stepping stone for that.

  • I am project leader in IMG ( Information Management Group, IIT Roorkee, http://www.iitr.ac.in/IMG/ ), a group which manages intranet, student databases, institute website. I have worked on PHP, Java in web development.

Contact Information:

  • Name: Pradeepkumar Gayam

    Email: in3xes@gmail.com

  • IRC Nick: in3xes (freenode.net)
  • Jabber ID: in3xes [AT] gmail.com (talk.google.com)

Schedule:

  • I would start coding well before the schedule given in the time line. My final exams will end by May-15th. I can start working rigorously shortly after exams. Till May-15th(or 24th according to time line) I will be reading code, documentation, getting used to coding style, getting ready to start coding, little experimentation etc.

Timeline

Changes in revlog structure: [Total 3 weeks]

  • Parent delta model:
    • * An intelligent algorithm to reduce number of seeks to reconstruct any revision.I think constructing revision is not straight forward as it is now. It's not going to be linear.
  • [(It can take 1 1/2 to 2 weeks to bring it to working stage, Roughly around 5th-Jun)]
  • * Investigation about merge revision.
  • * Optimization, and investigation for better implementation(If possible)
  • * Continuous parallel testing.
    • [( I would finish testing and other work nearly in 10 days, by 15th-Jun)]

Changes in wire protocol:[Total 3 to 3 1/2 weeks]

  • Changes in wire protocol to allow backward computability.
    • [(For midterm evaluation, this may not be complete, but significant part of it)]
    • * Implementation of changes according to old, new servers,client, repos.
  • Rigorous testing.
    • * [(I will start testing shortly after mid-term evaluation, i.e., around 18th-July, would take a week to complete)]
    • * Integrating with other's work. [ ( Remaining time, depending upon other students)]

Link to my GSoC application on bitbucket.org


Work Progress:

Few things related revlog:

1) A script by tonfa, proof for parent delta.

2) mpm explaining how space inefficiency is caused.

3) A python script in mercurial repo to shrink revlog by sorting in topological order.

Earlier in this week I started working on revlog, most of the work is already done by tonfa. He shared patches. All I did was understand what he did, and rewrite those patches with minor changes.

strategy:

As mpm suggested, I have changed the way new revision is being added to existing revlog, and make corresponding changes in creating a given revision of file. The crucial step (performance affecting step) is creating a given version of file. previously to create a given version of a file first we take base revision , to that we keep adding the deltas till given revision. Earlier deltas are added in linear manner. So creating a given revision of file is simply seeking O(1) times.

Defination of base in revlog is not given anywhere. From revlog, a base is the version where we add full text of file to the revlog instead of delta.

example scenario:

We want to create a R1 revision of a file. The parents of R1 are P1 and P2. Base of R1 is B1.

Consider a situation where P2 is null, and P1 < B1. In this case I simply can't take B1 and create R1. I have first find base and parents of P1 and construct P1. The same case might repeat. where parent of P1 is < base of P1. So, first I have to find a base which is ancestor of all the parents, then create a chain of deltas from that base to construct a given revision.

Output of --profile says that, what ever the decrease in the performance is caused only by this function which used for constructing the delta chain. So, I need to find a better algorithm to construct the delta chain. The current algorithm for constructing the delta chain is.

    def deltachain(self, rev):
        chain = []
        while self.base(rev) != rev:
            chain.append(rev)
            rev = self.deltaparent(rev)
        chain.reverse()
        return rev, chain

    def deltaparent(self, rev):
        if self.base(rev) == rev:
            return nullrev
        elif self.flags(rev) & REVLOG_PARENTDELTA_FLAGS:
            return self.parentrevs(rev)[0]
        else:
            return rev-1

CategoryHomepage

Pradeepkumar (last edited 2010-10-22 18:17:21 by mpm)