Lazy fetching is the idea that you pull only the revision history that is relevant to your own local changes. This cuts down on bandwidth and disk usage, and makes it easier for new developers. They can simply get a copy of the code, add a patch, and push that patch to others, or have them pull it. Normally you have to request all revisions (or changesets) all the way back to the very first one ("very first" being a relatively arbitrary distinction) before you can commit new ones. It's pretty important that you not be required to do so, especially for very large repositories with long histories.
My concept of a repository is a linked list of revisions, each one comprising the contents of the working directory at a particular moment in time. Changesets come in because for small changes, having only the information needed to transition between revisions takes up a lot less space than having both revisions. So for this article at least, a revision will be a full copy of all checked in files, and a changeset will be a diff between two revisions. Let me know if that's horribly wrong or missing some crucial detail.
My idea is that you can have an arbitrary "first revision" that may have previous revisions to it, but you don't need to know that to do useful things with it. You should be able to push and pull changesets to that revision just as if it were an initial revision, even to repositories that treat it as an intermediate revision. These are the scenarios I can think of, with a rough explanation of how the two peers would interact.
Example repository:
- revision A - file foo.txt containing the line "foo"
changeset A->B - starting with revision A, add a line after "foo" containing "bar"
- revision B - file foo.txt containing the lines "foo" and "bar"
changeset A->C - starting with revision A, delete the line "foo" and add the line "bar"
- revision C - file foo.txt containing the line "bar"
changeset B->D - starting with revision B, delete the line "foo"
- revision D - file foo.txt containing the line "bar"
changeset B->E - starting with revision B, delete the line "bar"
- revision E - file foo.txt containing the line "foo"
- If you have revision A, and all the changesets, then by applying them one at a time you can get to any of the revisions.
- If you have revision A, revision B, and B->C, then by applying B->C to B you can get revision C.
- If you have revision A, and A->C, but not anything about revision B, you can still get revision C by applying A->C to revision A.
- If you have revisions A and C, you already have revision C, so you could check out either one without applying any changesets. B can't be checked out until you have either A->B or C->B.
It is worth noting that revision D is exactly the same as revision C, so the two would end up having the same hash, and thus cannot have different names. The same goes for revisions A and E. Effectively, when you calculate the changeset B->current, the system would recognize that the working directory hashes to C and give you B->C; only if that hash did not already exist would it give you B->D and a new revision D. It is also worth noting that while revision D is identical to revision C, the changesets that reach it, A->C and B->C, are not equivalent. Each would be a different diff, with a different hash. Changesets B->E and B->A would be the same, though in that case E would never exist, since you'd just get the existing revision A when you tried to create E.
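Here is a minimal sketch of that hashing argument in Python. It assumes a revision is just a mapping of file paths to text, and that a changeset is identified by the hashes of its parent and child; the names revision_hash and changeset_id are made up for illustration and are not anything mercurial provides.

```python
import hashlib

def revision_hash(files):
    """Content hash of a revision, modelled as a dict of path -> file text."""
    digest = hashlib.sha1()
    for path in sorted(files):
        digest.update(path.encode())
        digest.update(b"\0")
        digest.update(files[path].encode())
        digest.update(b"\0")
    return digest.hexdigest()

def changeset_id(parent, child):
    """Identify a changeset by the revisions it connects (an assumption)."""
    return (revision_hash(parent), revision_hash(child))

rev_a = {"foo.txt": "foo\n"}
rev_b = {"foo.txt": "foo\nbar\n"}
rev_c = {"foo.txt": "bar\n"}   # reached from A
rev_d = {"foo.txt": "bar\n"}   # reached from B by deleting "foo"

a_to_c = changeset_id(rev_a, rev_c)
b_to_d = changeset_id(rev_b, rev_d)
```

C and D hash to the same value, so "D" is only another name for C, while the two changesets that arrive there stay distinct because their parents differ.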
So this is a more accurate summation of the example repository:
- revision A - file foo.txt containing the line "foo"
changeset A->B - starting with revision A, add a line after "foo" containing "bar"
- revision B - file foo.txt containing the lines "foo" and "bar"
changeset A->C - starting with revision A, delete the line "foo" and add the line "bar"
- revision C - file foo.txt containing the line "bar"
changeset B->C - starting with revision B, delete the line "foo"
changeset B->A - starting with revision B, delete the line "bar"
Now, where's the initial revision of this repository? It's completely circular! You may as well say A is the root, or B, or C. That's the key to lazy fetching, that there is no initial revision to a repository. If you took the current revision from one repository, and the initial revision from another, you could make the former the parent of the latter, simply by computing the changeset from the first revision, to the last one. Suddenly your "initial revision" is not an initial revision. But despite rebasing in this fashion, the contents of the initial revision itself remain unchanged, so it will have the same hash, and rebase to the same hash. I can arbitrarily say in that repository that A is the initial revision, but any of the revisions could be considered its initial state.
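To make the "any revision could be the root" claim concrete, here is a rough Python sketch that treats revisions as nodes and changesets as directed edges. The reachable helper and its invertible flag are my own illustration. With only the four forward changesets listed above, A and B can each reach everything, while C needs diffs to be reversible (which ordinary diffs are) before it can act as a root too.

```python
from collections import deque

# The example repository: each changeset is (parent revision, child revision).
CHANGESETS = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "A")]

def reachable(root, changesets, invertible=False):
    """Return every revision that can be computed starting from `root`."""
    edges = list(changesets)
    if invertible:
        # Assume each diff can also be applied backwards.
        edges += [(child, parent) for parent, child in changesets]
    seen = {root}
    queue = deque([root])
    while queue:
        rev = queue.popleft()
        for parent, child in edges:
            if parent == rev and child not in seen:
                seen.add(child)
                queue.append(child)
    return seen

everything = {"A", "B", "C"}
roots_forward_only = [r for r in "ABC" if reachable(r, CHANGESETS) == everything]
roots_with_inverses = [r for r in "ABC" if reachable(r, CHANGESETS, True) == everything]
```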
This is why it's important to have lazy fetching in any repository management system. Which revision counts as the initial revision is arbitrary, and the concept of a repository ignores the fact that all revisions and changesets fit in a single patch space. There's no reason you couldn't merge two totally different repositories, simply by making changesets between revisions they contain. Arbitrarily denying that functionality because the repositories don't start from the same revision just makes merging projects harder. In effect, every repository can be a branch of every other repository, and the model where it cannot is a flawed model that does not encompass all situations, or even all useful situations.
Since the "initial" revision can always have previous revisions, that makes any log of revisions an unbounded list. And to deal with unbounded lists, you use lazy operations. Thus, lazy fetching is necessary to prevent an arbitrary limitation on repositories that encumbers everyone involved.
If I guess right, mercurial does something like the following:
- revision A - file foo.txt containing the line "foo"
changeset A->B - starting with revision A, add a line after "foo" containing "bar"
changeset A->C - starting with revision A, delete the line "foo" and add the line "bar"
changeset B->C - starting with revision B, delete the line "foo"
changeset B->A - starting with revision B, delete the line "bar"
and calls it a history of 4 changesets. No revisions exist beyond the initial one, and calculating a working tree is applying those changesets to reach a particular revision. It might also have backwards changesets, so A<-B (B->A) instead of A->B.
- revision A - file foo.txt containing the line "foo"
changeset A->B - starting with revision A, add a line after "foo" containing "bar"
changeset A<-B - starting with revision B, delete the line "bar"
changeset A->C - starting with revision A, delete the line "foo" and add the line "bar"
changeset A<-C - starting with revision C, delete the line "bar" and add the line "foo"
changeset B->C - starting with revision B, delete the line "foo"
changeset B<-C - starting with revision C, add the line "foo"
changeset B->A - starting with revision B, delete the line "bar"
changeset B<-A - starting with revision A, add a line after "foo" containing "bar"
That way the one single revision you keep could be A, B, or C, since you have changesets that move from any one revision to any other. So you could keep revision C for instance, and check it out immediately instead of having to apply the changesets to reach it from A. Obviously there are equivalent changesets, but I don't know if mercurial optimizes those out.
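As a sketch of what a backwards changeset could look like, here is some Python that models a changeset as a list of (op, index, text) line edits. That representation, and the names apply_ops and invert, are assumptions for illustration only; I'm not claiming this is how mercurial stores anything.

```python
def apply_ops(lines, ops):
    """Apply a changeset, modelled as a list of (op, index, text) line edits."""
    out = list(lines)
    for op, index, text in ops:
        if op == "add":
            out.insert(index, text)
        else:  # "del"
            assert out[index] == text, "changeset does not apply to this revision"
            del out[index]
    return out

def invert(ops):
    """Build the backwards changeset: undo each edit, in reverse order."""
    flipped = {"add": "del", "del": "add"}
    return [(flipped[op], index, text) for op, index, text in reversed(ops)]

REV_A = ["foo"]
A_TO_B = [("add", 1, "bar")]
A_TO_C = [("del", 0, "foo"), ("add", 0, "bar")]
B_TO_C = [("del", 0, "foo")]

REV_B = apply_ops(REV_A, A_TO_B)   # lines "foo", "bar"
REV_C = apply_ops(REV_A, A_TO_C)   # line "bar"
```

With invert in hand, keeping only revision C is enough: applying invert(A_TO_C) to C gives back A, and invert(B_TO_C) gives back B.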
So, taking the above pseudo-mercurial repository, if you just chop off the former part and have
changeset B->C - starting with revision B, delete the line "foo"
changeset B->A - starting with revision B, delete the line "bar"
...then you can't compute any sort of working directory at all. You don't have B, which is needed to apply any B-> changesets. And you can't get B, since you have neither revision A with A->B, nor revision C with C->B.
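The same point as a small Python check: given whatever full revisions you hold and whatever changesets you hold, which revisions can you actually check out? The checkoutable function is a hypothetical helper, and it only follows changesets forwards.

```python
def checkoutable(stored_revisions, changesets):
    """Revisions that can be materialized from the stored revisions and changesets."""
    have = set(stored_revisions)
    changed = True
    while changed:
        changed = False
        for parent, child in changesets:
            if parent in have and child not in have:
                have.add(child)
                changed = True
    return have

chopped = [("B", "C"), ("B", "A")]

nothing = checkoutable(set(), chopped)        # only changesets, no revision
with_base = checkoutable({"B"}, chopped)      # changesets plus a copy of B
```

Holding only the B-> changesets yields nothing at all, while adding a single copy of B makes A, B, and C all available.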
Doing that is pretty dumb and I think that's why people say shallow clones, or lazy fetching, is impossible. If the ONLY revision you can have is A, then you need all changesets from A to anywhere in order to get... anywhere. Is that not the case? Is there another reason these "shallow clones" don't work well for certain things, like for instance making any contributions or changes at all that others can actually use? If so please educate me because it really puzzles me.
What I want to do is enable the possibility of having a different initial revision, simply ignoring the requirement that A be the initial revision. The changesets you calculate will still be the same, but you won't need all changesets in order to do anything useful. Furthermore you don't need the entire "future" of a project to commit and push changes to it. Even being totally ignorant of the future, your changeset would be identical to the one you would produce if you knew of future changesets, checked out an earlier revision, and made changes to it. Ultimately all you need is a revision to work on, and instructions for retrieving or calculating previous and next versions.
"Instructions to get the next value in a list" is the essence of lazy stream algorithms, where instead of having the entire list, you only need the front of that list, and the instructions at the end. As they beat into your head in lisp circles, '(a b c d) can be converted to and from '(a . (lambda () '(b c d))). So an infinite list '(a b c d ... forever ...) that cannot be stored in finite computer memory can be represented by the finite list '(a . (lambda () (calculate-next))). That finite structure can represent an infinite list, because calculate-next returns a pair '(b . calculate-next-next), and calculate-next-next returns '(c . calculate-etc), and so on indefinitely (and recursively, most likely).
Applying that to revisions and changesets: if you had revision I and nothing previous, and still wanted to go back, you could make a list of revisions '(I . find-rest), and find-rest would go to a remote server and request the changeset that results in whatever is before I. The server would have the changeset H->I, and knowing that I want I->H, it could calculate both H and I and from them produce I->H. After it gives me that, my list of revisions would be '(I I->H . find-rest). This list is ordered from present to past. Requesting revisions from past to present works the same way: I specify my revisions as '(C . find-rest), the server sends me C->D, giving me '(C C->D . find-rest), and so on until I have H->I.
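Here is roughly how that lazy list might look in Python, using a (value, thunk) pair in place of a lisp dotted pair. The server is faked with a string of revision names, and server_previous, history_back_from, and take are all hypothetical names; the point is only that a request happens when find-rest is actually forced.

```python
CHAIN = "ABCDEFGHI"      # stand-in for the server's linear history A->B->...->I
requests_made = []       # records each simulated network request

def server_previous(rev):
    """Pretend server call: return (parent, reverse changeset label) or None."""
    requests_made.append(rev)
    position = CHAIN.index(rev)
    if position == 0:
        return None
    parent = CHAIN[position - 1]
    return (parent, f"{rev}->{parent}")

def history_back_from(rev, fetch_previous):
    """The lazy list '(rev . find-rest) as a Python pair (value, thunk)."""
    def find_rest():
        step = fetch_previous(rev)          # only runs when forced
        if step is None:
            return None
        parent, changeset = step
        return (changeset, history_back_from(parent, fetch_previous)[1])
    return (rev, find_rest)

def take(stream, count):
    """Force at most `count` items, calling a thunk only when more are needed."""
    out = []
    while stream is not None and count > 0:
        head, rest = stream
        out.append(head)
        count -= 1
        stream = rest() if count > 0 else None
    return out

local = history_back_from("I", server_previous)
first_three = take(local, 3)
```

Taking three items yields I, I->H, and H->G, and only two requests reach the pretend server. Nothing before G is ever asked for unless someone wants it.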
I never need an absolute initial revision, because the first find-rest lets me request past revisions as needed, on demand, when I check them out locally. I never need an absolute final version, because if I want to accept someone else's changes I simply use the second find-rest to add on to my local list of changesets. Then I can merge my branch with theirs or even rebase as desired, using normal mercurial functions.
And since I never need an absolute initial revision, I can do work on the latest revision, and not disturb anyone's repository that way, since the changeset I produce based on the latest revision will be identical to the changeset I produce based on the "arbitrary initial revision with all previous changesets applied".
What I mean is this. When the remote peer you are fetching from sends you B->C and B->A, it will either have revision B, or have a way to compute it. Nobody needs to store any changesets for which the parent revision is missing, so if the remote peer can't get B, then it won't have any B-> changesets. When you fetch, the official way is sending you A, A->B, B->C and B->A. No problems there. When the remote just sends you B->C and B->A it doesn't work, and you can't calculate any revision at all. But what the remote should do is send you B->C, B->A and also B. Sure it's more expensive to also send you B but it is necessary, and it is less expensive than sending you the entire history.
The remote can calculate B by applying A->B to A. It knows to calculate B, because you requested to fetch everything from revision B on upward. If you request the latest revision, it can just send you A, and no changesets at all. You would, in your invocation of hg, be able to specify the base revision, and how many revisions after it to fetch, so you could even get the remote to send you revision B, and changeset B->C, but not the latest changeset B->A.
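A sketch of what the remote end might do, in Python. Everything here is hypothetical: shallow_fetch and materialize are not real mercurial functions, the (op, index, text) changeset format is an assumption, and the limit parameter stands in for "how many revisions after it to fetch".

```python
from collections import deque

SERVER_REVISIONS = {"A": ["foo"]}            # the only full revision stored
SERVER_CHANGESETS = {
    ("A", "B"): [("add", 1, "bar")],
    ("A", "C"): [("del", 0, "foo"), ("add", 0, "bar")],
    ("B", "C"): [("del", 0, "foo")],
    ("B", "A"): [("del", 1, "bar")],
}

def apply_ops(lines, ops):
    """Apply (op, index, text) line edits to a list of lines."""
    out = list(lines)
    for op, index, text in ops:
        if op == "add":
            out.insert(index, text)
        else:
            del out[index]
    return out

def materialize(target):
    """Compute a revision's contents by walking changesets from a stored revision."""
    queue = deque(SERVER_REVISIONS.items())
    seen = set(SERVER_REVISIONS)
    while queue:
        name, lines = queue.popleft()
        if name == target:
            return lines
        for (parent, child), ops in SERVER_CHANGESETS.items():
            if parent == name and child not in seen:
                seen.add(child)
                queue.append((child, apply_ops(lines, ops)))
    raise KeyError(target)

def shallow_fetch(base, limit=None):
    """Return the base revision's contents plus up to `limit` changesets after it."""
    contents = materialize(base)
    sent, seen, queue = [], {base}, deque([base])
    while queue:
        name = queue.popleft()
        for parent, child in SERVER_CHANGESETS:
            if parent == name and child not in seen:
                if limit is not None and len(sent) >= limit:
                    return contents, sent
                seen.add(child)
                sent.append((parent, child))
                queue.append(child)
    return contents, sent
```

Calling shallow_fetch("B") hands back the computed contents of B along with B->C and B->A, while shallow_fetch("B", limit=1) stops after B->C.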
In the vast majority of cases you would want all future revisions from the base revision you choose, but with for instance extremely ununified projects like the linux kernel (cough), you might not want everyone's idea of "the latest revision" to be your idea of "the latest revision."
The following repository is a bit easier for demonstrating why changesets do not all need to be replicated on all machines for full mercurial functionality:
- revision A
changeset A->B
changeset B->C
changeset C->D
changeset D->E
changeset E->F
changeset F->G
changeset G->H
changeset H->I
Let's say I only want to work on the latest revision. I ask the server for keyword tip, and all changesets after it (in case any are committed while I request), and it goes and pseudo-checks-out revision I, then sends that to me and nothing else. Now my repository looks like this:
- revision I
Pretty simple. Now I work on it and create a changeset:
- revision I
changeset I->J
Now I push back to the server. I send it I->J and now it looks like this:
- revision A
changeset A->B
changeset B->C
changeset C->D
changeset D->E
changeset E->F
changeset F->G
changeset G->H
changeset H->I
- revision I
changeset I->J
Let's rewind to before I pulled anything or made any changes. If instead of only requesting the current revision, I request all of them to the beginning, the server sends everything to me. That gives me this repository:
- revision A
changeset A->B
changeset B->C
changeset C->D
changeset D->E
changeset E->F
changeset F->G
changeset G->H
changeset H->I
I then check out revision I, by applying all those changesets, then work on I. Now my repository looks like this:
- revision A
changeset A->B
changeset B->C
changeset C->D
changeset D->E
changeset E->F
changeset F->G
changeset G->H
changeset H->I
- revision I
changeset I->J
Now I push to the server, by sending it I->J. Now the server's repository looks like this:
- revision A
changeset A->B
changeset B->C
changeset C->D
changeset D->E
changeset E->F
changeset F->G
changeset G->H
changeset H->I
changeset I->J
In both scenarios, my changeset I->J is exactly identical. It's the same changes, to the same revision after all! So there is no need for me to request all changesets back to the "initial" revision, since I can produce the exact same changeset with only an intermediary calculated revision I. There is no reason whatsoever that the server would reject my I->J changeset when I requested I instead of "everything," since from its perspective, it ends up in the same state no matter what my repository looks like. (With possibly more calculated revisions hanging around, but those can be deleted when optimizing for space as opposed to speed.)
If I only want to work on the latest revision, the remote server does not need to send me A, A->B, B->C, C->D, D->E, E->F, F->G, G->H and H->I. That could be a lot of changes, and take up bandwidth and disk space. Instead the remote end could calculate B by applying A->B to A, then apply B->C to that, and so forth, ending up with a revision representing the current most recent version: revision I. And then the remote server only has to send I, and no changesets.
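The two scenarios can be played out in a few lines of Python. This is only a sketch: the history is invented (each changeset appends one line), my_edit is a made-up local change, and difflib's unified diff stands in for whatever diff format a real system would use.

```python
import difflib

REV_A = ["line 0"]
# Hypothetical history A->B ... H->I: each of the 8 changesets appends one line.
HISTORY = [f"line {n}" for n in range(1, 9)]

def replay(base, history):
    """Apply every changeset in order, starting from the base revision."""
    lines = list(base)
    for added_line in history:
        lines.append(added_line)
    return lines

def make_changeset(parent, child):
    """Diff two revisions; the result plays the role of changeset I->J."""
    return list(difflib.unified_diff(parent, child, "I", "J", lineterm=""))

def my_edit(lines):
    """The local change that turns revision I into revision J."""
    return lines + ["my new line"]

# Scenario 1: shallow fetch. The server replays history and sends only I.
shallow_i = replay(REV_A, HISTORY)
shallow_changeset = make_changeset(shallow_i, my_edit(shallow_i))

# Scenario 2: full clone. I receive A plus all changesets and replay them myself.
full_i = replay(REV_A, HISTORY)
full_changeset = make_changeset(full_i, my_edit(full_i))
```

Both routes arrive at the same revision I, so the diff to J comes out the same either way, and the server has no way to tell which kind of clone produced it.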