Diff for "cy/LazyFetch"

Differences between revisions 1 and 2

Lazy Fetching is the idea that you can pull only the revision history that is relevant to your own local changes. This cuts down on bandwidth and disk usage, and makes things easier for new developers: they can simply get a copy of the code, add a patch, and push that patch to others or have them pull it. Normally you have to request all revisions (or changesets) all the way back to the very first one ("very first" being a relatively arbitrary distinction) before you can commit new ones. It's important that you not be required to do so, especially for very large repositories with long histories.

My concept of a repository is a linked list of revisions, each one capturing the contents of the working directory at a particular moment in time. Changesets come in because, for small changes, storing only the information needed to transition between revisions takes up a lot less space than storing both revisions. So for this article at least, a revision is a full copy of all checked-in files, and a changeset is a diff between two revisions.
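As a rough illustration of the distinction, here is a minimal Python sketch. The representation is an assumption made for this example, not something the article specifies: a revision is a dict from file name to a list of lines, and a changeset records only the line ranges that differ. The name `make_changeset` is hypothetical.

```python
from difflib import SequenceMatcher

# Assumed model: a revision maps each file name to its full list of lines.
revision_a = {"foo.txt": ["foo"]}
revision_b = {"foo.txt": ["foo", "bar"]}

def make_changeset(old, new):
    """Record only the line ranges that differ between two revisions.

    Each edit is (start, end, replacement): replace old lines[start:end]
    with the replacement lines. Simplification: a missing file is treated
    as an empty one.
    """
    changeset = {}
    for name in sorted(set(old) | set(new)):
        old_lines, new_lines = old.get(name, []), new.get(name, [])
        matcher = SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
        edits = [(i1, i2, new_lines[j1:j2])
                 for tag, i1, i2, j1, j2 in matcher.get_opcodes()
                 if tag != "equal"]
        if edits:
            changeset[name] = edits
    return changeset

a_to_b = make_changeset(revision_a, revision_b)
print(a_to_b)  # {'foo.txt': [(1, 1, ['bar'])]}
```

The changeset holds only the inserted line, while keeping both revisions would mean storing "foo" twice; that gap grows with the size of the files.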

Example repository:

 * revision A - file foo.txt containing the line "foo"
 * changeset A->B - starting with revision A, add a line after "foo" containing "bar"
 * revision B - file foo.txt containing the lines "foo" and "bar"
 * changeset A->C - starting with revision A, delete the line "foo" and add the line "bar"
 * revision C - file foo.txt containing the line "bar"
 * changeset B->D - starting with revision B, delete the line "foo"
 * revision D - file foo.txt containing the line "bar"
 * changeset B->E - starting with revision B, delete the line "bar"
 * revision E - file foo.txt containing the line "foo"

If you have revision A, and all the changesets, then by applying them one at a time you can get to any of the revisions.

If you have revision A, revision B, and B->C, then by applying B->C to B you can get revision C.

If you have revision A, and A->C, but not anything about revision B, you can still get revision C by applying A->C to revision A.
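The cases above can be sketched in Python. This is only an illustration under assumed representations: a revision is a dict from file name to list of lines, and a changeset is a dict from file name to `(start, end, replacement)` edits. The helper `apply_changeset` is a hypothetical name.

```python
def apply_changeset(revision, changeset):
    """Return a new revision; each edit replaces lines[start:end] of a file."""
    result = {name: list(lines) for name, lines in revision.items()}
    for name, edits in changeset.items():
        old_lines = revision.get(name, [])
        new_lines, pos = [], 0
        for start, end, replacement in edits:
            new_lines.extend(old_lines[pos:start])  # untouched lines before the edit
            new_lines.extend(replacement)           # lines the edit introduces
            pos = end                               # skip the lines the edit removes
        new_lines.extend(old_lines[pos:])           # untouched tail
        result[name] = new_lines
    return result

revision_a = {"foo.txt": ["foo"]}
a_to_b = {"foo.txt": [(1, 1, ["bar"])]}  # add "bar" after "foo"
a_to_c = {"foo.txt": [(0, 1, ["bar"])]}  # delete "foo" and add "bar"
b_to_d = {"foo.txt": [(0, 1, [])]}       # delete "foo"

revision_b = apply_changeset(revision_a, a_to_b)
revision_c = apply_changeset(revision_a, a_to_c)  # needs nothing about B
revision_d = apply_changeset(revision_b, b_to_d)

print(revision_b)  # {'foo.txt': ['foo', 'bar']}
print(revision_c)  # {'foo.txt': ['bar']}
```

Note that `revision_c` is built from A alone, which is the point of the third case: a changeset only needs its own starting revision, not the rest of the history.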

If you have revisions A and C, you can check out either one without applying any changesets. B can't be checked out until you have either A->B or C->B.
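One way to think about what can be checked out is as reachability: start from the revisions you hold and follow the changesets you hold. The sketch below assumes revisions are identified by name and changesets by `(source, target)` pairs; `checkable` is a hypothetical helper, not part of any existing tool.

```python
from collections import deque

def checkable(revisions_held, changesets_held):
    """Return every revision reachable from a held revision via held changesets."""
    reachable = set(revisions_held)
    queue = deque(revisions_held)
    while queue:
        current = queue.popleft()
        for source, target in changesets_held:
            if source == current and target not in reachable:
                reachable.add(target)
                queue.append(target)
    return reachable

# Holding only revisions A and C: B is out of reach.
print(sorted(checkable({"A", "C"}, set())))         # ['A', 'C']
# Once C->B arrives, B can be checked out.
print(sorted(checkable({"A", "C"}, {("C", "B")})))  # ['A', 'B', 'C']
```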

It is worth noting that revision D is exactly the same as revision C, so the two would end up with the same hash. The same goes for revisions A and E. In effect, when you calculate the changeset B->current, the system would recognize that the working directory hashes to C and give you B->C; only if that hash did not already exist would it give you B->D and a new revision D. Note also that although revision D is identical to revision C, the changesets that reach it, A->C and B->C, are not equivalent. Each is a different diff with a different hash. Changesets B->E and B->A would be the same, though in that case E would never exist, since you'd just use the existing revision A.
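A minimal sketch of this behaviour, assuming revisions and changesets are hashed by content alone. The choice of SHA-256 over a sorted JSON dump, and the names `content_hash` and `commit`, are assumptions made for the example; the article does not name a hash function or storage format.

```python
import hashlib
import json

def content_hash(obj):
    """Hash by content only, so identical contents always get the same id."""
    canonical = json.dumps(obj, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()

def commit(store, working_dir):
    """Return (hash, is_new); reuse the stored revision if its hash exists."""
    digest = content_hash(working_dir)
    is_new = digest not in store
    if is_new:
        store[digest] = working_dir
    return digest, is_new

store = {}
hash_a, _ = commit(store, {"foo.txt": ["foo"]})
hash_c, _ = commit(store, {"foo.txt": ["bar"]})         # reached via A->C
hash_d, d_is_new = commit(store, {"foo.txt": ["bar"]})  # reached from B by deleting "foo"

print(hash_d == hash_c, d_is_new)  # True False: D is just C, nothing new is stored

# The changesets that reach C are different diffs, so they hash differently.
a_to_c = {"foo.txt": [(0, 1, ["bar"])]}  # replace "foo" with "bar"
b_to_c = {"foo.txt": [(0, 1, [])]}       # delete "foo"
print(content_hash(a_to_c) == content_hash(b_to_c))  # False
```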

So this is a more accurate summary of the example repository:

 * revision A - file foo.txt containing the line "foo"
 * changeset A->B - starting with revision A, add a line after "foo" containing "bar"
 * revision B - file foo.txt containing the lines "foo" and "bar"
 * changeset A->C - starting with revision A, delete the line "foo" and add the line "bar"
 * revision C - file foo.txt containing the line "bar"
 * changeset B->C - starting with revision B, delete the line "foo"
 * changeset B->A - starting with revision B, delete the line "bar"

- Revision 1 as of 2014-08-20 08:20:09
  Size: 3338
  Editor: cy
+ Revision 2 as of 2014-08-20 08:22:35
  Size: 3519
  Editor: cy
 Line 6:
-revision A - file foo.txt containing the line "foo"
-changeset A->B - starting with revision A, add a line after "foo" containing "bar"
-revision B - file foo.txt containing the lines "foo" and "bar"
-changeset A->C - starting with revision A, delete the line "foo" and add the line "bar"
-revision C - file foo.txt containing the line "bar"
-changeset B->D - starting with revision B, delete the line "foo"
-revision D - file foo.txt containing the line "bar"
-changeset B->E - starting with revision B, delete the line "bar"
-revision E - file foo.txt containing the line "foo"
+ * revision A - file foo.txt containing the line "foo"
+ * changeset A->B - starting with revision A, add a line after "foo" containing "bar"
+ * revision B - file foo.txt containing the lines "foo" and "bar"
+ * changeset A->C - starting with revision A, delete the line "foo" and add the line "bar"
+ * revision C - file foo.txt containing the line "bar"
+ * changeset B->D - starting with revision B, delete the line "foo"
+ * revision D - file foo.txt containing the line "bar"
+ * changeset B->E - starting with revision B, delete the line "bar"
+ * revision E - file foo.txt containing the line "foo"
 Line 17:
-Line 18:
+Line 19:
-Line 19:
+Line 21:
-Line 21:
+Line 24:
-It is worthy to note that revision D is exactly the same as revision C, so the two would end up having the same hash. Same with versions A and E. Effectively, once you attempted to calculate changeset B->current, it would recognize that the working directory hashes to C, and give you B->C, whereas if the hash does not already exist, it would give you B->D, and a new revision D. It is also worth noting that the revision D is identical to the revision C, but the changesets to reach it, A->C and B->C are ''not'' equivalent. Each would be a different diff, with a different hash.
+It is worthy to note that revision D is exactly the same as revision C, so the two would end up having the same hash. Same with versions A and E. Effectively, once you attempted to calculate changeset B->current, it would recognize that the working directory hashes to C, and give you B->C, whereas if the hash does not already exist, it would give you B->D, and a new revision D. It is also worth noting that the revision D is identical to the revision C, but the changesets to reach it, A->C and B->C are ''not'' equivalent. Each would be a different diff, with a different hash. Changesets B->E and B->A would be the same, though in that case E would never exist, since you'd just use existing revision A.
-Line 24:
+Line 27:
-revision A - file foo.txt containing the line "foo"
-changeset A->B - starting with revision A, add a line after "foo" containing "bar"
-revision B - file foo.txt containing the lines "foo" and "bar"
-changeset A->C - starting with revision A, delete the line "foo" and add the line "bar"
-revision C - file foo.txt containing the line "bar"
-changeset B->C - starting with revision B, delete the line "foo"
-changeset B->A - starting with revision B, delete the line "bar"
+ * revision A - file foo.txt containing the line "foo"
+ * changeset A->B - starting with revision A, add a line after "foo" containing "bar"
+ * revision B - file foo.txt containing the lines "foo" and "bar"
+ * changeset A->C - starting with revision A, delete the line "foo" and add the line "bar"
+ * revision C - file foo.txt containing the line "bar"
+ * changeset B->C - starting with revision B, delete the line "foo"
+ * changeset B->A - starting with revision B, delete the line "bar"