Proposed new wire protocol
The wire protocol has several flaws:
- it uses the roots of the new branches, this is susceptible to a race (see issue1320) .
- if the client is missing a lot of nodes but doesn't have any local changes, it will still have to do a lot of roundtrips to discover the base nodes.
See WireProtocol for the current protocol.
Overview
The basic algorithm is still the same, we first discover a set of nodes. Then we get all the changes from those nodes.
The Protocol
The discovery protocol is based on the [http://en.wikipedia.org/wiki/Dominator_(node) dominance property]
heads()
return a list of heads (everything new must be an ancestor of one of these heads, so start here)
Basically dominators() is like between() except it is more coarse-grained
dominators(list of couple)
- list = a list of couple (tip, dom) where dom dominates tip
- tip = the last revision in a branch that we're interested in (often a head).
- base = the first revision going backwards from a node that has two parents (the product of a merge).
- p1 = the first parent of the base.
- p2 = the second parent of the base.
let s be the sequence of dominators from tip (we only aggregate linear history, so it's a list of (t, b, p1, p2)) return elements 1, 2, 4, ... (maybe bounded ?)
between(list)
branch = Basically a linear set of changes. More formally, a list of changes such that no change in the set (except possibly the last) has two parents, and all changes in the list are related to eachother by parent->child relationships.
- list = a list of (tip, base) tuples for branches we'd like to know the history of. Presumably the client knows about the base of the branch, but not the tip, and wants to find out where in the branch its knowledge ends.
- tip = the latest revision in a branch the client is interested in (may not be the actual tip of the branch on the server side)
- base = the earliest revision in a branch the client is interested in. Always a revision the client knows about.
for tip, base in list: walk back a linear branch, return elements 1, 2, 4, 8.. (and this lets us do bisection search if we diverge in the middle of one of these long branches)
Now it would be better if the processus was symetrical, at the same time as we descend from remote tips, we could do the same locally. For that we would need the following call: has(list)
- list = a list of nodes
return a list of boolean indicating if the node was found on the server.
To minimize roundtrips, it would be nice to combine the has() call with dominators().
The current changegroup() uses base nodes, it should instead use common nodes. changegroup(roots)
- roots = a list of the latest nodes on every service side changeset branch that both the client and server know about.
find all changesets descended from roots and return them as a single changegroup
A changegroup is a single stream containing:
- a changelog group
- a manifest group
- a list of
- filename length
- filename
- file group (terminated by a zero length filename)
A group is a list of chunks:
- chunk length
- self hash, p1 hash, p2 hash, link hash
- uncompressed delta to p1
- (terminated by a zero length chunk)
Wishlist
- Estimate early how much data or items have to be transfered and communicate this to the other side, so a progress indicator could be more useful.