Note:
This page appears to contain material that is no longer relevant. Please help improve this page by updating its content.
This page does not meet our wiki style guidelines. Please help improve this page by cleaning up its formatting.
Status: Draft/Experiment
Proposed new discovery/changegroup protocol
The current wire protocol has several flaws:
- It uses the roots of the new branches, which is susceptible to a race (see issue1320).
- If the client is missing a lot of nodes but has no local changes, it still needs many round trips to discover the base nodes.
See WireProtocol for the current protocol.
Overview
The Protocol
Discovery
Instead of exploring the missing remote nodes as the current protocol does, this proposal explores the local nodes. This exploits more local knowledge and allows a stateful approach (which can't be done on the server).
It only needs one new server call:
discover(nodes)
- list of local nodes
Returns a boolean array: a value is True if the server has the corresponding node, False otherwise. (TODO: investigate which is cheaper, a list or a boolean array. To avoid races, should we pass the head as an argument?)
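As a minimal sketch of what the server side of this call might look like, assuming the server's nodes are available as an in-memory collection (the `server_nodes` parameter is hypothetical and not part of the proposal):

```python
def discover(server_nodes, nodes):
    """Answer a discover(nodes) query.

    server_nodes: iterable of node ids the server has (hypothetical
                  stand-in for the server's changelog index).
    nodes:        list of node ids sent by the client.

    Returns one boolean per queried node, in the same order.
    """
    known = set(server_nodes)
    return [node in known for node in nodes]
```

Returning the answers in query order lets the client pair each boolean with the node it asked about without the node ids being sent back.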
The client keeps each of its nodes in one of three states: unknown, missing or common. At each iteration, it builds a sample of the nodes in the unknown state, sends it to discover and updates the sets accordingly. The sample size should be bounded.
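The update step could be sketched as follows. The proposal does not spell out the inference rules, so this assumes the usual property of a changeset DAG: if the server has a node it also has all of its ancestors, and if it lacks a node it also lacks all of its descendants. The graph representation (a dict mapping each node to its parents) and all function names are hypothetical.

```python
def reachable(edges, start):
    """Return the start nodes plus everything reachable through edges."""
    seen = set()
    stack = list(start)
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(edges.get(node, ()))
    return seen


def children_map(parents):
    """Invert a node -> parents mapping into node -> children."""
    children = {node: [] for node in parents}
    for node, node_parents in parents.items():
        for parent in node_parents:
            children.setdefault(parent, []).append(node)
    return children


def update_states(parents, unknown, common, missing, sample, answers):
    """Fold one discover() reply into the three sets (modified in place)."""
    children = children_map(parents)
    for node, server_has_it in zip(sample, answers):
        if server_has_it:
            # Assumption: the server also has every ancestor.
            common |= reachable(parents, [node])
        else:
            # Assumption: the server also lacks every descendant.
            missing |= reachable(children, [node])
    unknown -= common
    unknown -= missing
```

Discovery is finished once `unknown` is empty.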
Proposed sample construction:
- First breadth-first search: starting from the nodes at headsof(unknown), keep the nodes at a distance of 0, 1, 2, 4, 8, 16, ... from the heads.
- Second breadth-first search: starting from the nodes at rootsof(unknown), keep the nodes at a distance of 0, 1, 2, 4, 8, 16, ... from the roots.
- If the sample is bigger than MAX_SAMPLE_SIZE, take a random sample from it, but make sure it includes the heads.
- If the number of unknown nodes is smaller than MAX_SAMPLE_SIZE, send all the unknown nodes to discover.
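The sample construction described above might be sketched like this. It is an illustration rather than the prototype's code: the graph representation (a dict mapping each node to its parents), the value of `MAX_SAMPLE_SIZE` and the function names are all assumptions.

```python
import random
from collections import deque

MAX_SAMPLE_SIZE = 100  # hypothetical bound; the proposal gives no value


def bfs_sample(edges, starts, nodes):
    """BFS inside `nodes`; keep nodes at distance 0, 1, 2, 4, 8, ..."""
    dist = {node: 0 for node in starts}
    queue = deque(starts)
    kept = set()
    while queue:
        node = queue.popleft()
        d = dist[node]
        if (d & (d - 1)) == 0:  # true for 0 and for powers of two
            kept.add(node)
        for nxt in edges.get(node, ()):
            if nxt in nodes and nxt not in dist:
                dist[nxt] = d + 1
                queue.append(nxt)
    return kept


def build_sample(parents, unknown, max_size=MAX_SAMPLE_SIZE, rng=random):
    """Pick the nodes to send to discover() in the next round."""
    if len(unknown) <= max_size:
        return set(unknown)  # small enough: ask about everything
    # Child links restricted to the unknown set.
    children = {}
    for node in unknown:
        for parent in parents.get(node, ()):
            children.setdefault(parent, []).append(node)
    heads = {n for n in unknown if n not in children}
    roots = {n for n in unknown
             if not any(p in unknown for p in parents.get(n, ()))}
    sample = (bfs_sample(parents, heads, unknown)
              | bfs_sample(children, roots, unknown))
    if len(sample) > max_size:
        # Keep every head, fill the remaining room at random.
        rest = sorted(sample - heads)
        room = max(max_size - len(heads), 0)
        sample = heads | set(rng.sample(rest, min(room, len(rest))))
    return sample
```

In this sketch the heads are always kept, so the result can exceed `max_size` when the unknown set has more heads than the bound allows. The proposal does not say how that case should be handled.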
During testing on largish repositories (NetBeans, Linux kernel), the number of iterations was almost always lower than 5.
Proposed tweak: the server can compute its own unknown set and return a sample of its own. Testing showed that this makes little difference to the number of iterations (it usually removes at most one round trip).
Changegroup
The current changegroup() uses base nodes; it should use common nodes instead.
changegroup(roots, heads)
- roots = list of nodes that the server can assume the client knows (along with all their ancestors).
Finds all changesets that are ancestors of heads but are neither in roots nor ancestors of roots, and returns them as a single changegroup.
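The node selection behind this call can be sketched as a set difference, assuming the graph is a dict mapping each node to its parents and that heads themselves count as part of the result. The names `reachable` and `outgoing` are hypothetical.

```python
def reachable(parents, start):
    """Return the start nodes plus all of their ancestors."""
    seen = set()
    stack = list(start)
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(parents.get(node, ()))
    return seen


def outgoing(parents, roots, heads):
    """Changesets to bundle: ancestors of heads the client lacks.

    roots: nodes the client is known to have (with their ancestors).
    heads: nodes the client wants (included in the result).
    """
    return reachable(parents, heads) - reachable(parents, roots)
```

A real implementation would also have to emit the selected changesets in an order where parents come before children, which this sketch leaves out.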
A changegroup is a single stream containing:
- a changelog group
- a manifest group
- a list of entries (terminated by a zero-length filename), each made of:
  - filename length
  - filename
  - file group
A group is a list of chunks:
- chunk length
- self hash, p1 hash, p2 hash, link hash
- uncompressed delta to p1 (or optionally to the previous node)
- (terminated by a zero length chunk)
- where should lightweight copies be put?
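To make the group framing above concrete, here is a rough sketch of writing and reading one group. The field widths are assumptions that the proposal does not state: a 4-byte big-endian length that counts only the bytes following it, and 20-byte hashes. The function names are hypothetical.

```python
import io
import struct

HASH_LEN = 20                 # assumption: 20-byte node hashes
LENGTH = struct.Struct(">I")  # assumption: 4-byte big-endian length


def write_group(out, chunks):
    """Write (node, p1, p2, link, delta) tuples, then a zero-length chunk."""
    for node, p1, p2, link, delta in chunks:
        payload = node + p1 + p2 + link + delta
        out.write(LENGTH.pack(len(payload)))
        out.write(payload)
    out.write(LENGTH.pack(0))  # terminator


def read_group(inp):
    """Read chunks until the zero-length terminator."""
    chunks = []
    while True:
        (length,) = LENGTH.unpack(inp.read(LENGTH.size))
        if length == 0:
            return chunks
        payload = inp.read(length)
        hashes = [payload[i * HASH_LEN:(i + 1) * HASH_LEN] for i in range(4)]
        chunks.append((*hashes, payload[4 * HASH_LEN:]))
```

A round trip through an in-memory buffer shows the framing:

```python
buf = io.BytesIO()
write_group(buf, [(b"n" * 20, b"p" * 20, b"\0" * 20, b"l" * 20, b"delta")])
buf.seek(0)
chunks = read_group(buf)
```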
Prototypes
Prototypes can be found here:
Wishlist
- Estimate early how much data or how many items have to be transferred and communicate this to the other side, so that a progress indicator can be more useful.
-- ThomasArendsenHein 2008-10-24 14:23:01