## page was renamed from WireProtocolNG
Status: Draft/Experiment

== Proposed new discovery/changegroup protocol ==
The wire protocol has several flaws:

 * it uses the roots of the new branches, which is susceptible to a race (see issue1320).
 * if the client is missing a lot of nodes but doesn't have any local changes, it still has to do a lot of round trips to discover the base nodes.

See WireProtocol for the current protocol.

=== Overview ===
=== The Protocol ===
==== Discovery ====
Instead of exploring the missing remote nodes like the current protocol, this proposal explores the local nodes, exploiting more local knowledge and using a stateful approach (which can't be done on the server). It only needs one new server call:

{{{discover(nodes)}}}
 * nodes = list of local nodes
{{{
return a boolean array, value is True if the server has the node, False otherwise.
(TODO: investigate what is cheaper, between list and boolean array / to avoid races, should we pass the head as argument?)
}}}

The client keeps each of its nodes in one of three states: unknown, missing, or common. At each iteration, it builds a sample of the nodes in the unknown state, sends it to discover, and updates the sets accordingly. The sample size should be bounded.

Proposed sample construction:

 * first breadth-first search, starting from the nodes at {{{headsof(unknown)}}}: keep the nodes at a distance 0, 1, 2, 4, 8, 16, ... from the heads.
 * second breadth-first search, starting at {{{rootsof(unknown)}}}: keep the nodes at a distance 0, 1, 2, 4, 8, 16, ... from the roots.
 * if the sample is bigger than MAX_SAMPLE_SIZE, take a random sample from it, but make sure it includes the heads.
 * if the number of unknown nodes is smaller than MAX_SAMPLE_SIZE, send all the unknown nodes to discover.

During testing on largish repositories (NetBeans, Linux Kernel), the number of iterations was almost always lower than 5.

Proposed tweak: the server can compute its own unknown set and return a sample of its own.
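
The client-side sample construction and discovery loop described above can be sketched roughly as follows. This is only an illustration, not the prototype's code: the DAG representation (a dict mapping each node to its list of parents), the helper names {{{build_sample}}} and {{{find_common}}}, and the tiny MAX_SAMPLE_SIZE value are all assumptions made for the example, and {{{discover}}} is passed in as a plain callable standing in for the server call.

```python
import random
from collections import deque

MAX_SAMPLE_SIZE = 4  # assumed value, deliberately tiny for the example


def _bfs_sample(starts, neighbours, unknown):
    """Keep nodes at distance 0, 1, 2, 4, 8, ... from `starts`, inside `unknown`."""
    dist = {n: 0 for n in starts}
    queue = deque(starts)
    kept = set()
    while queue:
        node = queue.popleft()
        d = dist[node]
        if (d & (d - 1)) == 0:  # true for 0 and for every power of two
            kept.add(node)
        for nxt in neighbours.get(node, ()):
            if nxt in unknown and nxt not in dist:
                dist[nxt] = d + 1
                queue.append(nxt)
    return kept


def build_sample(unknown, parents, children, rng=random):
    """Build one sample of unknown nodes (hypothetical helper)."""
    if len(unknown) <= MAX_SAMPLE_SIZE:
        return sorted(unknown)  # few enough: send everything
    heads = {n for n in unknown
             if not any(c in unknown for c in children.get(n, ()))}
    roots = {n for n in unknown
             if not any(p in unknown for p in parents.get(n, ()))}
    sample = (_bfs_sample(heads, parents, unknown)
              | _bfs_sample(roots, children, unknown))
    if len(sample) > MAX_SAMPLE_SIZE:
        # Random subsample that always keeps the heads; if there are more
        # heads than MAX_SAMPLE_SIZE, this sketch simply sends all heads.
        rest = sorted(sample - heads)
        room = max(MAX_SAMPLE_SIZE - len(heads), 0)
        sample = heads | set(rng.sample(rest, min(room, len(rest))))
    return sorted(sample)


def _closure(start, neighbours):
    """Return `start` plus everything reachable from it via `neighbours`."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(neighbours.get(node, ()))
    return seen


def find_common(parents, discover, rng=random):
    """Classify every local node as common or missing on the server."""
    children = {}
    for node, ps in parents.items():
        for p in ps:
            children.setdefault(p, []).append(node)
    unknown, common, missing = set(parents), set(), set()
    rounds = 0
    while unknown:
        sample = build_sample(unknown, parents, children, rng)
        rounds += 1
        for node, known in zip(sample, discover(sample)):
            if known:
                found = _closure(node, parents)   # its ancestors are common too
                common |= found
            else:
                found = _closure(node, children)  # its descendants are missing too
                missing |= found
            unknown -= found
    return common, missing, rounds
```

For instance, with a linear history given as {{{parents = {i: ([i - 1] if i else []) for i in range(20)}}}} and a stand-in server that knows the first twelve nodes, {{{find_common(parents, lambda ns: [n < 12 for n in ns])}}} returns the common set, the missing set, and the number of round trips used. The sketch relies on the server's set being closed under ancestors, which is what lets one answer classify a whole range of nodes.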
Testing showed that the server-side sample made little difference to the number of iterations (it usually saves at most one round trip).

==== Changegroup ====
The current changegroup() uses base nodes; it should instead use common nodes.

{{{changegroup(roots, heads)}}}
 * roots = list of nodes that the server can assume the client knows (as well as all their ancestors).
{{{
find all changesets that are ancestors of heads and not descended from roots, and return them as a single changegroup
}}}

A changegroup is a single stream containing:

 * a changelog group
 * a manifest group
 * a list, terminated by a zero-length filename, of:
  * filename length
  * filename
  * file group

A group is a list of chunks, terminated by a zero-length chunk. Each chunk contains:

 * chunk length
 * self hash, p1 hash, p2 hash, link hash
 * uncompressed delta to p1 (or optionally to the previous node)

Open question: where should lightweight copies be put?

=== Prototypes ===
Prototypes can be found here:

 * http://bitbucket.org/parren/dag-discovery/
 * http://bitbucket.org/bboissin/dag-discovery/ (fork)

=== Wishlist ===
 * Estimate early how much data or how many items have to be transferred and communicate this to the other side, so that a progress indicator can be more useful. -- ThomasArendsenHein

----
CategoryInternals