Changesets Evolution - Core Concept
List of important concept or sub-problem of changeset Evolution. (not a full list, new item likely to be gathered here in the future)
Contents
1. Changeset Evolution Goal
The founding goal of changeset evolution is to allow users to exchange and even collaborate on part of the history that are still mutable (draft changesets).
This an ambitious goal tackling a very complex issue. Testings show changeset evolution is able to solve it.
So far, no other DVCS has fully solved this issues. Completing changeset evolution to a point is can be enabled to all users will give Mercurial a significant edge over other solutions.
As a side effect of solving many complex problem inherent to the distributed case, changeset evolution also provide good tools for local workflow. These are nice benefit that are worth exploring. However local use-cases cannot divert us from the core-goal here, the distributed case.
2. Design principle
There is a small list of guideline used for evolution:
- Human are error prone, we should aims are doing the right things automatically as much as possible,
- Duplicated changesets can leads to confusion and lose of important code changes. Having only one version of each changeset alive and visible is a goal,
- preventing user to do a bad action is an order of magnitude simpler than fixing the consequence of that bad action,
- evolution solve complex problem, yet, we should avoid exposing the users to more complexity than he needs for its usage of the tools.
- tool behavior cannot surprise or abandon user in case simple workflow can reach, they would stop trusting the tool.
3. Obsolescence markers semantic
Obsolescence markers are the individual units that record the history of the evolution of changesets. This history is global and recorded in a distributed way.
3.1. obs-marker: General Principle
Creating an obs-markers between OLD and NEW, is recording that OLD has a newer and better version NEW.
In a similar way, prune markers (OLD with no NEW) globally records that the changes in OLD are unwanted and should not be pursued.
3.2. Commands creating markers
Since they are a strong semantic associated to markers. They cannot be created lightly. We have a set of commands that records the right markers for the right operation.
- hg amend: changeset is rewritten into another,
- hg fold: multiple changesets are merged into a single one,
- hg split: changeset is split into multiple ones,
- hg prune: changeset is abandoned
- hg rebase: move a changeset on another parents,
- hg rewind (to be introduced, currently 'touch'): go back to an older version
4. Global State
All effects from evolution are computed from a global state built and retrieved in a distributed manners.
This is critical property as it ensures the user that the local effect are the effect applied remotely.
4.1. global-state: General Principle
The obsolescence markers are exchanged between repositories alongside the markers they are relevant to, each repositories slowly gathering more information about that global state. Key elements of this global state are:
- phases,
- changesets,
- obsolescence markers,
from this data we can apply this simple set of rules:
- public changesets are immutable,
- mutable changeset affected by an obsolescence-marker are obsolete,
- obsolete changesets without visible descendants are hidden,
- unstable state: orphan and divergence (former troubles) are computed.
4.2. Order Independent
It is critical that the state build from sets of different information (changesets, phases, obsmarkers) is universal and stable:
- Cloning one repository should provides you with the same view as the source repository.
- different user should be able to pull from each others in any order and discuss over a consistent state.
This allows people to simply and quickly reach a common state they can discuss and collaborate from.
4.2.1. concrete examples
Here is a practical basic example:
you have three users: Alice, Bob and Celestine,
First, Alice creates a draft branch with three changesets A, B, C
Alice: C ○ | B ○ | A ○ | ●
Then, Bob get the draft branch and rework it as D, E, F, creating markers in the process
Bob: C ø⇠○ F | | B ø⇠○ E | | A ø⇠○ D |/ ●
The order in which Celestine pulls from Alice and Bob does not matters, it both results in the same final global states.
Pull From Alice C ○ | B ○ | A ○ | ● Then Bob C ø⇠○ F | | B ø⇠○ E | | A ø⇠○ D |/ ●
or
Pull From Bob ◌⇠○ F | ◌⇠○ E | ◌⇠○ D / ● Then Alice C ø⇠○ F | | B ø⇠○ E | | A ø⇠○ D |/ ●
5. Monotonic rewrite
Current mercurial implementation operate in a way where rewriting operation will always create a new node.
This result in the creation of a clear and non-ambiguous obsolescence history, simplifying many issues related to changesets evolution.
A user facing version of this is:
Mercurial precisely tracks all rewrite operations, to achieve this all operations creates a new node.
5.1. implementation details
(note: this behavior predates changeset-evolution, it just happens to help it)
This is usually achieved by "salting" the new node with the hash of the old one.
Taking a simple example:
- commit a changeset
- amend a new file A in the changeset
- amend a new file B in the changeset
The hash of the node in (1) is stored somewhere in (2) extra affecting its hash. Then again the hash of (2) is affecting the hash of (3).
If instead, step (2) and (3) are combined, we skipped the creation of node from (2) and even if the content is the same, the hash is different.
This also apply if you go back to a previous content.
- commit a changeset,
- amend addition of file A,
- amend removal of file A,
Even if content of (1) and (3) are the same, they hash will differ.
5.1.1. Bring Back Obsolete Changeset
As an extension of this principle, bringing an obsolete changeset back to life is done by creating a new node. It keeps the same content with just a slight touch to user-hidden metadata in order to change the hash.
5.2. Property
- series of rewrite will not loop back to the same node
- different sequence of operation will produces different hashes regardless of
- content equality,
5.3. monotonic obsolescence
The above properties translates to:
- series of rewrite will not create linear stream of obs-markers (no cycle),
- stream of obs-markers created by different series of rewrite will not cross each other,
The two properties of the obsolescence-graph will help a lot in using that graph in a meaningful way.
- once obsoleted, a node will stay so, they are no need for un-obsolete capability of a specify node.
This "happen" only property is very useful in distributed system and cache building.
5.4. Problem this Easily Solve
(This is not a complete list)
5.4.1. Finding Branch Replacement During Push
While pushing, Mercurial checks the push is not adding new heads to the destination repository (checkheads).
However, we could be pushing a new branches that supersede (obsolete) an existing one on the server. We use obsolescence data for this.
Here is a simple example:
A is known on the remote, we push A'
A ◕⇠◔ A' |/ ●
We detect that pushing A' will include an obs-markers affecting A. We know this obs-markers will make A hidden so the push can proceed.
Being strictly monotonic helps a lot here, because we know having an obsolescence markers affecting the node will makes this node obsolete.
Here is another example were a changeset A is "brought back" as A" while another repository created an amended version of it:
A A' ø⇠ø⇠◔ A"' |/⇡ | | ◕ / A" |/ / | / |/ ●
In this case having A" as a different node lift the confusion than reusing A hash would creates.
5.4.2. easy caching
The state of nodes in regard with obs-markers can change in one direction only: not-affected → affected.
This makes caching of this property much simpler. as we can just flips one simple bits *in place* when new markers or changesets arrives.
"In place" update makes the transaction handling much simpler. Some of the benefit also extend to cache involved in obs-markers discovery.
5.4.3. Rare and Clear Handling of Cycle
While it is easy to prevent obs-cycle locally. The distributed nature of evolution still makes them possible:
If two users Alice and Bob creates similar changesets A and B and end up exchanging them. They can each decides to mark their version obsoleted by the other ( A ← B; B ← A). This will happen in a distributed way, in their own repository without communication between the two at the time it happens. The cycle will only appears when both data end up in the same place through exchange.
The case described above is an unambiguous case of conflicting-"rewrite", what we call "divergence". We need to detect it, and ask the user for a resolution.
Not creating local cycles means that detecting and processing of obs-cycle will only happens for this "divergence" case. A great simplification since we do no have to determine the cycle is intentional or not.
5.4.4. Latest Successors and unstable (former troubles) Detection
In the vast majority of case a series of rewrite will creates a have simple stream of obs-markers without loops, or crossing. This makes many computation important to evolution simpler.
- Latest successors: follow the stream until the ends.
- divergence: if the stream forks, we have divergence,
- etc…
5.4.5. Marker Exchanges
In a similar way. Having simple stream of markers helps with the computation of relevant markers during push and pull. Having loops or crossing would increase ambiguity. CEDObsmarkersExchange#Exchanged_markers_changes
5.5. User Impact
In the general case, obtaining a different nodeid through different amend path is harmless. The hash identify a changeset whatever the way it is computed. User are not going to compute sha1 hash by hand to resolve changesets.
The impact in the "bring-brack" changeset case is more visible since the "same" changeset comes back with a different hash. Fully loosing the reference to the previous hash would erodes the ability to refer to a specific changeset with its hash over time. To prevent that, we can record the identity between the old and the new hash, and forward user request to the old hash toward the newer, non-obsolete hash.
This forwarding is not as nice has having hash-preserved when the changeset is revived. However, monotonic rewrite has so many other useful property that the trade-off is worth it. In addition, user testing has not raised hash change as a significant concerns.
6. Unstable state
(previously called Troubles)
this section is a stub, is should explain the case and they solution more in depth.
Rewriting changesets (exchanged with other repos) might introduce instability.
There are two main kinds of instability: orphaning and diverging.
Orphans are changesets left behind when their ancestors are rewritten, Divergence has some variants:
- Content-divergence occurs when independent rewrites of the same changesets lead to different results,
- Phase-divergence occurs when the old (obsolete) version of a changeset becomes public,
- Cycle-divergence occurs when distributed action makes it impossible to setermine the latest version of a changeset because its successors are also recorded as its predecessors.