Differences between revisions 1 and 22 (spanning 21 versions)
Revision 1 as of 2013-02-09 10:55:53
Size: 2685
Comment: moving older ChangeEvolution content here
Revision 22 as of 2016-02-19 16:15:16
Size: 10879
Comment: some more details on current plan.
Line 1: Line 1:

= Implementation Details about Changesets Evolution =
#pragma section-numbers 2
= Changesets Evolution - development page. =
Line 6: Line 5:
Changesets evolution allows for safe rewriting of Mercurial history. This has a close relationship with Phases.

    Presentation of the concept: http://hg-lab.logilab.org/doc/mutable-history/html/

    Related experimental extension (usable): http://bitbucket.org/marmoute/mutable-history/overview

== Core principle ==

 * Store an explicit obsolescence marker between the new and old versions of a rewritten changeset.
 * This marker is *not* part of the changeset (it should not alter the hash).
 * People are able to collaborate on evolving changesets.

== Additional ideas ==

 * Store the final delta in a real and autonomous changeset.
 * Obsolescence markers are exchangeable without the rewritten changesets.
 * Easily allow other extensions to manipulate such relations (and to hook into such operations).

== Handled situations ==

 * Rewriting content of a changeset
 * Deleting/killing a changeset
 * Splitting a single changeset into multiple ones
 * Collapsing/folding multiple changesets into a single one
 * Changing changeset order
 * Adding (e.g., pulling) a changeset evolution that conflicts with another one
 * Adding (or adding in general) new changesets on one which already evolved (or evolving a changeset that has descendants)

== Changeset Obsolescence ==

Obsolescence markers make it possible to mark changesets that have been deleted or superseded in a new version of the changeset.

Unlike the previous way of handling such changes, by stripping the old changesets from the repository, obsolescence markers can be propagated between repositories. This allows for a safe and simple way of exchanging mutable history and altering it after the fact. Changeset phases are respected, such that only draft and secret changesets can be altered (see hg phases for details).

Obsolescence is tracked using "obsolescence markers", a piece of metadata that tracks which changesets have been made obsolete, potential successors for a given changeset, the moment the changeset was marked as obsolete, and the user who performed the rewriting operation. The markers are stored separately from standard changeset data and can be exchanged without any of the precursor changesets, preventing unnecessary exchange of obsolescence data.

The complete set of obsolescence markers describes a history of changeset modifications that is orthogonal to the repository history of file modifications. This changeset history allows for detection and automatic resolution of edge cases arising from multiple users rewriting the same part of history concurrently.
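
As a rough illustration of how a set of markers can reveal concurrent rewrites, here is a minimal sketch. The in-memory model (a marker as a `(precursor, successors)` pair) and the function names are hypothetical, and the check is a deliberate simplification: it only looks at direct successors and does not follow successor chains.

```python
from collections import defaultdict

# Hypothetical in-memory model: a marker is (precursor, successors), where
# successors is a tuple of changeset ids (empty when the changeset is pruned).


def successors_index(markers):
    """Map each precursor to the list of successor tuples recorded for it."""
    index = defaultdict(list)
    for precursor, successors in markers:
        index[precursor].append(tuple(successors))
    return index


def divergent_precursors(markers):
    """Return precursors that were rewritten into more than one distinct result."""
    index = successors_index(markers)
    return sorted(
        precursor
        for precursor, rewrites in index.items()
        if len({r for r in rewrites if r}) > 1
    )


markers = [
    ("a1", ("a2",)),  # a1 amended into a2
    ("b1", ()),       # b1 pruned
    ("c1", ("c2",)),  # c1 rewritten by one user...
    ("c1", ("c3",)),  # ...and concurrently by another
]
print(divergent_precursors(markers))  # ['c1']
```

Because the markers live outside the changesets, this kind of analysis only needs the marker list, not the rewritten changesets themselves.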
<<TableOfContents>>

For a user perspective, have a look at the ChangesetEvolution page.

== Contributing ==
The simplest way to help is to grab a bug from the [[http://bz.selenic.com/buglist.cgi?keywords=easy,%20&keywords_type=anywords&order=Bug%20Number&resolution=---&query_format=advanced&bug_status=UNCONFIRMED&bug_status=CONFIRMED&bug_status=NEED_EXAMPLE&bug_status=IN_PROGRESS&component=evolution&list_id=5014|list of easy bugs]].

There are also multiple well-defined topics where ideas exist but still need an implementation:

 * Rebase could make more use of obsolescence markers:
  * detection that part of the rebase set is already in the destination
  * warning about divergence creation.

 * [[ObsolescenceMarkersExchange|Obsolescence markers exchange]] (no, really, we have ideas waiting to be implemented)

 * Prototype for a command bringing changesets back to life.

There are more complicated parts that require attention too.

 * Better storage/loading/cache for markers ''(depends on marker exchange)''

 * Solving the problem of [[TopicPlan|Topic Branch]]

 * Handling moving changesets around with the "plan" concept

 * In-memory merge ''(helps a lot with troubles resolution through evolve)''

 * Wide transaction capability
  * Continue/stop/abort for all commands, including evolve
  * In-transaction content is not exchangeable.

 * Making evolve capable of solving all troubles that a user can encounter

 * Computing UI messages about troubles from events that happened during the transaction.

== Roadmap ==
Current status:

 * (./) Alpha Stage,
 * {X} Beta Stage,
 * {X} Release Stage,

=== Stage A ===
Changeset Evolution is currently at Alpha Stage. It won't eat people's data, but only a handful of people know how to get out of some situations.

=== Stage B ===

All the "core" features should be here and somewhat work. Some will be sluggish
and unpleasant to use, as the focus is not yet on the user interface.

We can gather the baseline expectations for this stage in 4 groups.

==== Troubles Resolution ====

Users need to know that they can trust the tool to offer what they need and to
get them out of trouble. We have already covered a good share of that work with a
decent `hg evolve` command that handles the most common cases and a large set of
small commands, one for each operation. Yet there are some areas where the user is
still left with no "simple" option.


 * (./) Evolve should have predictable results (--rev options and co),

 * Evolve should be able to somewhat handle all possible troubles (including
   divergence and bumping)

 * Evolve should have a working --abort and --continue (and --drop?)

 * We must have some command to bring obsolete changesets back. (issue4851)

==== Displaying Evolution History to the User ====

Now that we (almost) have all the bricks to build a clean evolution history,
there are a lot of ways we could easily expose it to the user. This is critical to
ensure the user has some awareness of the feature and is able to understand
what is happening in case of troubles.

 * Having "hg evolve --list" display troubled changesets, the troubles affecting
   them, and details of the diagnosis

 * Basic display of precursors/successors available in `hg log`

 * Some highlighting of obsolete/unstable nodes in the default log (color?)

==== Troubles Prevention ====

There are multiple cases where we know that an action from the user will create
troubles (most notably divergence). Preventing the user from doing so in the first
place (in the same philosophy as phases) would be very useful:

 * Having rebase skip changesets with successors in destination,

 * Having a generic mechanism to run "validation" before any "editing" of a
   changeset,

 * Requiring a special flag or config to create divergence locally.

==== Low level Utilities ====

Currently, the "obsstore" is not really "fixable". Having tools available to very
advanced users would be good.

 * "hg strip" should have a way to strip related obsolescence markers,

 * "hg strip" should have a way to strip a whole obsolescence lineage.

 * Some debug command to strip arbitrary markers?

=== Stage C ===
At this "beta" state, the UI and experience will not be easy/pleasant enough for normal users, but advanced users of Mercurial should find their bearings. We may consider shipping it with Core Mercurial with an experimental flag.

Blockers to beta release:

 * Obsmarkers exchange should be good enough to define a final storage format for markers.

 * On-disk format should be stable for the foreseeable future and fit performance/exchange needs.

 * Evolve should be able to solve (or provide a way to solve) all troubles that a user may encounter, especially divergence,

 * Evolve should have predictable results (--rev options and co),

 * Evolve should be abortable (wide transaction?),

 * Performance impact should be ''reasonable'',

=== Release Stage ===
At which point we can merge changesets evolution into core.

 * UI offering a solution to the N² markers creation when editing history (TopicPlan),

 * Command set defined well enough to be frozen for backward compatibility,

 * No race condition when exchanging with server ''(bundle2 + repo layout allowing atomic transaction)'',

 * No impactful performance regression (including efficient exchange),

 * Concrete plan to handle high volume of markers (archiving or something),

=== Related Concept ===

Other concepts not directly involved in Changeset Evolution but closely related
for technical or user experience reasons.

 * AtomicRepositoryLayoutPlan: To reduce race conditions around new evolve metadata,
 * WideTransactionPlan: To help ability to have "pure" abort/rollback/undo,
 * FeatureBranchesStruggle: To help organise all your development branches,
 * TopicPlan: A possible solution to the above,
 * RevsetOperatorPlan: To offer a proper way to travel through the evolution-history,
 * PlanHistoryRewritePlan: Factorise and empower all the common history rewriting operations,
 * ExperimentalExtensionsPlan: To move the evolve extension into core,
 * CommitCustodyConcept: Top level way to follow and sign draft changesets.

== In progress Features ==
=== Using Obsolescence Marker during Rebase ===
There are two big issues with rebasing a set containing obsolete changesets:

 * It is easy to create divergence
 * You get a lot of conflicts when rebasing an obsolete stack onto its successor version.

To enable the current implementation, set the config '''experimental.rebaseskipobsolete''' to true.

Current progress:

 * (./) rebase can skip obsolete changesets when rebased on successors,
 * (./) same logic as above, handling prune,
 * (./) same logic as above, handling split,
 * (./) ability to detect divergence creation and bail out,
 * {X} ability to rebase a set with obsolescence inside the set (rebasing both precursors and successors) without creating divergence,
 * {X} official config to control these two behaviors (either in one or two configs)
 * {X} config on by default.
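
The skipping logic listed above can be pictured with a small sketch. This is not the rebase extension's code: the function name, the plain-dict `successors` mapping and the rule "skip when every successor is already in the destination" are assumptions made for illustration.

```python
def plan_rebase(rebase_set, destination_ancestors, successors):
    """Split rebase_set into (to_rebase, to_skip).

    successors maps an obsolete changeset id to the ids that replaced it;
    an empty tuple means the changeset was pruned.
    """
    to_rebase, to_skip = [], []
    for node in rebase_set:
        if node in successors:
            # Pruned (no successors), or every successor is already in the
            # destination: rebasing this node would only produce conflicts.
            if all(s in destination_ancestors for s in successors[node]):
                to_skip.append(node)
                continue
        to_rebase.append(node)
    return to_rebase, to_skip


to_rebase, to_skip = plan_rebase(
    ["x1", "y1", "z1", "w1"],
    destination_ancestors={"base", "x2"},
    successors={"x1": ("x2",), "y1": (), "z1": ("z2",)},
)
print(to_rebase, to_skip)  # ['z1', 'w1'] ['x1', 'y1']
```

In this sketch `z1` is still rebased even though it is obsolete, because its successor is not in the destination; that is the kind of case where divergence detection has to step in.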

== Archived Topic ==
=== Obsstore Format ===
Markers are stored in an append-only file, '.hg/store/obsstore'.

==== V1 (current) Format ====
(see the inline documentation for the latest data)

===== quick summary =====
 * <number-of-successors(=N)><metadata-length(=M)><bits-field><precursor>(<successor>*N)<metadata>

 * B, I, B, 20s, (20s*N), s*M

===== longer explanation =====
The file starts with a version header:

 * 1 unsigned byte: version number, starting at zero.

The header is followed by the markers. Each marker is made of:

 * 1 unsigned byte: number of new changesets "N", can be zero.

 * 1 unsigned 32-bit integer: metadata size "M" in bytes.

 * 1 byte: a bit field. It is reserved for flags used in common obsolete marker operations, to avoid repeated decoding of metadata entries.

 * 20 bytes: obsoleted changeset identifier.

 * N*20 bytes: new changesets identifiers.

 * M bytes: metadata as a sequence of nul-terminated strings. Each string contains a key and a value, separated by a colon ':', without additional encoding. Keys cannot contain '\0' or ':' and values cannot contain '\0'.

==== V2 (current) Format ====
===== motivation =====
There are two extra pieces of information we would like to see in a second version of the format:

 * date: There is currently *always* a date in the metadata, so storing it explicitly would be more space efficient. It would also open the way to quickly accessing the date for sorting purposes (no use case yet, but not crazy to think about it).

 * parents: When a changeset is pruned (obsoleted, no successors), we need to record its parents. This is necessary to link the markers chain to the push/pull operation it is relevant to.

 * We may want to extend the bit field to 2 bytes. We currently use 1 and can see use cases for 3-5 others (tracking the type of changes introduced by the rewriting: desc, patches, content, etc.), so we are running short.

 * We may also want to explicitly store the username of the marker's creator, as there will always be one. However, there is no need for quick access to such information, so it could stay in the metadata field.

===== possible change =====
'''Date''':

 * The date will be a 64-bit float (for seconds since epoch) followed by a 16-bit integer (time zone).

 * It would make sense to put the date at the front of each marker. That would give marker sorting some semantics.

'''Parents''':

We have multiple options for storing parents:

 1. Having an explicit field similar to successors (one byte to know how many parents, then the parents)

 1. Having an explicit field but storing the number of parents in the bit field (since we never have more than 2 parents)

 1. Using the successors field. A negative number of successors means it is a prune.

Option (3) saves the most space but prevents us from storing parent information for more changesets if needed in the future (we do not have a final exchange plan yet).

Options (1) and (2) take 2 to 8 bits more than (3) but are more flexible.

'''Bit field''':

If we extend the bit field to 2 bytes, it makes sense to use option (2) for storing parents.

===== Proposed Format =====
 * <date><timezone><number-of-successors(=N)><metadata-length(=M)><bits-field+nb-parents(=P)><precursor>(<successor>*N)(<parents>*P)<metadata>

 * d, h, B, I, H, 20s, (20s*N), (20s*P), s*M

The P number would be hidden in the bit field. We need to store 4 possible values here: 0 parents, 1 parent, 2 parents, and no parent information stored (ø). A possible assignment is 00, 01, 10, 11. This lets both the "0 parents" and the "ø parent information" cases be 0 modulo 3, so the number of parent hashes to read is the 2-bit value modulo 3.

/!\ This page is intended for developers.


ChangesetEvolutionDevel (last edited 2020-05-29 08:03:48 by aayjaychan)