Note:
This page is primarily intended for developers of Mercurial.
Internal Changeset Plan
Status: Project
Main proponents: Pierre-YvesDavid
This is a speculative project and does not represent any firm decisions on future behavior.
Introduce a mechanism to track and hide changesets created as a side effect of internal operations.
1. Goal
Operations like amend, histedit, and shelve create temporary internal commits. We are looking for an official way to record them and hide them once they are no longer necessary.
2. Problem Space
2.1. Possible approach
Since introducing a new "internal" concept will require a new repository requirement, we have multiple options at hand:
- a dedicated 'internal' phase (or phases),
- an ad-hoc mechanism (let us say a bytemap),
- a key in the changeset extra.
(If someone thinks of something else, let me know.)
2.2. Internal Changesets Properties
A. They are irrelevant to the users,
B. Nothing will be based on them,
C. Once done with their purpose, we should not need them again,
D. They never leave the repository,
E. They never stop being internal changesets.
2.3. Life Cycle of Internal Changesets
2.3.1. amend
- transaction is open,
- a temporary commit is created,
- <tmp> is used to build the result of the amend,
- obsmarkers are created (between old and new),
- <tmp> is "archived" (currently through obsmarkers),
- transaction is committed.
The commit is created and "archived" in the same transaction. It is never visible to anyone.
Actually, we could fully remove this changeset with appropriate in-memory-ctx capabilities.
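The amend life cycle above can be illustrated with a small, self-contained sketch. It is a toy model, not Mercurial code: `ToyRepo`, `transaction`, `commit`, `archive`, and `visible` are hypothetical names, and archiving the old changeset stands in for the old-to-new obsmarker. The point is only that <tmp> is created and archived inside one transaction, so no view outside that transaction ever lists it.

```python
from contextlib import contextmanager


class ToyRepo:
    """Hypothetical in-memory stand-in for a repository (not Mercurial's API)."""

    def __init__(self):
        self.changesets = []   # committed changeset names, in creation order
        self.archived = set()  # names hidden from every view
        self._pending = None   # state staged by the open transaction

    @contextmanager
    def transaction(self):
        # Stage changes on copies; publish them only if the block succeeds.
        self._pending = (list(self.changesets), set(self.archived))
        try:
            yield self
            self.changesets, self.archived = self._pending
        finally:
            self._pending = None

    def commit(self, name):
        self._pending[0].append(name)

    def archive(self, name):
        self._pending[1].add(name)

    def visible(self):
        return [c for c in self.changesets if c not in self.archived]


def amend(repo, old):
    with repo.transaction():
        repo.commit("tmp")   # a temporary commit is created
        repo.commit("new")   # the amend result is built from <tmp>
        repo.archive(old)    # stands in for the old -> new obsmarker
        repo.archive("tmp")  # <tmp> is archived in the same transaction


repo = ToyRepo()
with repo.transaction():
    repo.commit("old")
amend(repo, "old")
```

After `amend` returns, `repo.visible()` lists only the amended result, while <tmp> still exists in `repo.changesets` as an archived changeset.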
2.3.2. histedit
Scenario:
  pick <initial-second>
  pick <initial-first>
  roll <initial-fourth>   # the important part is the fold
  pick <initial-third>
(Assuming single transaction for simplicity)
1) transaction is open,
2) <initial-second> is rebased on base (possible conflict and associated transaction open/close) -> creates <final-first>,
3) <initial-first> is rebased on <final-first> (possible conflict and associated transaction open/close) -> creates <temporary-second>,
4) <initial-fourth> is rebased on <temporary-second> (no commit) (possible conflict and associated transaction open/close),
5) changes are folded into <temporary-second> as <final-second>,
6) <initial-third> is rebased on <final-second> (possible conflict and associated transaction open/close) -> creates <final-third>,
7) transaction is committed.
A rebase conflict during (4) would expose <temporary-second> to the user as:
- part of the merge conflict
- working directory parent.
The user needs to see that temporary commit during the merge conflict resolution.
2.3.3. shelve
shelving:
1) transaction is open,
2) shelved changes are committed,
3) commit is recorded as a shelve-changeset,
4) hiding is applied to this shelve,
5) transaction is committed.
unshelving (I might be wrong, I've not followed everything):
1) transaction is open,
2) uncommitted changes are put in a temporary commit,
   - access the shelve-changeset (somehow),
3) hg rebase -r <shelve> -d <tmp>
   - possible conflict resolution that requires transaction commit,
   - possible transaction reopen (if conflict),
   - rebase complete,
4) grab the result of the shelve and restore working copy parent,
5) hide <tmp> and <rebased>,
6) close transaction.
Note: during a possible rebase conflict, the "shelve" is currently visible to the user. Can we change this?
- At minimum, it needs to be visible to the "hg resolve" command. This is easy to achieve in all cases.
- Having it visible seems valuable for the user experience.
2.3.4. conclusion regarding life cycle
Even if many internal changesets never need to be visible outside the transaction that creates them, there seem to be valid cases where the internal changeset is exposed to the user to help with merge conflicts (histedit and shelve).
It is worth noting that while these changesets need to be visible, they seem to always be working copy parents (directly or through a merge). So the current unhiding mechanism could simply handle this life cycle.
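A minimal sketch of that rule, assuming visibility is computed from a set of internal revisions and the current working copy parents. The function name `hidden_revs` and its arguments are hypothetical and do not mirror Mercurial's filtering code; keeping the ancestors of the working copy parents visible as well is an assumption of this sketch.

```python
def hidden_revs(internal, wdir_parents, parents):
    """Return the internal changesets that can stay hidden.

    internal: set of revisions flagged internal
    wdir_parents: revisions the working copy is based on (two during a merge)
    parents: mapping revision -> tuple of parent revisions
    """
    # Anything the working copy depends on must remain visible, together
    # with its ancestors, so walk back from the working copy parents.
    needed = set()
    stack = list(wdir_parents)
    while stack:
        rev = stack.pop()
        if rev in needed:
            continue
        needed.add(rev)
        stack.extend(parents.get(rev, ()))
    return internal - needed


# Toy history loosely following the histedit scenario.
parents = {
    "base": (),
    "final-first": ("base",),
    "temporary-second": ("final-first",),
    "final-second": ("final-first",),
    "stale-tmp": ("base",),
}
internal = {"temporary-second", "stale-tmp"}

# During the conflict, the working copy is a merge based on <temporary-second>.
during_conflict = hidden_revs(internal, ["temporary-second", "initial-fourth"], parents)
# Once the fold is done, the working copy sits on <final-second>.
after_fold = hidden_revs(internal, ["final-second"], parents)
```

In this toy history, <temporary-second> is left out of the hidden set only while it is a working copy parent; no separate life-cycle state is tracked for it.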
In general however, once an internal changeset has been hidden, we won't need it again. There is a small exception with <shelved> changeset. More on that below.
2.4. Additional note about shelve
The shelve extension uses a full call to the "rebase" command to merge the shelved change onto the destination. It could directly use the core merge mechanism to perform this graft (without using the rebase extension). This would allow skipping the temporary commit currently created by rebase.
In addition, if I'm not mistaken, core-merge is able to merge with a dirty working copy. So it might be possible to directly trigger that "graft" without the temporary local commit. hg resolve will still use and display proper information. And the pre-merge content will be available for restoration on --abort.
That would lift the need for a temporary internal changeset during unshelve. From there, the only internal changeset involved in shelve would be the shelved changeset itself.
In addition, since we keep track of shelved changesets, it will be easy to feed them to the hiding logic as long as they are official shelves. However, when a shelve is deleted, we want to be able to hide the <shelved> changeset, and the internal-changeset concept is handy for that.
Using a mechanism dedicated to shelve for hiding active shelves lifts the exception that shelves create in the life cycle of internal changesets. This can also become useful if people start exchanging shelves between repositories (urg), as it won't create another exception to the internal space.
3. Analysis of available options
Let's dive deeper into the various options we have:
3.1. Phases
We could use new dedicated phase(s) to keep track of internal changesets.
3.1.1. advantages
- dedicated phases express the distinction from real changesets well,
- the implementation is already here and fast, including all the UI bits,
- phases already have the concept of a monotonic life cycle. It is helpful regarding the internal life cycle (and it is easy to use two phases for it).
3.1.2. disadvantages
- updating the phase concept means new complexity:
  - we probably want to enforce that nothing gets out of internal; there is no such check on boundaries yet,
  - we either give up on using phases for anything else, or we add some complexity to the concept (no longer a one-dimensional order).
3.1.3. other
- exchange is not part of the equation, we have a large freedom regarding such phases,
- since internal changesets are not meant to have unexpected descendants, the 'rooting' of phases will not be an issue.
3.1.4. summary
Phases seem like a good option: most of the usual drawbacks of phases regarding 'hiding' are neutralized by the "internal-changeset" properties, and phases fit well with the idea of separating 'internal' from the other changesets. The main reservation is about the changes implied for the phase concept.
3.2. ad-hoc solution
We could build a dedicated solution to track internal changesets and their life cycle (e.g. bitmaps, root tracking).
3.2.1. advantages
- a monotonic (visible → invisible) life cycle makes bitmaps simpler,
- ad-hoc concept means no pre-existing constraints,
- ability to track life cycle if wished,
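To illustrate why a one-way life cycle keeps a bitmap simple, here is a minimal sketch with one bit per revision number. `InternalBitmap` and its methods are hypothetical names, and the on-disk storage, which an ad-hoc solution would also have to design, is left out. Because a revision only ever moves from visible to invisible, the structure needs a "set" operation but no "clear" operation.

```python
class InternalBitmap:
    """Hypothetical per-revision flag store; one bit per revision number."""

    def __init__(self):
        self._bits = bytearray()

    def mark_hidden(self, rev):
        # Bits are only ever set: the visible -> invisible move is one-way.
        byte, bit = divmod(rev, 8)
        if byte >= len(self._bits):
            # Grow the bitmap so that it covers the requested revision.
            self._bits.extend(b"\x00" * (byte + 1 - len(self._bits)))
        self._bits[byte] |= 1 << bit

    def is_hidden(self, rev):
        byte, bit = divmod(rev, 8)
        return byte < len(self._bits) and bool(self._bits[byte] & (1 << bit))


bitmap = InternalBitmap()
bitmap.mark_hidden(10)
```

Revisions beyond the end of the bitmap are reported as visible, so the bitmap only has to grow when an internal changeset is actually hidden.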
3.2.2. disadvantages
- as the feature is new, everything needs to be built from scratch: storage, handling, safeguards, display to the user, etc.,
- it introduces a new UI concept.
3.2.3. other
- This approach allows phases to be preserved on the internal changesets, but since they are not intended for the real space of changesets, we do not really have a use for phases on them.
3.2.4. summary
This seems a possible way to implement internal changesets. It might imply quite some work.
3.3. changeset data
We could use a special key in extra (eg: '_internal') to track internal changesets.
3.3.1. advantages
- consistency: makes it very clear 'internal' is a permanent property of the changeset (it has been created for internal use),
- prevent-collision: 'extra' is part of the hash, making collisions between the real space and the internal space impossible.
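The collision argument can be sketched with a toy hash. The serialization below is a deliberate simplification and not Mercurial's real changeset format; `toy_node` is a hypothetical helper. It only shows that when the extra dictionary feeds the hash, two changesets that are otherwise identical get different identifiers (short of a hash collision).

```python
import hashlib


def toy_node(parent, description, extra):
    """Hash a simplified changeset; not Mercurial's real serialization."""
    # Sort the extra items so the hash does not depend on dict ordering.
    extra_text = "\0".join("%s:%s" % item for item in sorted(extra.items()))
    payload = "\n".join([parent, description, extra_text])
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()


parent = "0" * 40
real = toy_node(parent, "fix typo", {"branch": "default"})
internal = toy_node(parent, "fix typo", {"branch": "default", "_internal": "1"})
```

Here `real` and `internal` differ even though the parent and description are the same, which is the property the '_internal' key is meant to provide.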
3.3.2. disadvantages
- need-caching: reading the extra of all non-public changesets will be too slow, so we will need a cache. Building such a cache is likely as complicated as building an ad-hoc solution.
- does not handle life cycle: since commits are immutable, we cannot track the life cycle of internal changesets with it. There is no way to convey the difference between the internal commits we still need to see and the others, so we will have to rely on an extra mechanism here.
3.3.3. other
- using extra usually helps with transferring information from one repository to another. This is not relevant here since internal changesets are not meant to be exchanged.
- This approach allows phases to be preserved on the internal changesets, but since they are not intended for the real space of changesets, we do not really have a use for phases on them.
3.3.4. summary
Despite a couple of interesting properties, using extra for 'internal' is not very adequate for the task at hand. It would require extra performance work and a secondary concept to handle the life cycle.
3.4. Extra thoughts about life cycle
There are a couple of ways to handle the internal life cycle while tracking a single 'internal' "state":
- late move to internal: we can create "normal" changesets and mark them internal when we are done with them.
- external local-hiding mechanism: if we get a generic hiding mechanism, we could just track that a changeset is internal, but rely on the generic local-hiding mechanism for the second part of the life cycle.
4. Conclusion
4.1. short version
At the current stage of my reflection, my personal choice would be:
- introduce one new phase: internal,
- also add an '_internal' extra key for good measure, since it adds good properties,
- introduce a context manager to create/interact with temporary changesets.
4.2. rationale
I'm going for phase-space because 'public/draft/secret' do not make sense for internal changesets anyway. Making this explicit with an 'internal' phase seems a good move.
In addition, we already have all the UI and concepts around phases, so introducing a new one will not add complexity to our UI.
We go for a single phase since all life-cycle/visibility concerns can apparently be taken care of by the fact that working copy parents are visible.
We could rely on a generic local hiding mechanism, but having a dedicated phase increases insulation. Such insulation reduces the chance of a user touching internal visibility by mistake and helps filter internal changesets out of the UI. The implementation is not more complex since we can already feed the visibility code from multiple sources.
I keep the 'extra' key idea to make sure we'll never collide with 'real-space'.
4.2.1. implementation idea
- phase 'internal' is hidden and not exchanged. Checking out or merging with it requires a special flag/mode unavailable to the user,
- a similar flag/mode (probably a context manager) is to be used when creating internal changesets,
- we use 'internal=32' (to leave some room for other phases),
- we add checks so that a changeset with a phase >= 32 never goes below it,
- we use the '_internal' extra key to guarantee the lack of collision with real space (and as a possible safeguard).
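The phase boundary check and the creation mode could be sketched as follows. This is only an illustration under stated assumptions: `ToyPhases`, `PhaseError`, and `internal_mode` are hypothetical names rather than Mercurial's phase API, and the values 0/1/2 are the existing public/draft/secret phases while 32 is the value proposed in this plan.

```python
from contextlib import contextmanager

PUBLIC, DRAFT, SECRET = 0, 1, 2  # existing phase values
INTERNAL = 32                    # value proposed in this plan


class PhaseError(Exception):
    pass


class ToyPhases:
    """Hypothetical phase store; names are illustrative, not Mercurial's API."""

    def __init__(self):
        self._phase = {}
        self._internal_mode = False

    @contextmanager
    def internal_mode(self):
        # Only code running inside this block may produce internal changesets.
        previous = self._internal_mode
        self._internal_mode = True
        try:
            yield
        finally:
            self._internal_mode = previous

    def add(self, rev, phase=DRAFT):
        if phase >= INTERNAL and not self._internal_mode:
            raise PhaseError("internal changesets need internal mode")
        self._phase[rev] = phase

    def move(self, rev, new_phase):
        # An internal changeset never stops being internal.
        if self._phase[rev] >= INTERNAL and new_phase < INTERNAL:
            raise PhaseError("cannot move %r out of the internal phase" % rev)
        if new_phase >= INTERNAL and not self._internal_mode:
            raise PhaseError("moving to internal needs internal mode")
        self._phase[rev] = new_phase

    def phase(self, rev):
        return self._phase[rev]


phases = ToyPhases()
phases.add("user-commit")
with phases.internal_mode():
    phases.add("tmp", INTERNAL)
```

With this shape, `phases.move("tmp", DRAFT)` raises `PhaseError`, while ordinary moves such as draft to secret are unaffected. Gating creation behind a context manager keeps the special mode scoped to the internal operation that needs it.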
5. Roadmap
<discussion stage>