Merge Driver Plan

1. The Problem

It is fairly common in the real world to have generated files alongside source files in the working copy. When a merge happens, generated files that were modified on both sides are likely to cause merge conflicts. The best way to resolve these conflicts is usually to regenerate the files, and that's what developers typically have to do by hand. In principle this can be completely automated, and doing it by hand is tedious.

Mercurial should be able to automatically resolve conflicts in generated files.

1.1. But isn't checking in generated files bad?

There are a lot of ways in which checking in generated files is bad, but it can make sense if:

Each of the above points has been true for at least one repository at at least one large organization.

Ultimately, software engineering is often about tradeoffs, and in some cases checking in generated files is the right tradeoff to make. This feature will make working with such files less painful.

1.2. Doesn't the merge tool support already in Mercurial solve this problem?

Mercurial does support custom merge tools for arbitrary globs of files, but the current merge tool support lacks some important features:

There's no way we can reasonably bake all of the above into configuration -- it is incredibly specific to the codebase.

2. The Solution

Add first-class support to Mercurial for generated files and generation steps.

  1. Add support to Mercurial for custom merge drivers. A merge driver is a piece of code that controls the overall merge process.

  2. Add support to Mercurial for driver-resolved files. A driver-resolved file is a file that will be handled by the merge driver outside of the usual resolve mechanism.

  3. Have the merge driver expose two top-level operations: preprocess and conclude.

    • preprocess runs right before files are resolved. Typically, this is where files will be marked as driver-resolved.

    • conclude runs right after all source files have been resolved by the user. Typically, this is where driver-resolved files will be regenerated.

3. The Implementation

3.1. The merge driver

A merge driver is a Python (in-process) hook that has the ability to control the overall merge process. It implements preprocess and conclude as top-level functions:

    def preprocess(ui, repo, hooktype, mergestate, wctx, labels):
        ...

    def conclude(ui, repo, hooktype, mergestate, wctx, labels):
        ...

The hook is configured with:

[ui]
mergedriver = python:path/to/hook
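As a rough illustration of how a driver module might use the two entry points, here is a minimal sketch. Only the two function signatures and the u, r and d state letters come from this plan. Everything else is an assumption made for the example: FakeMergeState, its mark and files_in_state methods, the GENERATED set and the regenerate helper are hypothetical stand-ins, not Mercurial APIs.

```python
# Hypothetical sketch of a merge driver module. Only the two function
# signatures and the u/r/d state letters come from this plan; every
# other name is an illustrative stand-in, not a real Mercurial API.

GENERATED = {"build/parser.py"}  # assumed: files this repo regenerates


class FakeMergeState(object):
    """Stand-in for the merge state: maps file name -> one-letter state."""

    def __init__(self, states):
        self.states = dict(states)

    def mark(self, path, state):
        self.states[path] = state

    def files_in_state(self, state):
        return sorted(f for f, s in self.states.items() if s == state)


def regenerate(path):
    """Placeholder for the repository-specific generation step."""
    pass


def preprocess(ui, repo, hooktype, mergestate, wctx, labels):
    # Runs before files are resolved: claim conflicted generated files
    # so that they skip the usual resolve mechanism.
    for path in mergestate.files_in_state("u"):
        if path in GENERATED:
            mergestate.mark(path, "d")


def conclude(ui, repo, hooktype, mergestate, wctx, labels):
    # Runs after the user has resolved all source files: rebuild every
    # driver-resolved file and mark it resolved. The work list is read
    # from the merge state, not from anything preprocess kept in memory.
    for path in mergestate.files_in_state("d"):
        regenerate(path)
        mergestate.mark(path, "r")
```

In this sketch the driver touches only files it claimed itself; source files stay in the u state until the user resolves them.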

The merge driver has its own states:

The state transitions look like (this is currently broken):

For a paused merge, the merge driver and its state are stored in the merge state. The merge state gets a new entry, with a lowercase m. The lowercase indicates that this record is advisory and that older versions of Mercurial can ignore it.

This bit is still under consideration. Whenever the merge driver is read from disk, the value recorded there is compared with the current value in the configuration. If the two differ we abort, and the only way out is to abort the merge and redo it from the beginning.

3.1.1. Why is the hook in-process?

Unlike with other kinds of hooks, the wlock must be held while this hook is called. This raises a bunch of issues with subprocess-based hooks, especially since many Mercurial operations the subprocess might want to do will require that the wlock be taken. Mercurial currently has no notion of locks being inherited by subprocesses. Trying to add one raises a lot of concerns, including:

These problems are all solvable, but at least for the first iteration it is simpler to avoid all these problems by staying in-process.

3.1.2. Why are these top-level functions?

The API is specifically designed to discourage sharing state between the preprocess and conclude functions. Any such storage of state is almost certainly a bug, because conclude can be called without calling preprocess.

3.1.3. But I really need to share state.

Your options are:

All the options have tradeoffs -- the API is designed to make implementers think about this problem rather than just exposing an object and having them get it wrong.

3.1.4. Why can't multiple merge drivers be defined?

Unlike with other kinds of hooks, there is in general no reasonable way to compose merge drivers. In particular -- what if different merge drivers disagree on how to generate a particular file? The semantics of multiple merge drivers get confusing very quickly.

If a repository really has independent merge drivers, it should be straightforward to write a wrapper merge driver that composes them.
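A wrapper of that kind could look roughly like the sketch below. The top-level preprocess and conclude signatures are the ones from this plan; the DRIVERS list, the PrefixDriver toy class, the directory names, and the use of a plain dict as the merge state are all assumptions made for the example.

```python
# Hypothetical wrapper merge driver that composes independent drivers.
# Only the two top-level signatures come from this plan; PrefixDriver,
# DRIVERS and the dict-based merge state are illustrative stand-ins.


class PrefixDriver(object):
    """Toy sub-driver: claims unresolved files under one directory."""

    def __init__(self, prefix):
        self.prefix = prefix

    def preprocess(self, ui, repo, hooktype, mergestate, wctx, labels):
        for path in sorted(mergestate):
            if mergestate[path] == "u" and path.startswith(self.prefix):
                mergestate[path] = "d"

    def conclude(self, ui, repo, hooktype, mergestate, wctx, labels):
        for path in sorted(mergestate):
            if mergestate[path] == "d" and path.startswith(self.prefix):
                mergestate[path] = "r"


# Sub-drivers run in list order; the wrapper's author decides that order.
DRIVERS = [PrefixDriver("proto/"), PrefixDriver("thrift/")]


def preprocess(ui, repo, hooktype, mergestate, wctx, labels):
    for driver in DRIVERS:
        driver.preprocess(ui, repo, hooktype, mergestate, wctx, labels)


def conclude(ui, repo, hooktype, mergestate, wctx, labels):
    for driver in DRIVERS:
        driver.conclude(ui, repo, hooktype, mergestate, wctx, labels)
```

The sub-drivers here own disjoint directories, so there is nothing to disagree about. Drivers that overlap would need an explicit policy inside the wrapper, which is the decision Mercurial itself declines to make.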

3.1.5. Why store the merge driver in the merge state?

Storing the fact that files are driver-resolved without storing how to resolve them is not really helpful. (Counterpoint: we don't store merge tool configuration in the merge state, so maybe we shouldn't bother with this either.)

3.1.6. Why is the merge driver record advisory?

We will store the merge driver in the merge state whenever it is configured. In a lot of cases running the merge driver will not be necessary -- in those cases there's no point in aborting. It only makes sense to abort when the merge driver is "active" -- when there are files that need to be resolved by the driver. That case will be handled below.

3.1.7. Why do we need to be so paranoid about the merge driver's value?

Mostly for security reasons. Consider the following case:

  1. A configures a malicious merge driver in their ~/.hgrc, then pauses the merge.
  2. A gives a copy of their entire repo, including .hg (but not ~/.hgrc), to B.
  3. B inspects .hg/hgrc and finds it to be clean.
  4. B then continues the merge, and the malicious merge driver gets invoked.

Aborting when the merge driver has changed is one way to deal with this. Exactly how this should be handled is still under discussion.
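The abort-on-change option can be sketched as a simple comparison. This is a sketch of one possible policy under discussion, not settled behavior; the function name, the exception class and the message are hypothetical.

```python
# Sketch of the "abort if the merge driver changed" check. The names
# here are hypothetical; the policy itself is still under discussion.


class MergeDriverChanged(Exception):
    """The configured merge driver differs from the recorded one."""


def check_merge_driver(configured, recorded):
    """Return the driver to use, or raise if the two values differ.

    configured: current config value, or None if unset
    recorded:   value stored in the merge state, or None if absent
    """
    if configured != recorded:
        raise MergeDriverChanged(
            "merge driver changed since merge started "
            "(recorded %r, configured %r)" % (recorded, configured)
        )
    return configured
```

In the scenario above, B has no merge driver configured while the merge state records A's driver, so the comparison fails and the recorded driver is never invoked.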

3.1.8. What can go wrong?

The merge driver could:

3.2. Driver-resolved files

Driver-resolved files are marked with a brand new state -- not u or r, but d (shown as D in hg resolve --list).

Since old versions of Mercurial will not be able to understand what these files mean, they're stored as a separate type of record: D (rather than the standard F). The contents of D records are the same as those of F records. D is uppercase so that old versions of Mercurial abort when they see such files.
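The case convention for record types can be illustrated with a short sketch of a reader. The record letters F, D and m are the ones named in this plan; the function, the exception class and the (type, data) tuple layout are assumptions made for the example and do not reflect the on-disk format.

```python
# Sketch of how a reader treats unknown merge state record types:
# lowercase means advisory (skip it), uppercase means mandatory (abort).
# The function and the tuple layout are illustrative assumptions.


class UnsupportedRecord(Exception):
    """A mandatory record type that this reader does not understand."""


def filter_records(records, known):
    """Return the records that a reader understanding `known` can use."""
    kept = []
    for rtype, data in records:
        if rtype in known:
            kept.append((rtype, data))
        elif rtype.islower():
            continue  # advisory record: safe to ignore
        else:
            raise UnsupportedRecord(rtype)
    return kept
```

A reader that only knows F skips an m record without complaint but stops on a D record, which is the behavior wanted from old versions of Mercurial.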

3.3. Changes to operations

3.3.1. merge.applyupdates

This function is what actually changes the working copy whenever a merge (or update, or graft, or rebase...) happens.

3.3.2. resolve

3.3.3. commit

MergeDriverPlan (last edited 2015-10-20 21:35:23 by KevinBullock)