
Merge Driver Plan

1. The Problem

It is fairly common in the real world to have generated files alongside source files in the working copy. When a merge happens, generated files that were modified on both sides are likely to cause merge conflicts. The best way to resolve these conflicts is usually to regenerate the files, and that's what developers typically have to do by hand. This can be completely automated in principle, and doing it by hand sucks.

Mercurial should be able to resolve conflicts in generated files automatically.

1.1. But isn't checking in generated files bad?

While there are a lot of ways checking in generated files is bad, there are also valid use cases for it. For example:

  • These files change relatively rarely for individual developers but often enough in the aggregate to be a problem.
  • These files take a long time to generate but the resultant artifacts are small.
  • These files capture the state of the world they were created in (e.g. databases) in important ways. That state of the world can change such that the files can no longer be generated again.
  • Serving these files via an out-of-band mechanism like an artifact server is not feasible, or much more work than just serving them via Mercurial.
  • While the files could be generated by a build system, the project really has no need for a build system outside of these generated files, and would like to keep fast iteration cycles by avoiding build steps.

Each of the above points has held for at least one repository in at least one large organization.

Ultimately, software engineering is often about tradeoffs, and in some cases checking in generated files is the right tradeoff to make. This feature will make working with such files less painful.

1.2. Doesn't the merge tool support already in Mercurial solve this problem?

Mercurial does support custom merge tools for arbitrary globs of files, but the current merge tool support has some important limitations:

  • It only works when each generated file has its own command to run; in some cases, however, multiple files can be regenerated with a single command.
  • It is only suitable when the set of generated files is statically known; in some cases this configuration will be part of the repository itself, e.g. in a JSON file.
  • Most importantly, there's no way to define an ordering for file resolutions. Generated files form a dependency graph -- they might depend on source files, other generated files, and so on. Resolutions need to be performed in topological order (source files first, then the generated files that depend on source code alone, then further generated files, and so on).

There's no way we can reasonably bake all of the above into configuration -- it is incredibly specific to the codebase.
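To illustrate the ordering requirement, here is a minimal sketch of computing a resolution order from a dependency graph. The file names and the `resolution_order` helper are hypothetical, and the sketch assumes Python 3.9 or later for the standard-library `graphlib` module; it is not part of Mercurial.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency graph: each file maps to the set of files it is
# generated from. Source files have no dependencies.
dependencies = {
    "schema.json": set(),
    "models.py": {"schema.json"},
    "api_docs.html": {"models.py", "schema.json"},
}


def resolution_order(deps):
    """Return the files so that each one comes after everything it depends on."""
    return list(TopologicalSorter(deps).static_order())


order = resolution_order(dependencies)
print(order)
```

A real repository would have to derive this graph from its own build rules, which is the codebase-specific knowledge that is hard to express as static configuration.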

2. The Solution

Add first-class support to Mercurial for generated files and generation steps.

  1. Add support to Mercurial for custom merge drivers. A merge driver is a piece of code that controls the overall merge process.

  2. Add support to Mercurial for driver-resolved files. A driver-resolved file is a file that will be handled by the merge driver outside of the usual resolve mechanism.

  3. Have the merge driver expose two top-level operations: preprocess and conclude.

    • preprocess runs right before files are resolved. Typically, this is where files will be marked as driver-resolved.

    • conclude runs right after all source files have been resolved by the user. Typically, this is where driver-resolved files will be regenerated.

3. The Implementation

3.1. The merge driver

A merge driver is a Python (in-process) hook that has the ability to control the overall merge process. It implements preprocess and conclude as top-level functions:

    # Runs right before files are resolved.
    def preprocess(ui, repo, hooktype, mergestate, wctx, labels):
        ...

    # Runs right after all source files have been resolved by the user.
    def conclude(ui, repo, hooktype, mergestate, wctx, labels):
        ...

The hook is configured with:

[ui]
mergedriver = python:path/to/hook
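As a rough illustration of how the two functions might divide the work, the sketch below runs them against a toy stand-in for the merge state. `ToyMergeState`, `is_generated`, the `generated/` path prefix, the state letters, and the `regenerated` list are all assumptions made for this example; none of them is Mercurial's real API, and the unused hook arguments are simply passed as `None`.

```python
class ToyMergeState:
    """Toy stand-in for the merge state; NOT Mercurial's real mergestate API."""

    def __init__(self, conflicted):
        # Every conflicted file starts out unresolved ("u").
        self.states = {path: "u" for path in conflicted}

    def mark(self, path, state):
        self.states[path] = state

    def files_in_state(self, state):
        return sorted(p for p, s in self.states.items() if s == state)


def is_generated(path):
    # Hypothetical rule for recognising generated files in this sketch.
    return path.startswith("generated/")


regenerated = []  # records which files the toy driver "regenerated"


def preprocess(ui, repo, hooktype, mergestate, wctx, labels):
    # Runs before files are resolved: hand generated files to the driver ("d").
    for path in mergestate.files_in_state("u"):
        if is_generated(path):
            mergestate.mark(path, "d")


def conclude(ui, repo, hooktype, mergestate, wctx, labels):
    # Runs after the user has resolved all source files: pretend to
    # regenerate each driver-resolved file, then mark it resolved ("r").
    for path in mergestate.files_in_state("d"):
        regenerated.append(path)
        mergestate.mark(path, "r")


ms = ToyMergeState(["src/parser.y", "generated/parser.c"])
preprocess(None, None, None, ms, None, None)
after_preprocess = dict(ms.states)
ms.mark("src/parser.y", "r")  # the user resolves the source file by hand
conclude(None, None, None, ms, None, None)
print(after_preprocess, ms.states, regenerated)
```

In a real driver, conclude would invoke the project's actual generation commands rather than just recording file names.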

3.1.1. Why is the hook in-process?

Unlike with other kinds of hooks, the wlock must be held while this hook is called. This raises a bunch of issues with subprocess-based hooks, especially since many Mercurial operations the subprocess might want to do will require that the wlock be taken. Mercurial currently has no notion of locks being inherited by subprocesses. Trying to add one raises a lot of concerns, including:

  • How does the parent process invalidate internal data structures after child processes are complete?
  • How is mutual exclusion enforced between child processes that could be started up in parallel?
  • How are child processes prevented from outlasting the hook?
  • If the parent process crashes or is killed, how do child processes get to know?
  • How would this interact with the command server?

These problems are all solvable, but at least for the initial implementation it is simpler to avoid all these problems by staying in-process.

3.2. Driver-resolved files

Driver-resolved files are marked with a separate state in the merge driver.

  1. Make merge.update (which covers merge, update, graft, rebase, etc.) and resolve driver-aware. This includes:

    • running conclude at the end of merge or resolve, but only if there are no unresolved files, and (in the case of resolve) a resolve has been requested for a driver-resolved file.

    • not allowing driver-resolved files to be marked as resolved when someone runs hg resolve --mark --all.

    • marking error states (crashes etc) correctly.
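The first two rules above can be modelled with a small sketch. This is one literal reading of the rules, not Mercurial's implementation: the state letters, `should_run_conclude`, and `mark_all_resolved` are names assumed for the example.

```python
# Toy model of the rules above; the state letters and function names are
# illustrative assumptions, not Mercurial's implementation.
UNRESOLVED, RESOLVED, DRIVER_RESOLVED = "u", "r", "d"


def should_run_conclude(states, command, requested=()):
    """Decide whether conclude should run at the end of merge or resolve."""
    if UNRESOLVED in states.values():
        return False  # the user still has files to resolve
    if command == "resolve":
        # resolve only triggers conclude if a driver-resolved file was requested
        return any(states.get(path) == DRIVER_RESOLVED for path in requested)
    return True


def mark_all_resolved(states):
    """Model `hg resolve --mark --all`: driver-resolved files are left alone."""
    return {
        path: state if state == DRIVER_RESOLVED else RESOLVED
        for path, state in states.items()
    }


states = {"src/a.c": "u", "generated/a.h": "d"}
blocked = should_run_conclude(states, "merge")
states = mark_all_resolved(states)
source_only = should_run_conclude(states, "resolve", requested=["src/a.c"])
driver_file = should_run_conclude(states, "resolve", requested=["generated/a.h"])
print(blocked, states, source_only, driver_file)
```

Here the generated file stays driver-resolved after the bulk mark, so only conclude can move it to the resolved state.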

<more details to come>

MergeDriverPlan (last edited 2015-10-20 21:35:23 by KevinBullock)