Working Copy Sync Plan
1. The Problem
It is fairly common in practice to have generated files alongside source files in the working copy. When a merge happens, generated files that were modified on both sides are likely to cause merge conflicts. The best way to resolve these conflicts is usually to regenerate the files, which developers typically have to do by hand.
1.1. But isn't checking in generated files bad?
While checking in generated files has many drawbacks, there are also valid use cases for it. For example:
- These files change relatively rarely for individual developers but often enough in the aggregate to be a problem.
- These files take a long time to generate but the resultant artifacts are small.
- These files capture the state of the world they were created in (e.g. databases) in important ways. That state of the world can change such that the files can no longer be generated again.
- Serving these files via an out-of-band mechanism like an artifact server is not feasible, or much more work than just serving them via Mercurial.
- While the files could be generated by a build system, the project really has no need for a build system outside of these generated files, and would like to keep fast iteration cycles by avoiding build steps.
Each of the above points has been true for at least one repository in at least one large organization.
Ultimately, software engineering is often about tradeoffs, and in some cases checking in generated files is the right tradeoff to make. This feature will make working with such files less painful.
1.2. Doesn't the merge tool support already in Mercurial solve this problem?
Mercurial does support custom merge tools for arbitrary globs of files, but the current merge tool support has some important limitations:
- It only works when each generated file is regenerated by its own separate command. In some cases, however, multiple files can be regenerated with a single command.
- It is only suitable when the set of generated files is statically known. In some cases the set of generated files is instead defined within the repository itself, e.g. in a JSON file.
- Most importantly, there's no way to define an ordering for file resolutions. Generated files form a dependency graph -- they might depend on source files, other generated files, and so on. Resolutions need to be performed in topological order (source files first, then the generated files that depend on source code alone, then further generated files, and so on).
There's no way we can reasonably bake all of the above into configuration -- it is incredibly specific to the codebase.
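To make the ordering requirement concrete, here is a minimal sketch of how conflicted files could be placed in topological order. It is not part of Mercurial: the file names, the DEPENDS_ON mapping, and the resolution_order helper are all hypothetical, and it assumes Python 3.9 or later for the standard-library graphlib module.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency graph: each generated file maps to the set of files
# it is built from. All names here are illustrative only.
DEPENDS_ON = {
    "schema.json": {"schema.src"},
    "bindings.py": {"schema.json"},
    "docs.html": {"bindings.py", "schema.json"},
}


def resolution_order(depends_on, conflicted):
    """Return the conflicted files ordered so that inputs precede outputs."""
    sorter = TopologicalSorter(depends_on)
    # static_order() yields every node after all of its predecessors.
    return [path for path in sorter.static_order() if path in conflicted]


order = resolution_order(
    DEPENDS_ON, {"docs.html", "schema.src", "bindings.py", "schema.json"}
)
print(order)
# ['schema.src', 'schema.json', 'bindings.py', 'docs.html']
```

In this sketch the source file is resolved first, and each generated file is regenerated only after everything it depends on has been resolved. A real project would build the graph from its own metadata, which is why this logic is hard to express as static configuration.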
2. The Solution
Add support to Mercurial for custom merge drivers. A merge driver is an in-process hook that controls the overall merge process.