#pragma section-numbers 2

= Path Conflicts =

'''Status: Early Proposal'''

'''Main proponents: MarkThomas'''

/!\ This is a speculative project and does not represent any firm decisions on future behavior.

== The Problem ==

Path conflicts occur during merges when one side of the merge has a file or link with the same path name as a directory containing one or more files or links on the other side. These can occur in either direction: the local repository might have a file that is a directory on the remote, or vice versa.

In general, we say that the path conflict occurs at the shortest path, i.e. a file on one side conflicts with a directory containing one or more files on the other side.

Currently Mercurial deals only with files, and does not handle path conflicts at all. This can lead to serious bugs, particularly when the conflict occurs at a symlink to a directory. The problems occur in two places in particular: updating and merging.

An example bug that is caused by this: [[https://bz.mercurial-scm.org/show_bug.cgi?id=5628|issue5628]]

== Current Behaviour ==

In this document, “local” means the target revision of a merge, or the working copy in the case of update, and “remote” means the source revision. These are the terms used in the Mercurial merge module.

=== Updating ===

When a local file or symlink conflicts with a remote file or symlink:

 * If the local file is unknown or ignored, we follow the `merge.checkunknown` and `merge.checkignored` config options and either abort, warn, or proceed with updating to the new file.
 * If the local file is tracked but not committed, the update aborts with a merge conflict.

When a local file conflicts with a remote directory:

 * In all cases we abort with “Not a directory”, and the working copy is left in an unfinished update state. '''This is a bug.'''

When a local directory conflicts with a remote file or symlink:

 * If the local directory is empty, it is deleted and replaced with the remote file or symlink.
 * If the local directory is not empty, we abort with “Non-empty directory” and the working copy is left in an unfinished update state. '''This is a bug.'''

For these path conflicts Mercurial should have the same behaviour as file conflicts.

=== Merging ===

This applies to all kinds of merging, including rebasing and grafting.

When a local file or symlink conflicts with a remote file or symlink:

 * The user is prompted to merge the files. In the case of a conflict involving a symlink, the user is prompted to select whether they want to use the local or remote file or symlink.
 * If the merge fails, the working copy is left in the merge conflict state. Fixing the merge conflict allows the merge to be completed.

When a local file conflicts with a remote directory:

 * The merge aborts with “Not a directory”, and the working copy is left in an unfinished update state that must be aborted. The merge cannot be completed. '''This is a bug.'''

When a local symlink conflicts with a remote directory:

 * The merge aborts with “path traverses symbolic link”, and the working copy is left in an unfinished update state that must be aborted. The merge cannot be completed. '''This is a bug.'''

When a local directory conflicts with a remote file or symlink:

 * The merge aborts with “Non-empty directory”, and the working copy is left in an unfinished update state that must be aborted. The merge cannot be completed. '''This is a bug.'''

For these path conflicts, Mercurial should instead prompt the user to merge the file and directory, or leave the working copy in a merge conflict state that can be resolved.

The goal of this plan is to fix all the bugs listed above.

== The Solution ==

We introduce path conflicts as a new kind of conflict that can be detected during update and must be handled during merge. These are manifest-level conflicts, and so are detected during manifest merge.
During update, we also take care to check for conflicting directories as well as conflicting paths, and handle them accordingly.

=== Example ===

Consider the following scenario. Files in the local revision:

{{{
a/b
}}}

Files in the remote revision:

{{{
a/b/c1
a/b/c2
a/b/d/e
}}}

When merging from the remote revision to the local revision we expect to be informed of a conflict at `a/b`. The working copy is left in the following state:

{{{
a/b~localhash
a/b/c1
a/b/c2
a/b/d/e
}}}

The file is renamed to include the hash of the local revision. If the local revision is a modified tracked file in the working copy, e.g. when updating to a new commit with changed but uncommitted files, then the local hash is the hash of the commit with a `+` appended (as in other descriptions of a modified working copy). If there is already a tracked file of that name, then it is additionally suffixed with `~N`, where `N` is the smallest number for which a tracked file does not already exist.

It is always the file that is renamed. Since Mercurial tracks files rather than directories, renaming the directory would involve propagating the rename to all the files inside that directory, which would create lots of noise and make it harder to keep track of the copies, particularly if the user renames the directory back to the original name.

The path conflict must be resolved by deleting or explicitly adding the renamed file, or by renaming it or the directory to a more suitable name. Once resolved, the user must run `hg resolve --mark` on the original conflicting path.

The situation is similar for the case where a remote file conflicts with a local directory, except that the file is renamed to include the remote revision hash.

This behaviour is similar to Git. Git uses branch names in preference to commit hashes; for simplicity we will only use hashes.
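
The example above can be reproduced with a minimal sketch. It assumes each revision is given as a plain collection of file paths; the helper names (`path_prefixes`, `find_path_conflicts`, `safe_name`), the choice to start `N` at 1, and the hash value are hypothetical and are not part of Mercurial's API.

```python
def path_prefixes(path):
    """Yield every proper directory prefix of path, shortest first."""
    parts = path.split("/")
    for i in range(1, len(parts)):
        yield "/".join(parts[:i])


def find_path_conflicts(files_a, files_b):
    """Return paths that are files in files_a but directories in files_b."""
    dirs_b = {d for f in files_b for d in path_prefixes(f)}
    return sorted(f for f in files_a if f in dirs_b)


def safe_name(path, node_hash, tracked, modified=False):
    """Build the renamed path: path~hash, with '+' for a modified working copy.

    If that name is already tracked, append ~N for the smallest free N
    (assumption: N starts at 1; the proposal does not state the start value).
    """
    candidate = "%s~%s%s" % (path, node_hash, "+" if modified else "")
    if candidate not in tracked:
        return candidate
    n = 1
    while "%s~%d" % (candidate, n) in tracked:
        n += 1
    return "%s~%d" % (candidate, n)


local = ["a/b"]
remote = ["a/b/c1", "a/b/c2", "a/b/d/e"]
conflicts = find_path_conflicts(local, remote)  # ['a/b']
# "0123456789ab" is a made-up hash used only for illustration.
renamed = safe_name(conflicts[0], "0123456789ab", tracked=set(local))
```

Checking in the opposite direction, `find_path_conflicts(remote, local)`, covers the case where the file is on the remote side and the directory is local.
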
== Implementation Details ==

=== Merging Manifests ===

In `merge.manifestmerge` we need to add a new merge action representing creation of a path conflict.

When a file exists only on one side, we check if it has any path conflicts (i.e. a directory with the same name, or a file that matches any of its path prefixes), and if so, create a new merge action of the form:

{{{
('p', (renamedfilename, origin), "path conflict")
}}}

The conflict is always listed against the shortest path, i.e. the path that is a file on one side and a directory on the other. The `renamedfilename` parameter is the safe name that was created by appending a commit hash, and the `origin` parameter is `'l'` for a local path conflict (i.e. one where the file was on the local side) and `'r'` for a remote path conflict.

We also create a merge action for performing the rename to the safe name. For conflicts where the remote file is the one that conflicts, we can re-use the `'dg'` action to perform a renamed get action. For conflicts where it is the local file that conflicts, we introduce a new merge action representing a rename for path conflict resolution. This takes the form:

{{{
('pr', (oldfilename,), "local path conflict")
}}}

We always leave the repository in the conflicted state, and it is up to the user to resolve the conflict by deleting or renaming files, and then marking the path conflict as resolved by running `hg resolve --mark` on the conflicting path.

=== Merge Conflict State ===

The `merge.mergestate` class is extended with a new record type:

{{{
P: a path conflict to be resolved
}}}

Merge state records use capital letters to signify that versions of Mercurial that do not understand the merge record must abort and refuse to process the merge state, and lower case letters to signify advisory merge state records that are safe to ignore. This record uses a capital letter so that old versions of Mercurial will refuse to process it.
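
The case convention for record types can be sketched as follows. This is an illustration only: the function name `unsupported_records` and the set of understood record types are hypothetical and do not reflect the actual `merge.mergestate` code.

```python
def unsupported_records(record_types, understood):
    """Return the record types that must cause an abort.

    A record type is mandatory unless it is lower case. Unknown lower-case
    (advisory) records are safe to ignore, so they are not reported.
    """
    return sorted(
        t for t in record_types
        if t not in understood and not t.islower()
    )


# Hypothetical older client that predates the 'P' record.
old_client_understands = {"L", "O", "F"}
blocking = unsupported_records(["L", "O", "P", "f"], old_client_understands)
if blocking:
    print("abort: unsupported merge state records: %s" % ", ".join(blocking))
```

Under this convention the older client stops on `P` but silently skips the unknown advisory record `f`.
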
The record has a sub-type of either `'pu'` or `'pr'` depending on whether or not it has been marked as resolved. These are analogous to the `'u'` and `'r'` sub-types of normal file conflicts. The record also contains the path that conflicts, the path that the file was renamed to, and whether it was a local or remote file that was renamed.

The path conflict is deemed resolved when the user runs `hg resolve --mark` on the original conflicting path.

=== Updating the Working Copy ===

In `merge.applyupdates`, path conflicts for local files are dealt with by renaming the file, adding the original commit hash as a suffix. This is the new `'pr'` action.

In `merge._checkunknownfile`, we additionally check the following:

 * If any of the path prefixes of the target file exists as a file or link, we consider this a conflict and abort or warn as appropriate. If we are not aborting then the file is deleted.
 * If the file already exists as a directory, we consider this a conflict and abort or warn as appropriate. If we are not aborting, the directory and all of its contents are deleted.

== Concerns ==

=== Checking dirs in manifests ===

In order to determine whether a file conflicts with directories in the other manifest, we must query the other manifest to see if it contains a particular directory. We can do that with `othermanifest.hasdir(dirname)`; however, for flat manifests this works by building a `util.dirs` object, which may be expensive to build for repos with large manifests.

It should be possible to improve the implementation of `manifest.hasdir()` to be more efficient for flat manifests by binary searching for any file that begins with `dirname + sep`. Tree manifest already has an efficient implementation that simply looks for the tree manifest directory node.
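
A minimal sketch of the binary search idea, assuming the flat manifest can be viewed as a lexicographically sorted list of path strings. The function name `flat_hasdir` is hypothetical, and a real implementation would have to search the manifest's own storage rather than a Python list.

```python
from bisect import bisect_left


def flat_hasdir(sorted_paths, dirname, sep="/"):
    """Return True if any path in sorted_paths lies under dirname.

    All paths starting with dirname + sep are contiguous in sorted order,
    and the first of them is the first entry >= dirname + sep, so a single
    binary search followed by one prefix check is enough.
    """
    prefix = dirname + sep
    i = bisect_left(sorted_paths, prefix)
    return i < len(sorted_paths) and sorted_paths[i].startswith(prefix)


paths = sorted(["a/b/c1", "a/b/c2", "a/b/d/e", "a/bc", "x"])
has_ab = flat_hasdir(paths, "a/b")    # True: a/b/c1 lies under a/b
has_abc = flat_hasdir(paths, "a/bc")  # False: a/bc is a file, not a directory
```

This costs O(log n) comparisons per query and avoids building a set of every directory in the manifest.
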
=== Check files within dirs in manifests ===

Similarly, when applying merge actions to update from a revision where a path is a directory to one where that path is a file, we must make sure that the set of actions has deleted all of the files in the directory. This involves checking all files in a directory, which currently requires iterating over the whole manifest. For large repos, this is slow.

----
CategoryDeveloper CategoryNewFeatures