Differences between revisions 9 and 10
Revision 9 as of 2009-11-18 22:39:37
Size: 5324
Comment:
Revision 10 as of 2009-11-26 20:46:13
Size: 5695
Comment: Updated to match my current understand of this.
Line 13: Line 13:
 * the names of the filters provided by the extension are unfortunate (`cleverencode:` and `cleverdecode:`). Instead of configuring a given file to be a given type, one must "clever encode" in on commit and "clever decode" it on checkout.  * the names of the filters provided by the extension are not intuitive (`cleverencode:` and `cleverdecode:`). Instead of configuring a given file to be of a given EOL type, one must "clever encode" in on commit and "clever decode" it on checkout.
Line 17: Line 17:
== Solutions == == Solution ==
Line 19: Line 19:
The solutions generally agree that we need a version controlled file, (say, `.hgeol` in the repository root) which is used to declare which files the extension should convert. The file could look like this: We will keep configuration in a version controlled file called `.hgeol` in the repository root. It declares which files the extension should convert. The file looks like this:
Line 31: Line 31:
where files with a declared format are always checked out in that format, files declared as "native" are converted to the operating system native format, and files not mentioned (or declared as binary) receive no treatment. The first pattern wins, so more specific
patterns should be put first.
Files with a declared format are always checked out in that format, files declared as "native" are converted to the operating system native format, and files not mentioned (or declared as binary) receive no treatment. The first pattern wins, so more specific patterns should be put first. In the above example, the `test/mixed.txt` file is not converted to native encoding since the `test/mixed.txt` rule matches before the `**.txt` rule.
Line 36: Line 35:
The following choices have been discussed: Files declared as LF or CRLF format are stored as-is in the repository. Files declared as native are stored in a configurable repository-native format which defaults to LF.
Line 38: Line 37:
==== Pure LF ====

In this solution, all files are converted to LF format upon commit. Both files that are specified as CRLF on checkout and files specified as native are converted.

Subversion works like this with its `svn:eol-style` property and the hope is that it will simplify the extension.

==== Mixed format ====

In this solution, files declared as LF or CRLF format are stored as-is in the repository. Files declared as native are stored in a configurable repository-native format.

The advantage of this solution is that it is minimally invasive. Even people who don't use the extension will get a checkout where LF and CRLF files have the right format. The disadvantage is that this gives the users more chances of configuring things in odd ways.
This solution is minimally invasive. Even people who don't use the extension will get a checkout where LF and CRLF files have the right format.
Line 52: Line 41:
When .hgeol is changed, the EOLs in working directory can get out of sync with the .hgeol file. This should make the next commit abort with a message: When `.hgeol` is changed, the EOLs in working directory can get out of sync with the `.hgeol` file. This should make the next commit abort with a message:
Line 54: Line 43:
{{{
Line 56: Line 46:
}}}
Line 57: Line 48:
The hg eolupdate command will rewrite files in the working directory to match the `.hgeol` file. After that, the commit will succeed as normal and include the rewritten files. The hg eolupdate command will rewrite files in the working directory to match the `.hgeol` file. After that, the commit will succeed as normal and include the rewritten files. This means that updates to `.hgeol` are made in lock-step with the corresponding file changes, hopefully keeping things nicely synchronized in the repository.
Line 59: Line 50:
==== Separate content and EOL format ====

The creation of this new command could be an opportunity to clearly separate content changes, and EOL style changes. We can try to guide our users into not mixing those changes together in a single commit, by having a safe `hg eolupdate`: in this aspect, the command would fail if the repository has uncommitted changes, to specifically avoid updating EOLs in a "content" changeset.

Symmetrically, it could make sense to have a `hg commit` aware of the purpose of `.hgeol` and other EOL-related settings: the commit would be aborted if both `.hgeol` and other files have uncommitted changes.
The `eolupdate` command will make it easy to clearly separate content changes and EOL style changes. We can try to guide our users into not mixing those changes together in a single commit, by letting `hg eolupdate` fail if the repository has uncommitted changes, to specifically avoid updating EOLs in a "content" changeset.
Line 68: Line 55:

==== How Subversion does it ====

We have a [[http://bitbucket.org/mg/hg-eol/src/tip/tests/test-svn|test case]] that documents how Subversion handles some cases of inconsistencies. It turns out that

 * Subversion will '''silently recode''' a file to the specified EOL style if the file has a consistent style. So accidentally changing all EOLs from LF to CRLF wont matter for Subversion -- it will rewrite the file in the working copy upon commit.

 * Subversion will also '''silently update''' files if the `svn:eol-style` property is set of them. Until the commit, `svn diff` is empty, but the file is changed in the working copy on commit.

 * Subversion '''aborts''' on commit if a file has inconsistent EOL style.
Line 77: Line 74:
== Preferred Solution ==

...
Line 85: Line 78:
It needs more testing (merges with differences in EOL format, user errors, behavior on Windows). === TODO ===

 * The extension needs testing on Windows.

 * Needs testing with merges of files with different EOL styles.

 * There is currently no `hg eolupdate` command.

 * The `.hgeol` file is read from the repository tip revision instead of the working copy parent revision.

EOL Translation Plan

Status: draft

Some people wish to have end-of-line characters translated into the native form for their operating system. This means CRLF (\r\n, carriage-return followed by line-feed) on Windows and LF (\n) on Unix. Older versions of Mac OS used CR (\r), but Mac OS X and later use LF.

Problem

The existing solution is the Win32TextExtension, but it has a number of shortcomings:

  • the settings are not part of the repository and so new developers must set them up the first time. This is the biggest problem.
  • the names of the filters provided by the extension are not intuitive (cleverencode: and cleverdecode:). Instead of configuring a given file to be of a given EOL type, one must "clever encode" it on commit and "clever decode" it on checkout.

  • the name of the extension itself indicates that this extension is only for Windows. It is true that this is probably the platform with the weakest tool support for files in non-native EOL format. However, users on other platforms might benefit from an EOL translation extension as well.

Solution

We will keep configuration in a version controlled file called .hgeol in the repository root. It declares which files the extension should convert. The file looks like this:

[patterns]
Windows.txt = CRLF
Unix.txt = LF
test/mixed.txt = BIN
**.txt = native
**.py = native
**.proj = CRLF

Files with a declared format are always checked out in that format, files declared as "native" are converted to the operating system native format, and files not mentioned (or declared as binary) receive no treatment. The first pattern wins, so more specific patterns should be put first. In the above example, the test/mixed.txt file is not converted to native encoding since the test/mixed.txt rule matches before the **.txt rule.
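The "first pattern wins" rule can be sketched in a few lines of Python. This is only an illustration, not the extension's code: the helper names load_patterns and eol_style are made up for this example, and fnmatch is used as a rough stand-in for Mercurial's own glob matching (which treats a single * differently).

```python
import configparser
import fnmatch

# The example .hgeol file from above.
HGEOL_EXAMPLE = """\
[patterns]
Windows.txt = CRLF
Unix.txt = LF
test/mixed.txt = BIN
**.txt = native
**.py = native
**.proj = CRLF
"""


def load_patterns(text):
    """Return the [patterns] section as an ordered list of (pattern, style)."""
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.optionxform = str  # keep the case of patterns such as Windows.txt
    parser.read_string(text)
    return list(parser.items("patterns"))


def eol_style(path, patterns):
    """Return the style of the first matching pattern, or None if none match."""
    for pattern, style in patterns:
        # Assumption: fnmatch semantics, where "*" also matches "/".
        if fnmatch.fnmatchcase(path, pattern):
            return style
    return None


patterns = load_patterns(HGEOL_EXAMPLE)
print(eol_style("test/mixed.txt", patterns))
print(eol_style("docs/readme.txt", patterns))
```

Because the list keeps the order of the file, test/mixed.txt is resolved by its own rule before the **.txt rule is ever tried.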

Repository format

Files declared as LF or CRLF format are stored as-is in the repository. Files declared as native are stored in a configurable repository-native format which defaults to LF.

This solution is minimally invasive. Even people who don't use the extension will get a checkout where LF and CRLF files have the right format.
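As a rough sketch of the mixed repository format, the function below maps a declared style to the EOL used in the repository and the EOL used in the working directory. The function name, its parameters and the OS_NATIVE constant are assumptions made for this illustration only.

```python
import os

# Assumption: the platform's native EOL is derived from os.linesep.
OS_NATIVE = "CRLF" if os.linesep == "\r\n" else "LF"


def storage_and_checkout(style, repo_native="LF", os_native=OS_NATIVE):
    """Return (stored_eol, checkout_eol); (None, None) means no treatment."""
    if style in ("LF", "CRLF"):
        # Declared formats are stored as-is and checked out in that format.
        return style, style
    if style == "native":
        # Native files use the repository-native format in the repository
        # and the operating system's format in the working directory.
        return repo_native, os_native
    # BIN or not mentioned in .hgeol: leave the file alone.
    return None, None


print(storage_and_checkout("CRLF"))
print(storage_and_checkout("native", os_native="CRLF"))
```

Since LF and CRLF files are stored in their declared format, a client without the extension still gets those files right; only native files depend on the extension being enabled.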

Update repository on setting changes

When .hgeol is changed, the EOLs in the working directory can get out of sync with the .hgeol file. This should make the next commit abort with a message:

abort: EOL mis-match in Windows.txt: has LF, but should have CRLF
(run "hg eolupdate" to update files)

The hg eolupdate command will rewrite files in the working directory to match the .hgeol file. After that, the commit will succeed as normal and include the rewritten files. This means that updates to .hgeol are made in lock-step with the corresponding file changes, hopefully keeping things nicely synchronized in the repository.

The eolupdate command will make it easy to clearly separate content changes and EOL style changes. We can try to guide our users into not mixing those changes together in a single commit, by letting hg eolupdate fail if the repository has uncommitted changes, to specifically avoid updating EOLs in a "content" changeset.
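The commit-time check described in this section could look roughly like the following. It is a minimal sketch: EolMismatch, detect_eol and check_eol are hypothetical names, and a real implementation would hook into Mercurial's commit machinery instead of raising a plain Python exception.

```python
class EolMismatch(Exception):
    """Hypothetical error type standing in for the commit abort."""


def detect_eol(data):
    """Classify file content (bytes) as 'LF', 'CRLF', 'mixed' or None."""
    crlf = data.count(b"\r\n")
    lf = data.count(b"\n") - crlf  # line feeds not preceded by CR
    if crlf and lf:
        return "mixed"
    if crlf:
        return "CRLF"
    if lf:
        return "LF"
    return None  # no line endings at all


def check_eol(name, data, expected):
    """Raise EolMismatch if the content does not have the expected EOLs."""
    actual = detect_eol(data)
    if actual is not None and actual != expected:
        raise EolMismatch(
            "EOL mis-match in %s: has %s, but should have %s\n"
            '(run "hg eolupdate" to update files)' % (name, actual, expected)
        )


try:
    check_eol("Windows.txt", b"one\ntwo\n", "CRLF")
except EolMismatch as err:
    print("abort: %s" % err)
```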

Unexpected format changes

What should we do if a file has an unexpected format when doing hg commit or hg diff? The easiest solution is to make an encode filter which translates both LF and CRLF into the target format, regardless of what the expected source format is.
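Such a forgiving filter is short to write. The sketch below (to_eol is a made-up name, not the extension's filter) rewrites every CRLF or bare LF to the target, so it does not matter which format the input actually had.

```python
import re

# Matches either a CRLF pair or a single LF.
_EOL = re.compile(rb"\r\n|\n")


def to_eol(data, target):
    """Rewrite all LF and CRLF line endings in data (bytes) to target."""
    # A function is used as the replacement so that the target bytes are
    # inserted literally, without any escape processing.
    return _EOL.sub(lambda match: target, data)


print(to_eol(b"one\r\ntwo\nthree\n", b"\n"))
print(to_eol(b"one\r\ntwo\nthree\n", b"\r\n"))
```

The filter is idempotent: running it again on its own output changes nothing, which is what makes it safe to apply with no questions asked.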

How Subversion does it

We have a test case that documents how Subversion handles some cases of inconsistencies. It turns out that

  • Subversion will silently recode a file to the specified EOL style if the file has a consistent style. So accidentally changing all EOLs from LF to CRLF won't matter for Subversion -- it will rewrite the file in the working copy upon commit.

  • Subversion will also silently update files if the svn:eol-style property is set on them. Until the commit, svn diff is empty, but the file is changed in the working copy on commit.

  • Subversion aborts on commit if a file has inconsistent EOL style.

Semantics

Regardless of the implementation details, we are aware that we will need to pick unambiguous names for our various components. For some, native does not stand out as a name that is self-explanatory, but it does make sense to those exposed to Subversion's svn:eol-style property setting, which inspired this mechanism in the first place.

A naming policy centered on storage might be clearer to end users: names such as storeasis and storeaslf already describe the behavior on commit. Depending on the implementation, it might be useful to specify the behaviors on commit and on update separately: "storeasis, getaslf", "storeaslf, getasis", or "storeascrlf, converttolocal" are too long, but are self-explanatory.

Some suggested that Mercurial should not use crlf or lf in its names, and should instead use 'windows' and 'unix', respectively. One convention can be chosen, or aliases can be used.
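If aliases are chosen, they could be resolved with a small lookup table when .hgeol is read. The sketch below assumes that 'windows' and 'unix' are accepted as aliases and that style names are matched case-insensitively; neither has been decided, and canonical_style is a hypothetical helper.

```python
# Assumption: these two aliases are accepted next to the existing names.
ALIASES = {"windows": "CRLF", "unix": "LF"}
CANONICAL = ("LF", "CRLF", "BIN", "native")


def canonical_style(name):
    """Map a style name or alias from .hgeol to its canonical spelling."""
    lowered = name.lower()
    if lowered in ALIASES:
        return ALIASES[lowered]
    for style in CANONICAL:
        if style.lower() == lowered:
            return style
    raise ValueError("unknown EOL style: %r" % name)


print(canonical_style("windows"))
print(canonical_style("native"))
```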

Implementation

An implementation has been started here: http://bitbucket.org/mg/hg-eol/. It uses a mixed format for the repository and converts all files (with no questions asked) into the target format.

TODO

  • The extension needs testing on Windows.
  • Needs testing with merges of files with different EOL styles.
  • There is currently no hg eolupdate command.

  • The .hgeol file is read from the repository tip revision instead of the working copy parent revision.

EOLTranslationPlan (last edited 2012-10-25 20:31:35 by mpm)