Differences between revisions 1 and 2
Revision 1 as of 2009-09-13 23:10:30
Revision 2 as of 2009-09-13 23:11:17
EOL Translation Plan

Status: draft

Some people want end-of-line characters translated into the native form for their operating system. This means CRLF (\r\n, carriage return followed by line feed) on Windows and LF (\n) on Unix. Older versions of Mac OS used CR (\r), but Mac OS X and later use LF.

Problem

The existing solution is the Win32TextExtension, but it has a number of shortcomings:

  • the settings are not part of the repository and so new developers must set them up the first time. This is the biggest problem.
  • the names of the filters provided by the extension are unfortunate (cleverencode: and cleverdecode:). Instead of configuring a given file to be a given type, one must "clever encode" it on commit and "clever decode" it on checkout.

  • the name of the extension itself indicates that this extension is only for Windows. It is true that this is probably the platform with the weakest tool support for files in non-native EOL format. However, users on other platforms might benefit from an EOL translation extension as well.

Solutions

The solutions generally agree that we need a version-controlled file (say, .hgeol in the repository root) which is used to declare which files the extension should convert. The file could look like this:

[patterns]
Windows.txt = CRLF
Unix.txt = LF
test/mixed.txt = BIN
**.txt = native
**.py = native
**.proj = CRLF

where files with a declared format are always checked out in that format, files declared as "native" are converted to the operating system native format, and files not mentioned (or declared as binary) receive no treatment. The first pattern wins, so more specific patterns should be put first.
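The "first pattern wins" rule can be illustrated with a small sketch. This is not the extension's code: the helper names parse_patterns and eol_style are hypothetical, and Python's fnmatch is used only as a rough stand-in for Mercurial's own glob matching, which may behave differently in edge cases.

```python
import fnmatch

# The example .hgeol content from above.
SAMPLE = """\
[patterns]
Windows.txt = CRLF
Unix.txt = LF
test/mixed.txt = BIN
**.txt = native
**.py = native
**.proj = CRLF
"""


def parse_patterns(text):
    """Return (pattern, style) pairs from the [patterns] section, in file order."""
    rules = []
    in_patterns = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("[") and line.endswith("]"):
            in_patterns = (line == "[patterns]")
            continue
        if in_patterns and "=" in line:
            pattern, style = line.split("=", 1)
            rules.append((pattern.strip(), style.strip()))
    return rules


def eol_style(path, rules):
    """Return the style of the first matching pattern, or None if none matches."""
    for pattern, style in rules:
        # Assumption: fnmatch approximates the glob syntax; its "*" also
        # crosses "/" boundaries, so "**.txt" matches files in subdirectories.
        if fnmatch.fnmatchcase(path, pattern):
            return style
    return None


rules = parse_patterns(SAMPLE)
```

With these rules, test/mixed.txt resolves to BIN because its specific pattern is listed before the general **.txt pattern; if the order were reversed, the file would be treated as native.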

Repository format

The following choices have been discussed:

Pure LF

In this solution, all files are converted to LF format upon commit. Both files that are specified as CRLF on checkout and files specified as native are converted.

Subversion works like this with its svn:eol-style property, and the hope is that this approach will simplify the extension.
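As a rough sketch of the pure LF idea (the function names and the native parameter are assumptions made for illustration, not part of any existing extension), the commit and checkout directions could look like this:

```python
import os

# Line endings for the explicitly declared styles.
EOL_BYTES = {"LF": b"\n", "CRLF": b"\r\n"}


def encode_for_commit(data):
    """Store text with LF line endings, whatever the working copy used."""
    return data.replace(b"\r\n", b"\n")


def decode_for_checkout(data, style, native=os.linesep.encode("ascii")):
    """Convert repository data to the declared working-copy style.

    Assumes `data` contains only LF line endings, which holds if every
    commit went through encode_for_commit.
    """
    eol = native if style == "native" else EOL_BYTES[style]
    return data.replace(b"\n", eol)
```

Because the repository always holds LF, the checkout step only has to consider one source format.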

Mixed format

In this solution, files declared as LF or CRLF format are stored as-is in the repository. Files declared as native are stored in a configurable repository-native format.

The advantage of this solution is that it is minimally invasive. Even people who don't use the extension will get a checkout where LF and CRLF files have the right format. The disadvantage is that this gives the users more chances of configuring things in odd ways.
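The storage rule for the mixed format can be written down as a small function. This is only a sketch: stored_eol and the repo_native parameter are hypothetical names standing in for the "configurable repository-native format" mentioned above.

```python
def stored_eol(style, repo_native="LF"):
    """Return the line ending used inside the repository for a declared style.

    Returns None for files that receive no treatment (BIN or unlisted).
    """
    if style in ("LF", "CRLF"):
        # Declared formats are stored as-is.
        return style
    if style == "native":
        # Native files use the configurable repository-native format.
        return repo_native
    return None
```

A user without the extension sees exactly what stored_eol returns, which is why LF and CRLF files come out right for them while native files depend on the configured value.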

Unexpected format changes

What should we do if a file has an unexpected format when doing hg commit or hg diff? The easiest solution is to make an encode filter which translates both LF and CRLF into the target format, regardless of what the expected source format is.
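A minimal sketch of such a tolerant encode filter follows. The name to_eol is made up for this example, and the filter deliberately handles only LF and CRLF, as described above; a lone CR is left untouched.

```python
import re

# Matches either CRLF or a bare LF; a lone CR is deliberately not matched.
_ANY_EOL = re.compile(rb"\r?\n")


def to_eol(data, target):
    """Rewrite every LF or CRLF line ending in `data` to `target`.

    The expected source format is never consulted, so files with mixed
    or unexpected line endings are normalized as well.
    """
    # A function is used as the replacement so that `target` is inserted
    # literally, without any escape processing by the re module.
    return _ANY_EOL.sub(lambda match: target, data)
```

Applying the filter twice gives the same result as applying it once, so it is safe to run on files that are already in the target format.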

Preferred Solution

...

Implementation

An implementation has been started here: http://bitbucket.org/digitalxero/hg-eol/. It uses a mixed format for the repository and converts all files (with no questions asked) into the target format.

It needs more testing (merges with differences in EOL format, user errors, behavior on Windows).

EOLTranslationPlan (last edited 2012-10-25 20:31:35 by mpm)