EOL Translation Plan
Status: draft
Some people wish to have end-of-line characters translated into the native form for their operating system. This means CRLF (\r\n, carriage return followed by line feed) on Windows and LF (\n) on Unix. Older versions of Mac OS used CR (\r), but Mac OS X and later use LF.
Problem
The existing solution is the Win32TextExtension, but it has a number of shortcomings:
- the settings are not part of the repository and so new developers must set them up the first time. This is the biggest problem.
- the names of the filters provided by the extension are unfortunate (cleverencode: and cleverdecode:). Instead of configuring a given file to be a given type, one must "clever encode" it on commit and "clever decode" it on checkout.
- the name of the extension itself indicates that this extension is only for Windows. It is true that this is probably the platform with the weakest tool support for files in non-native EOL format. However, users on other platforms might benefit from an EOL translation extension as well.
Solutions
The solutions generally agree that we need a version-controlled file (say, .hgeol in the repository root) which is used to declare which files the extension should convert. The file could look like this:
    [patterns]
    Windows.txt = CRLF
    Unix.txt = LF
    test/mixed.txt = BIN
    **.txt = native
    **.py = native
    **.proj = CRLF
where files with a declared format are always checked out in that format, files declared as "native" are converted to the operating system native format, and files not mentioned (or declared as binary) receive no treatment. The first pattern wins, so more specific patterns should be put first.
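The "first pattern wins" rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names PATTERNS, declared_style and checkout_eol are hypothetical, and Python's fnmatch is used as a stand-in for Mercurial's own pattern matching, whose glob semantics differ in detail.

```python
from fnmatch import fnmatchcase

# Hypothetical in-memory form of the [patterns] section, kept in file order.
PATTERNS = [
    ("Windows.txt", "CRLF"),
    ("Unix.txt", "LF"),
    ("test/mixed.txt", "BIN"),
    ("**.txt", "native"),
    ("**.py", "native"),
    ("**.proj", "CRLF"),
]


def declared_style(path, patterns=PATTERNS):
    """Return the style of the first matching pattern, or None if unlisted."""
    for pattern, style in patterns:
        # Assumption: fnmatch stands in for Mercurial's glob matching.
        if fnmatchcase(path, pattern):
            return style
    return None


def checkout_eol(path, os_eol, patterns=PATTERNS):
    """Return the EOL bytes to use on checkout, or None to leave the file alone."""
    style = declared_style(path, patterns)
    if style == "LF":
        return b"\n"
    if style == "CRLF":
        return b"\r\n"
    if style == "native":
        return os_eol
    # BIN or not mentioned: no treatment.
    return None
```

Because Windows.txt is listed before **.txt, it is checked out as CRLF on every platform, while any other .txt file follows the operating system.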
Repository format
The following choices have been discussed:
Pure LF
In this solution, all files are converted to LF format upon commit. Both files that are specified as CRLF on checkout and files specified as native are converted.
Subversion works like this with its svn:eol-style property and the hope is that it will simplify the extension.
Mixed format
In this solution, files declared as LF or CRLF format are stored as-is in the repository. Files declared as native are stored in a configurable repository-native format.
The advantage of this solution is that it is minimally invasive. Even people who don't use the extension will get a checkout where LF and CRLF files have the right format. The disadvantage is that this gives the users more chances of configuring things in odd ways.
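The difference between the two repository formats comes down to which line ending is written on commit. A minimal sketch, assuming the declared styles from the example .hgeol file; the function name stored_eol and the layout strings "pure-lf" and "mixed" are hypothetical:

```python
def stored_eol(style, layout, repo_native=b"\n"):
    """Return the EOL bytes stored in the repository, or None for no conversion.

    style is the declared style of the file ("LF", "CRLF", "native", "BIN"
    or None when the file is not mentioned). layout is "pure-lf" or "mixed".
    repo_native is the configurable repository-native format (mixed only).
    """
    if style in (None, "BIN"):
        return None  # files without a text declaration are never touched
    if layout == "pure-lf":
        return b"\n"  # everything declared as text is stored with LF
    if layout == "mixed":
        if style == "LF":
            return b"\n"  # stored as declared
        if style == "CRLF":
            return b"\r\n"  # stored as declared
        return repo_native  # "native" files use the repository-native format
    raise ValueError("unknown layout: %r" % (layout,))
```

In the mixed layout a file declared as CRLF is already stored as CRLF, which is why a checkout without the extension still gets the right format for LF and CRLF files.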
Update repository on setting changes
When .hgeol is changed (or possibly when global EOL settings set in .hgrc are changed), the checked-out repository has to be updated to possibly change file EOLs. The expected behavior is similar to hg up 0; hg up. We discussed the addition of an hg eolupdate command, which would re-read the EOL-related settings and update the local files.
Separate content and EOL format
The creation of this new command could be an opportunity to clearly separate content changes from EOL style changes. We can try to guide our users away from mixing those changes in a single commit by making hg eolupdate safe: the command would fail if the repository has uncommitted changes, specifically to avoid updating EOLs in a "content" changeset.
Symmetrically, it could make sense to make hg commit aware of the purpose of .hgeol and other EOL-related settings: the commit would be aborted if both .hgeol and other files have uncommitted changes.
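Both safety checks are simple to state in code. The sketch below is not Mercurial API: EolSafetyError, check_eolupdate and check_commit are hypothetical names, and the list of uncommitted files is assumed to come from whatever the extension uses to query the working copy status.

```python
class EolSafetyError(Exception):
    """Raised when content changes and EOL changes would be mixed."""


def check_eolupdate(uncommitted):
    """Refuse to run an EOL update on a working copy with uncommitted changes."""
    if uncommitted:
        raise EolSafetyError("uncommitted changes; commit or revert them first")


def check_commit(uncommitted, eol_file=".hgeol"):
    """Refuse a commit that changes the EOL settings together with other files."""
    changed = set(uncommitted)
    if eol_file in changed and changed - {eol_file}:
        raise EolSafetyError("commit %s separately from other files" % eol_file)
```

A commit that touches only .hgeol, or only other files, passes; a commit that touches both is aborted.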
Unexpected format changes
What should we do if a file has an unexpected format when doing hg commit or hg diff? The easiest solution is to make an encode filter which translates both LF and CRLF into the target format, regardless of what the expected source format is.
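Such a source-agnostic filter could look like the following sketch. The function name to_eol is hypothetical, and a lone CR is deliberately left alone because only LF and CRLF are discussed here.

```python
import re

# Matches CRLF first, so "\r\n" is treated as one line ending, then bare LF.
_ANY_EOL = re.compile(br"\r\n|\n")


def to_eol(data, target):
    """Rewrite every LF or CRLF line ending in data (bytes) to target (bytes)."""
    # A callable replacement avoids any escape processing of the target bytes.
    return _ANY_EOL.sub(lambda match: target, data)
```

Since the filter does not care about the source format, applying it twice gives the same result as applying it once, and a file with mixed line endings comes out uniform.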
Semantics
Regardless of the implementation details, we are aware that we will need to pick unambiguous names for our various components. For some, native does not stand out as a name that is self-explanatory, but it does make sense to those exposed to Subversion's svn:eol-style property setting, which inspired this mechanism in the first place.
A naming policy centered on storage might be clearer to end users: storeasis and storeaslf already describe the behavior on commit, for example. Depending on the implementation, it might be useful to specify the behaviors on commit and on update separately: "storeasis, getaslf", "storeaslf, getasis", or "storeascrlf, converttolocal" are too long, but are self-explanatory.
Some suggested that Mercurial should not use crlf or lf in its names, and should instead use 'windows' and 'unix', respectively. One convention can be chosen, or aliases can be used.
Preferred Solution
...
Implementation
An implementation has been started here: http://bitbucket.org/digitalxero/hg-eol/. It uses a mixed format for the repository and converts all files (with no questions asked) into the target format.
It needs more testing (merges with differences in EOL format, user errors, behavior on Windows).