Differences between revisions 13 and 14
Revision 13 as of 2009-07-30 17:41:37
Size: 3258
Editor: BradOlson
Comment:
Revision 14 as of 2010-08-06 18:01:22
Size: 3279
Editor: JohnHein
Comment: fix hg book link

To deal with CaseFolding on the repo side, we need to:

  • escape uppercase ASCII characters in filenames
  • escape high ASCII
    • Unicode and other characters may be case-folded as well
    • Filesystems and operating systems may do other unfortunate things to filenames which will cause interoperability trouble
  • use the same scheme by default on all systems to avoid backup and media sharing issues

A simple escaping scheme is as follows:

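  • replace _ with __
  • replace A-Z with _a to _z (so A becomes _a, B becomes _b, etc.)
  • replace characters 126-255 and '\:*?"<>|' with ~XX, where XX is the character's two-digit hex code, so 126-255 become ~7e to ~ff (note this escapes tilde as well)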
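
The three rules above can be sketched in Python as follows. This is a minimal illustration, not the actual Mercurial code: the function names `encode_filename` and `decode_filename` are hypothetical, and working on the UTF-8 bytes of the name is an assumption made here so that "characters 126-255" is well defined.

```python
# Characters escaped as ~XX in addition to bytes 126-255 (from the list above).
RESERVED = set('\\:*?"<>|')


def encode_filename(name):
    """Escape a name so that it survives case-folding filesystems."""
    out = []
    # Assumption of this sketch: names are handled as UTF-8 bytes.
    for byte in name.encode("utf-8"):
        ch = chr(byte)
        if ch == "_":
            out.append("__")                # _ becomes __
        elif "A" <= ch <= "Z":
            out.append("_" + ch.lower())    # A becomes _a
        elif byte >= 126 or ch in RESERVED:
            out.append("~%02x" % byte)      # ~ becomes ~7e, : becomes ~3a
        else:
            out.append(ch)
    return "".join(out)


def decode_filename(encoded):
    """Reverse encode_filename()."""
    data = bytearray()
    i = 0
    while i < len(encoded):
        ch = encoded[i]
        if ch == "_":
            nxt = encoded[i + 1]
            data.append(ord("_") if nxt == "_" else ord(nxt.upper()))
            i += 2
        elif ch == "~":
            data.append(int(encoded[i + 1:i + 3], 16))
            i += 3
        else:
            data.append(ord(ch))
            i += 1
    return data.decode("utf-8")
```

Because every uppercase letter and every underscore is rewritten, two names that differ only in case can never produce the same escaped name, which is the property the scheme relies on.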

Note that we rarely need to

Implementation plan:

  • add separate localrepo access methods for all store data (changelog, manifest, data/*, journal, lock) (done)

  • if .hg/data exists at localrepo init time, use old access scheme (done)

  • if not, access all store data with escaped paths inside .hg/store/ (e.g. .hg/store/00changelog.i or .hg/store/data/_readme.i) (done)

This scheme will automatically escape all paths on newly cloned or created repos.
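
As a rough sketch of the layout decision in the plan above: the helper `store_path` below is hypothetical (it is not a localrepo method), and `encode_filename` is a reduced stand-in that applies only the _ and A-Z rules of the escaping scheme.

```python
import os


def encode_filename(name):
    # Reduced stand-in for the escaping scheme: only the '_' and A-Z rules.
    out = []
    for ch in name:
        if ch == "_":
            out.append("__")
        elif "A" <= ch <= "Z":
            out.append("_" + ch.lower())
        else:
            out.append(ch)
    return "".join(out)


def store_path(repo_root, name):
    """Return the on-disk path for store data such as '00changelog.i'
    or 'data/Readme.i' ('/'-separated, relative to the store)."""
    hg_dir = os.path.join(repo_root, ".hg")
    if os.path.isdir(os.path.join(hg_dir, "data")):
        # Old layout: store data lives directly in .hg/, unescaped.
        return os.path.join(hg_dir, *name.split("/"))
    # New layout: everything under .hg/store/, with escaped names.
    return os.path.join(hg_dir, "store", *encode_filename(name).split("/"))
```

With no .hg/data directory present, `store_path(root, "data/Readme.i")` points at .hg/store/data/_readme.i under `root`; once .hg/data exists, the same call falls back to .hg/data/Readme.i.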

On the working directory side, the best we can do is detect collisions. A simple scheme might look something like this:

  • detect whether the filesystem is case-sensitive at checkout/update time (done)

  • scan manifest for case-folding collisions and issue a warning (done)
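
One possible way to implement these two steps is sketched below. Both function names are hypothetical, and `str.lower()` is used only as an approximation of what a real case-folding filesystem does.

```python
import os
import tempfile
from collections import defaultdict


def is_case_sensitive(directory):
    """Probe `directory`: create a file with a lowercase name and check
    whether it is also reachable under the uppercased name."""
    fd, path = tempfile.mkstemp(prefix="casetest", dir=directory)
    os.close(fd)
    try:
        head, tail = os.path.split(path)
        return not os.path.exists(os.path.join(head, tail.upper()))
    finally:
        os.remove(path)


def find_case_collisions(manifest_paths):
    """Return groups of manifest paths that differ only in case."""
    by_folded = defaultdict(list)
    for path in manifest_paths:
        by_folded[path.lower()].append(path)
    return [sorted(group) for group in by_folded.values() if len(group) > 1]
```

A checkout could then warn about every group returned by `find_case_collisions` whenever `is_case_sensitive` reports False for the working directory.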

There are some further issues on the working directory and user interface side.

  • renames which only change case (e.g. foo -> Foo) will not be properly detected in the filesystem

  • user-supplied filenames may differ in case from the actual file on disk (Question: is it reasonable to require the user to specify the correct case? Probably not, see Issue646).

Also, filesystems such as those used on OS X do Unicode normalisation, meaning that two filenames which differ only in their normal form may in fact name the same file.
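
To illustrate the normalisation problem, the following sketch uses Python's standard `unicodedata` module. The helper name `same_after_normalisation` and the sample strings are made up for this example.

```python
import unicodedata

composed = "caf\u00e9"       # 'é' as a single code point (NFC form)
decomposed = "cafe\u0301"    # 'e' followed by a combining acute accent (NFD form)


def same_after_normalisation(a, b, form="NFC"):
    """Compare two names after bringing both to the same normal form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)
```

The two strings compare unequal as plain Python strings, yet a normalising filesystem would treat them as one file; comparing after normalisation makes the clash visible.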

Finally, there are some filename identity issues even on Unix - the files foo/bar and baz/../foo/bar are the same. These are (presumably) solved on Unix, so looking at how the solution works may offer some advice on how to deal with user input issues.
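
As a small illustration of the foo/bar example, `posixpath.normpath` from the standard library collapses ".." components purely lexically. The wrapper name `same_lexical_path` is hypothetical.

```python
import posixpath


def same_lexical_path(a, b):
    """True if two '/'-separated paths are equal after collapsing
    '.', '..' and repeated slashes."""
    return posixpath.normpath(a) == posixpath.normpath(b)
```

This is only a lexical comparison: if baz is a symlink, baz/../foo/bar need not refer to the same file as foo/bar, so a complete solution has to consult the filesystem as well.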

Proposal:

  • Classify file names into different types:
    • Manifest internal (case sensitive always)
    • OS Native (possibly case or normalisation insensitive)
  • Identify which type of file name is involved in the various API calls
  • Determine the correct behaviour whenever the 2 types come into contact

There are some other differences between manifest internal and OS native pathnames (the former always use / as the path separator, whereas the latter use os.sep) as well as differences between absolute and relative pathnames. In reviewing API calls, these differences should be noted as well.
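
The separator difference could be handled by a pair of conversion helpers along these lines. The names `to_native` and `to_internal` are hypothetical, and the `sep` parameter exists only so that the behaviour can be shown independently of the platform the code runs on.

```python
import os


def to_native(internal_path, sep=os.sep):
    """Convert a manifest internal path ('/' separators) to an OS native one."""
    return internal_path.replace("/", sep)


def to_internal(native_path, sep=os.sep):
    """Convert an OS native path back to manifest internal form."""
    return native_path.replace(sep, "/")
```

For example, with `sep="\\"` the internal path src/lib/a.c becomes src\lib\a.c, and converting back yields the original string.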

In some cases, this may require carrying around additional data, to preserve both the user-supplied name and the actual canonical filesystem name.
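
One way to carry both names around is a small record type, sketched below. `ResolvedName` and `resolve` are hypothetical names, and matching via `str.lower()` is an assumption standing in for whatever folding rule the filesystem really applies.

```python
from typing import List, NamedTuple, Optional


class ResolvedName(NamedTuple):
    user_supplied: str  # exactly what the user typed
    canonical: str      # the case-exact name as actually stored


def resolve(user_supplied: str, known_names: List[str]) -> Optional[ResolvedName]:
    """Map a user-supplied name onto a known name, ignoring case.

    An exact match always wins; otherwise the name must match exactly
    one known name case-insensitively, or None is returned.
    """
    if user_supplied in known_names:
        return ResolvedName(user_supplied, user_supplied)
    matches = [n for n in known_names if n.lower() == user_supplied.lower()]
    if len(matches) != 1:
        return None
    return ResolvedName(user_supplied, matches[0])
```

Returning None for ambiguous input (for instance FOO when both foo and Foo are known) leaves the decision about how to report the problem to the caller.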

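See also

  • "Case sensitivity" (http://hgbook.red-bean.com/read/file-names-and-pattern-matching.html#sec:names:case) in hgbook (http://hgbook.red-bean.com/hgbook.html)
  • Issue839 (http://www.selenic.com/mercurial/bts/issue839) - "Hg local store creates paths too long for Windows"
  • CaseFoldPlugin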

CategoryWindows CategoryNewFeatures

CaseFoldingPlan (last edited 2012-11-06 23:04:58 by abuehl)