= Case-folding Plan =

To deal with CaseFolding on the repo side, we need to:

 * escape uppercase ASCII characters in filenames
 * escape high ASCII
 * allow for Unicode and other characters being case-folded as well
 * allow for filesystems and operating systems doing other unfortunate things to filenames, which will cause interoperability trouble
 * use the same scheme by default on all systems to avoid backup and media sharing issues

A simple escaping scheme is as follows:

 * replace {{{_}}} with {{{__}}}
 * replace each uppercase letter A-Z with an underscore plus its lowercase form ({{{A}}} becomes {{{_a}}}, and so on)
 * replace characters 126-255 and {{{\:*?"<>|}}} with a tilde followed by the two-digit hex code of the character ({{{~7e}}} to {{{~ff}}} for the 126-255 range; note this escapes tilde as well)

Note that we rarely need to

Implementation plan:

 * add separate localrepo access methods for all store data (changelog, manifest, data/*, journal, lock) (./)
 * if .hg/data exists at localrepo {{{__init__}}} time, use the old access scheme (./)
 * if not, access all store data with escaped paths inside .hg/store/ (e.g. .hg/store/00changelog.i or .hg/store/data/_readme.i) (./)

This scheme will automatically escape all paths on newly cloned or created repos.

On the working directory side, the best we can do is detect collisions. A simple scheme might look something like this:

 * detect a case-sensitive filesystem at checkout/update time (./)
 * scan the manifest for case-folding collisions and issue a warning (./)

There are some further issues on the working directory and user interface side.

 * renames which only change case (e.g. foo -> Foo) will not be properly detected in the filesystem
 * user-supplied filenames may differ in case from the actual file on disk (Question: is it reasonable to require the user to specify the correct case? Probably not, see [[http://www.selenic.com/mercurial/bts/issue646|Issue646]]).

Also, filesystems like those on OS X do Unicode normalisation, meaning that two filenames with differing normal forms may in fact be the same. Finally, there are some filename identity issues even on Unix: the files foo/bar and baz/../foo/bar are the same.
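
The store-side escaping scheme and the manifest collision scan described above can be sketched roughly as follows. This is an illustration only, not the actual localrepo code: the function names {{{escape_store_name}}} and {{{find_case_collisions}}} are hypothetical, the sketch assumes filenames are escaped byte by byte after UTF-8 encoding, and it uses {{{str.lower()}}} as a stand-in for whatever folding a real filesystem applies.

```python
from collections import defaultdict

# Characters the plan singles out in addition to the 126-255 range.
RESERVED = set('\\:*?"<>|')


def escape_store_name(name):
    """Escape a filename for the store (hypothetical helper).

    Assumption: the name is handled as UTF-8 bytes, one byte at a time.
    """
    out = []
    for byte in name.encode("utf-8"):
        ch = chr(byte)
        if ch == "_":
            out.append("__")                # _   -> __
        elif "A" <= ch <= "Z":
            out.append("_" + ch.lower())    # A-Z -> _a ... _z
        elif byte >= 126 or ch in RESERVED:
            out.append("~%02x" % byte)      # tilde plus two hex digits
        else:
            out.append(ch)
    return "".join(out)


def find_case_collisions(manifest_paths):
    """Group manifest paths that differ only by case (hypothetical helper).

    Assumption: lower-casing approximates the filesystem's case folding.
    """
    by_folded = defaultdict(list)
    for path in manifest_paths:
        by_folded[path.lower()].append(path)
    return [sorted(group) for group in by_folded.values() if len(group) > 1]
```

Under these assumptions, {{{escape_store_name("Readme")}}} gives {{{_readme}}}, matching the {{{.hg/store/data/_readme.i}}} example above, and {{{find_case_collisions(["foo", "Foo", "bar"])}}} reports the pair foo/Foo as a collision that would warrant a warning at checkout/update time.
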
These identity issues are (presumably) solved on Unix, so looking at how the solution works may offer some advice on how to deal with user input issues.

Proposal:

 * Classify file names into different types:
   * Manifest internal (case sensitive always)
   * OS Native (possibly case or normalisation insensitive)
 * Identify which type of file name is involved in the various API calls
 * Determine the correct behaviour whenever the 2 types come into contact

There are some other differences between manifest internal and OS native pathnames (the former always use / as the path separator, where the latter use os.sep) as well as differences between absolute and relative pathnames. In reviewing API calls, these differences should be noted as well.

In some cases, this may require carrying around additional data, to preserve both the user-supplied name and the actual filesystem canonical name.

=== See also ===

 * [[http://hgbook.red-bean.com/read/file-names-and-pattern-matching.html#sec:names:case|"Case sensitivity"]] in [[http://hgbook.red-bean.com/hgbook.html|hgbook]]
 * [[http://www.selenic.com/mercurial/bts/issue839|Issue839]] - "Hg local store creates paths too long for Windows"

----
CategoryWindows CategoryOldFeatures