To deal with CaseFolding on the repo side, we need to:
- escape uppercase ASCII characters in filenames
- escape high ASCII
- Unicode and other characters may be case-folded as well
- Filesystems and operating systems may do other unfortunate things to filenames which will cause interoperability trouble
- use the same scheme by default on all systems to avoid backup and media sharing issues
A simple escaping scheme is as follows:
- replace _ with __
- replace A-Z with _a, etc.
- replace characters 126-255 and '\:*?"<>|' with ~7e to ~ff (note this escapes tilde as well)
Note that we rarely need to
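As a minimal sketch of the escaping scheme above, and not Mercurial's actual implementation: the names `encode_filename` and `decode_filename` are hypothetical, filenames are assumed to be byte strings, and every escaped byte (including the reserved characters below 126) is assumed to be written as `~` followed by two lowercase hex digits.

```python
# Characters named in the scheme that must be escaped in addition to 126-255.
RESERVED = set('\\:*?"<>|')


def encode_filename(name: bytes) -> str:
    """Escape a repository filename (hypothetical helper, see assumptions above)."""
    out = []
    for byte in name:
        ch = chr(byte)
        if ch == "_":
            out.append("__")                 # _ becomes __
        elif "A" <= ch <= "Z":
            out.append("_" + ch.lower())     # A-Z becomes _a-_z
        elif byte >= 126 or ch in RESERVED:
            out.append("~%02x" % byte)       # ~7e to ~ff, plus reserved characters
        else:
            out.append(ch)
    return "".join(out)


def decode_filename(encoded: str) -> bytes:
    """Reverse encode_filename (hypothetical helper)."""
    out = bytearray()
    i = 0
    while i < len(encoded):
        ch = encoded[i]
        if ch == "_":
            nxt = encoded[i + 1]
            out.append(ord("_") if nxt == "_" else ord(nxt.upper()))
            i += 2
        elif ch == "~":
            out.append(int(encoded[i + 1:i + 3], 16))
            i += 3
        else:
            out.append(ord(ch))
            i += 1
    return bytes(out)
```

Because `_` and `~` are themselves escaped, every `_` or `~` in the encoded form starts an escape sequence, so the mapping is reversible and the encoded name never contains an uppercase ASCII letter.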
Implementation plan:
- add separate localrepo access methods for all store data (changelog, manifest, data/*, journal, lock)
- if .hg/data exists at localrepo init time, use the old access scheme
- if not, access all store data with escaped paths inside .hg/store/ (e.g. .hg/store/00changelog.i or .hg/store/data/_readme.i)
This scheme will automatically escape all paths on newly cloned or created repos.
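The layout check in the plan could look roughly like the following. This is only a sketch: `store_root` is a hypothetical helper, not an existing localrepo method, and it returns the directory holding the store data together with a flag saying whether paths below it should be escaped.

```python
import os


def store_root(repo_root):
    """Return (store directory, use_escaped_paths) for a repository root.

    Hypothetical helper: an existing .hg/data directory signals the old,
    unescaped layout; otherwise store data lives under .hg/store/.
    """
    hg_dir = os.path.join(repo_root, ".hg")
    if os.path.isdir(os.path.join(hg_dir, "data")):
        return hg_dir, False                       # old access scheme
    return os.path.join(hg_dir, "store"), True     # escaped paths in .hg/store/
```

Keying the decision on the presence of .hg/data means existing repositories keep working unchanged, while new clones and newly created repositories pick up the escaped layout.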
On the working directory side, the best we can do is detect collisions. A simple scheme might look something like this:
- detect whether the filesystem is case sensitive at checkout/update time
- scan manifest for case-folding collisions and issue a warning
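The two steps above can be sketched as follows. Both function names are hypothetical, and `str.lower()` is used as a stand-in for whatever folding the filesystem really applies, which is an assumption: real filesystems may fold differently.

```python
import os
import tempfile
from collections import defaultdict


def is_case_sensitive(directory):
    """Probe a directory by creating a mixed-case file and looking it up folded."""
    with tempfile.NamedTemporaryFile(prefix="CaseProbe", dir=directory) as probe:
        folded = os.path.join(directory, os.path.basename(probe.name).lower())
        # On a case-insensitive filesystem the lowercased name finds the same file.
        return not os.path.exists(folded)


def find_case_collisions(manifest_paths):
    """Group manifest paths that become identical after case folding."""
    groups = defaultdict(list)
    for path in manifest_paths:
        groups[path.lower()].append(path)
    return [sorted(group) for group in groups.values() if len(group) > 1]
```

A caller would run the scan only when `is_case_sensitive` returns False and print a warning for each group returned. This sketch compares whole paths, so it does not report two directories that differ only in case unless they also contain a file with the same folded name.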
There are some further issues on the working directory and user interface side.
- renames which only change case (e.g. foo -> Foo) will not be properly detected in the filesystem
- user-supplied filenames may differ in case from the actual file on disk (Question: is it reasonable to require the user to specify the correct case? Probably not, see [http://www.selenic.com/mercurial/bts/issue646 Issue646]).
Also, filesystems such as those on OS X perform Unicode normalisation, meaning that two filenames that differ only in normal form may in fact refer to the same file.
Finally, there are some filename identity issues even on Unix - the files foo/bar and baz/../foo/bar are the same. These are (presumably) solved on Unix, so looking at how the solution works may offer some advice on how to deal with user input issues.
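Both identity issues can be illustrated with the Python standard library. The helper names are hypothetical, and the choice of NFC as the comparison form is an assumption made only for this example.

```python
import posixpath
import unicodedata


def same_after_normalisation(a, b, form="NFC"):
    """True if two names are equal once both are put in the same normal form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


def same_lexical_path(a, b):
    """True if two '/'-separated paths are equal after collapsing '.' and '..'."""
    return posixpath.normpath(a) == posixpath.normpath(b)


composed = "caf\u00e9"       # 'é' as a single code point
decomposed = "cafe\u0301"    # 'e' followed by a combining acute accent

print(composed == decomposed)                            # different strings
print(same_after_normalisation(composed, decomposed))    # same name once normalised
print(same_lexical_path("foo/bar", "baz/../foo/bar"))    # same path lexically
```

Note that `posixpath.normpath` is purely lexical: if baz were a symbolic link, baz/.. could point somewhere else, and resolving that requires consulting the filesystem (for example with `os.path.realpath`).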
Proposal:
- Classify file names into different types:
- Manifest internal (case sensitive always)
- OS Native (possibly case or normalisation insensitive)
- Identify which type of file name is involved in the various API calls
- Determine the correct behaviour whenever the two types come into contact
There are some other differences between manifest internal and OS native pathnames (the former always use / as the path separator, whereas the latter use os.sep), as well as differences between absolute and relative pathnames. In reviewing API calls, these differences should be noted as well.
In some cases, this may require carrying around additional data, to preserve both the user-supplied name and the actual canonical name on the filesystem.
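One way to make the two kinds of name explicit is to give each its own type, so that every point where they meet has to go through a conversion. The sketch below is purely illustrative: `ManifestPath`, `NativePath`, `to_native` and `to_manifest` are hypothetical names, not existing Mercurial APIs.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class ManifestPath:
    """Manifest internal name: always '/'-separated, always case sensitive."""
    value: str


@dataclass(frozen=True)
class NativePath:
    """OS native name, carrying both spellings mentioned above."""
    user_supplied: str   # exactly what the user typed
    canonical: str       # the name as it actually exists on the filesystem


def to_native(path, sep=os.sep):
    """Convert a manifest path to an OS native path string."""
    return path.value.replace("/", sep)


def to_manifest(path, sep=os.sep):
    """Convert a native path to a manifest path, using the canonical spelling."""
    return ManifestPath(path.canonical.replace(sep, "/"))
```

With distinct types, an API call that accepts the wrong kind of name stands out during review, and the user-supplied spelling stays available for messages even after the canonical name has been looked up.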
See also
- [http://hgbook.red-bean.com/hgbookch7.html#x11-1530007.7 "Case sensitivity"] in [http://hgbook.red-bean.com/hgbook.html hgbook]
- [http://www.selenic.com/mercurial/bts/issue839 Issue839] - "Hg local store creates paths too long for Windows"