(part of InternationalizationPlan)

To allow for interoperability between users with different charset encodings, Mercurial will transcode certain elements of the data it manages to UTF-8. Mercurial intentionally makes no assumptions about the charset of any data it manages except the elements described below.

== Elements that need to be transcoded ==
 * Usernames
 * Commit messages
 * Tags
 * Branch names

== Files and encodings ==
 * Changelog - UTF-8 (globally distributed)
 * .hgtags - UTF-8 (globally distributed, mostly managed by hg)
 * .hgrc - local (locally managed)
 * .hg/localtags - local (locally managed)
 * .hg/branch - local (no special reason)
 * .hg/branchcache - UTF-8 (otherwise, we'd need to invalidate when we changed locales)

== Things that need to be done ==
 * add local encoding detection in util, with environment override (./)
 * add transcoding functions to util (./)
  * tolocal - decode stored data from UTF-8 robustly, falling back to latin-1 on failure
  * fromlocal - transcode local strings to UTF-8 with "strict" by default
 * transcode usernames and commit messages (./)
 * transcode tags (./)
 * transcode branch data (./)
 * properly report encoding in hgweb (see [[hgweb encoding]]) (./)
 * add --encoding and --encodingmode global options (./)
 * add a test (./)

== Legacy repositories ==
Legacy repositories may contain non-UTF-8 data as UTF-8 wasn't enforced. To continue to operate robustly, we do the following:

 * attempt to decode with UTF-8, strict
 * attempt to decode with Latin-1, strict
 * attempt to decode with UTF-8, replacing unknown characters

== Windows and OS X charset weirdness ==
See CharacterEncodingOnWindows for a discussion of dealing with Windows charset braindamage and [[Character_Encoding_On_OSX]] for a similar form of braindamage on OS X.

----
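
The encoding detection, the two transcoding helpers and the legacy decode order described above can be sketched roughly as follows. This is a minimal illustration in Python 3 terms, not Mercurial's actual util code. The names tolocal and fromlocal come from this page; their signatures, the fallback parameter, the detect_local_encoding helper and the use of HGENCODING as the override variable are assumptions made for the example.

```python
import locale
import os


def detect_local_encoding(environ=os.environ):
    """Guess the local encoding, letting the environment override it.

    Assumption: the override variable is called HGENCODING.
    """
    override = environ.get("HGENCODING")
    if override:
        return override
    # Fall back to what the locale reports, or plain ASCII if it reports nothing.
    return locale.getpreferredencoding(False) or "ascii"


def tolocal(data, fallback="latin-1"):
    """Decode stored bytes, following the legacy decode order:

    1. UTF-8, strict
    2. the fallback encoding (Latin-1 by default), strict
    3. UTF-8, replacing undecodable bytes

    Latin-1 maps every byte value, so step 3 is only reached when a
    different fallback encoding is supplied.
    """
    attempts = (
        ("utf-8", "strict"),
        (fallback, "strict"),
        ("utf-8", "replace"),
    )
    for encoding, errors in attempts:
        try:
            return data.decode(encoding, errors)
        except UnicodeDecodeError:
            continue
    # Not reachable: the final "replace" attempt never raises.
    raise AssertionError("no decode attempt succeeded")


def fromlocal(data, encoding, errors="strict"):
    """Transcode bytes in the local encoding to UTF-8.

    With errors="strict" (the default), input that is not valid in the
    local encoding raises UnicodeDecodeError instead of being stored.
    """
    return data.decode(encoding, errors).encode("utf-8")
```

For example, tolocal(b"\xe9") falls through to the Latin-1 step because a lone 0xE9 byte is not valid UTF-8, while fromlocal(b"\xe9", "ascii") raises rather than silently storing damaged data. Being strict on the way in and lenient on the way out keeps new history clean while old history stays readable.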