Status of the "port" of Mercurial to Py3k
This document describes the current status of mercurial's Py3k port. The work described here was developed as part of the Google Summer of Code 2010 program.
1. Summary
Last milestone: "hg manifest" runs successfully (given the manual edits linked below are applied).
Current development: Documentation & Improvement of the fixers to generalize the manual edits
2. Objective and constraints
This project's objective is quite clear: to "port" mercurial to py3k. "Port" is in quotes because this is neither a complete port nor a rewrite: we want to make mercurial run on py3k while maintaining compatibility with python 2.x. There is an additional constraint, though: mercurial supports python 2.4 and later, which means the features introduced in 2.6 to ease the porting process can't really be used in the port. Also, refactoring the code to work in both python 2 and 3 proved to be too much work because:
- It would be troublesome to write code that runs unchanged on several python versions;
- It would be a maintenance hell.
Thus, we came to the conclusion that extending 2to3, the python refactoring tool, was the way to go. So, to summarize the port's objective and constraints:
- We want to make hg run on py3k;
- 2to3 is being used for that;
- We must maintain support for python 2.4 and above.
An important aspect of the approach taken is that we work "from the inside out". This means we started by porting the core C modules, then the extension C modules (only inotify, currently), then removed most warnings issued by python 2.6 in "3 mode" (a mode that issues warnings for deprecated modules and other incompatible changes), and only then started fixing the code itself.
2.1. "Design" of the port
Following the suggestion given by mpm in a message to the development list, the approach used in this port consisted of:
- teach 2to3 to change all strings in the source into bytestrings
- fix up the annoying b"A"[0] = 65 behavior
- make the minimum amount of other source changes to get it working under 3.x
The first decision (turning all strings in the source into bytestrings) is ok in mercurial's code because "There are basically no Unicode objects 'in the wild' in Mercurial. Their usage is more or less restricted to a couple transcoding function in encoding.py where they can't hurt anybody." 1
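The b"A"[0] behavior mentioned in the list above can be seen in a few lines. This is only an illustrative sketch: byte_at is a hypothetical helper name, not something taken from mercurial's code, and it assumes non-negative indexes.

```python
def byte_at(data, index):
    """Hypothetical helper: return the byte at `index` as a length-1 bytes object.

    Slicing behaves the same on python 2 str and python 3 bytes, whereas
    plain indexing returns an int on python 3. Assumes index >= 0.
    """
    return data[index:index + 1]


data = b"hg"
# On python 3, indexing a bytes object yields an integer...
first = data[0]
# ...while slicing keeps the bytes type.
first_as_bytes = byte_at(data, 0)
```

Using a slice instead of an index is one way to keep a single code path for python 2.x and py3k; whether the port actually uses this technique is not stated here.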
Another aspect of the port is related to how strings as bytes and strings as text interact. Currently, the common cases we'll have are
- a) ui.write(repo[rev].user()) # username is transcoded to local encoding
- b) ui.write(_("abort: can't do that")) # translated and possibly transcoded
- c) ui.write('debug message') # debug messages aren't translated
- d) ui.write(repo[rev][file].data()) # raw file data
We also have many instances of:
- e) ui.write('debug message: %s\n' % somerawdata) # cases c and d
- f) ui.write(_('some message: %s\n') % somerawdata) # cases b and d
The trickiest ones are cases e) and f). For those, the decision in the port was to convert every string formatting call that could operate on bytes objects into a call to a function that can properly handle this kind of formatting, since py3k (3.1 at the time of writing) can neither format bytes nor mix bytes and unicode. Some examples follow:
$ python3
Python 3.1.2 (r312:79147, Apr 1 2010, 09:12:21)
[GCC 4.4.3 20100316 (prerelease)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> "%s" % "foo"
'foo'
>>> "%s" % b"foo"
"b'foo'"
>>> b"%s" % "foo"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for %: 'bytes' and 'str'
>>> b"%s" % b"foo"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for %: 'bytes' and 'bytes'
In particular, the case "%s" % b"foo" only appears to work: the result is the repr of the bytes object, "b'foo'", rather than the text foo. So, to prevent these problems at runtime, we decided to convert the formatting calls to a specialized formatter capable of handling these cases.
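A minimal sketch of what such a formatter could look like follows. It is not the formatter used in the port: the name bytes_format and the latin-1 round trip are assumptions made for illustration. Latin-1 maps every byte value 0-255 to exactly one code point, so decoding and re-encoding with it leaves raw data unchanged.

```python
def bytes_format(fmt, args):
    """Hypothetical stand-in for `fmt % args` where fmt is a bytes object.

    Assumptions: positional arguments only (no dict formatting), and any
    str argument contains only characters that latin-1 can encode.
    """
    if not isinstance(args, tuple):
        args = (args,)

    def to_text(value):
        # bytes arguments are decoded losslessly; other types (int, str,
        # ...) are left for the regular str formatting machinery.
        if isinstance(value, (bytes, bytearray)):
            return bytes(value).decode('latin-1')
        return value

    text = fmt.decode('latin-1') % tuple(to_text(a) for a in args)
    return text.encode('latin-1')


# Case e) above, with raw data that is not valid text in any encoding.
message = bytes_format(b'debug message: %s\n', b'\xff\x00raw')
```

A fixer would then only need to rewrite expressions of the form fmt % args into a call to such a function.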
3. Status (Milestones)
Port of the core C modules to py3k
Port of inotify's C modules to py3k
Removal of most of the warnings issued by python2.6 run with the -3 switch
Implementation of a setup.py-like script that calls 2to3 with our custom fixers
Implementation of a fixer that translates strings into bytestrings
Implementation of a fixer to handle formatting with bytes (b'%s' % 'foo')
- Implementation of a fixer for module name changes not reported by 2.6 (implemented, but not applied)
- Fixing demandimport on py3k
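To give an idea of what the string-to-bytestring fixer listed above does, here is a rough sketch of the same transformation. It is not the fixer from the port: it uses the standard tokenize module instead of the 2to3 fixer API, the function name add_bytes_prefix is made up for this example, and it ignores cases a real fixer must handle (docstrings, non-ASCII literals, strings that must stay text).

```python
import io
import tokenize


def add_bytes_prefix(source):
    """Return `source` with a b prefix added to every unprefixed string literal."""
    result = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        # Only touch literals that start directly with a quote, so that
        # already prefixed literals (b'', r'', u'') are left alone.
        if tok.type == tokenize.STRING and tok.string[0] in '\'"':
            tok = tok._replace(string='b' + tok.string)
        result.append(tok)
    return tokenize.untokenize(result)


converted = add_bytes_prefix("x = 'foo'\ny = b'bar'\n")
namespace = {}
exec(converted, namespace)  # run the rewritten source to inspect the values
```

After the rewrite, both x and y are bytes objects when the code runs on py3k.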
3.1. Source code
Most of the code developed in this project has already been imported into mercurial's official repository, which means that pulling from it will give you updated code that is known to mostly work. Additionally, you can clone Renato Cunha's patch queue if you want to test more experimental code and code that hasn't been imported into mercurial yet.
3.2. How to run it
Highly experimental
This page describes a highly experimental feature that isn't ready yet. It is most useful for enthusiasts who want to know the status of the port and/or who are willing to help with it.
From mercurial's source root, you can run:
python3 contrib/setup3k.py build_ext -i build_py -c -d . build_mo
This is equivalent to running "make local" in hg's source root, except that the python3 interpreter is used and the python source code is preprocessed by 2to3 before the command exits. The command takes approximately three minutes to run on a five-year-old Athlon64 3000+.
4. Where to go from here?
The bytesformatter should be improved someday.