Differences between revisions 1 and 10 (spanning 9 versions)
Revision 1 as of 2010-08-11 19:45:32
Size: 585
Editor: RenatoCunha
Comment: Initial import
Revision 10 as of 2010-08-12 23:50:24
Size: 6203
Editor: RenatoCunha
Comment: Fixed some typos
Line 4: Line 4:
here described was developed as part of the Google Summer of Code program. here described was developed as part of the Google Summer of Code 2010 program.
Line 8: Line 8:
Last milestone: "hg manifest" runs successfully. Last milestone: "hg manifest" runs successfully (given the manual edits linked
below are applied)
.
Line 13: Line 14:
== Design and constraints == == Objective and constraints ==
Line 15: Line 16:
== Current implementation == This project's objective is quite clear: to "port" mercurial to py3k.
"Port" is between quotes because this is not a complete port or a rewrite: we
want to make mercurial run in py3k ''while maintaining compatibility with
python 2.x''. There is an additional constraint, though: mercurial supports
python from 2.4, which means the features introduced in 2.6 to ease the porting
process can't really be used in the port. Also, refactoring the code to work in
both python 2 and 3 proved to be too much work because:

 1. It would be troublesome to write code that runs on several python versions at once;
 2. It would be a maintenance hell.

Thus, we came to the conclusion that extending 2to3, the python refactoring
tool, was the way to go. So, to summarize the port's objective and constraints:

 * We want to make hg run on py3k;
 * 2to3 is being used for that;
 * We must maintain support for python 2.4 and above;

An important aspect of the approach taken is that we stick to a "from the
inside out" approach. This means we started by porting the core C
modules, then the extension C modules (inotify only, currently), then
removed most warnings issued by python 2.6 in "3 mode" (a mode that
issues warnings for deprecated modules and other incompatible changes), and
only then started fixing the code.

=== "Design" of the port ===

Following the suggestion given by mpm in a
[[http://selenic.com/pipermail/mercurial-devel/2010-June/022363.html|message to the development list]],
the approach used in this port consisted of:

 a. teach 2to3 to change all strings in the source into bytestrings
 a. fix up the annoying b"A"[0] = 65 behavior
 a. make the minimum amount of other source changes to get it working under 3.x

The decision pointed out in a) is ok in mercurial's code because "There are
basically no Unicode objects 'in the wild' in Mercurial. Their usage is more or
less restricted to a couple transcoding function in encoding.py where they
can't hurt anybody." <<FootNote(From http://selenic.com/pipermail/mercurial-devel/2010-June/022255.html)>>
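
As a small illustration of step b), the sketch below shows the indexing behavior on py3k and one possible way around it. The helper name `first_byte` is made up for this example; it is not part of mercurial or of 2to3.

```python
# Sketch only: `first_byte` is a hypothetical helper, not mercurial code.

def first_byte(data):
    """Return the first byte of `data` as a bytes object of length one.

    Slicing keeps the bytes type on py3k, whereas indexing returns an int.
    """
    return data[0:1]


marker = b"A"
indexed = marker[0]          # an int (65) on py3k, a str ('A') on python 2.x
sliced = first_byte(marker)  # b"A" on py3k, 'A' on python 2.x
```

Slicing also behaves well on empty input, where indexing would raise an IndexError.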

Another aspect of the port is related to how strings as bytes and strings as
text interact. Currently, the common cases we'll have are
 
 a) ui.write(repo[rev].user()) # username is transcoded to local encoding
 b) ui.write(_("abort: can't do that")) # translated and possibly transcoded
 c) ui.write('debug message') # debug messages aren't translated
 d) ui.write(repo[rev][file].data()) # raw file data
 
 We also have many instances of:
 
 e) ui.write('debug message: %s\n' % somerawdata) # cases c and d
 f) ui.write(_('some message: %s\n') % somerawdata) # cases b and d

The trickiest ones are cases e) and f). For those, the decision in the
port was to convert every string formatting call that could operate on bytes
objects into a call to a function that can properly handle this kind of
formatting, since py3k can neither format bytes nor mix bytes and unicode. Some
examples follow:

{{{
$ python3
Python 3.1.2 (r312:79147, Apr 1 2010, 09:12:21)
[GCC 4.4.3 20100316 (prerelease)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> "%s" % "foo"
'foo'
>>> "%s" % b"foo"
"b'foo'"
>>> b"%s" % "foo"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for %: 'bytes' and 'str'
>>> b"%s" % b"foo"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for %: 'bytes' and 'bytes'
}}}

In particular, case "%s" % b"foo" only worked because b"foo"._''''''_repr_''''''_() equals
"b'foo'". So, to prevent these problems at runtime, we decided to convert the
formatting calls to a specialized formatter capable of handling these
cases.
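
To give an idea of what such a formatter could look like, here is a minimal sketch. It is not the formatter used in the port: the name `bytes_format`, the choice to support only `%s` and `%%`, and the UTF-8 fallback for text arguments are all assumptions made for this example. (Later python releases, starting with 3.5, support %-formatting on bytes natively.)

```python
import re

# Matches a percent sign followed by any single byte.
_SPEC = re.compile(br"%(.)", re.DOTALL)


def bytes_format(fmt, *args):
    """Hypothetical helper: fill %s placeholders in a bytes template.

    Only %s and %% are handled; anything else raises ValueError.
    """
    remaining = iter(args)

    def substitute(match):
        kind = match.group(1)
        if kind == b"%":
            return b"%"
        if kind != b"s":
            raise ValueError("unsupported conversion: %r" % kind)
        try:
            value = next(remaining)
        except StopIteration:
            raise TypeError("not enough arguments for format string")
        if isinstance(value, bytes):
            return value                      # raw data passes through untouched
        if isinstance(value, str):
            return value.encode("utf-8")      # assumption: text is encoded as UTF-8
        return str(value).encode("ascii")     # numbers and similar objects

    return _SPEC.sub(substitute, fmt)


message = bytes_format(b"debug message: %s\n", b"\xff\x00raw")
```

Because the replacement is produced by a function, the raw bytes are inserted literally and are never interpreted as a regular expression template.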

== Status (Milestones) ==

 1. --(Port of the core C modules to py3k)-- (./)
 2. --(Port of inotify's C modules to py3k)-- (./)
 3. --(Removal of most of the warnings issued by python2.6 run with the -3 switch)-- (./)
 4. --(Implementation of a setup.py-like script that calls 2to3 with our custom fixers)-- (./)
 5. --(Implementation of a fixer that translates strings into bytestrings)-- (./)
 6. --(Implementation of a fixer to handle formatting with bytes (b'%s' % 'foo'))-- (./)
 7. Implementation of a fixer to handle module name changes not reported by 2.6 (implemented, but not applied)
 8. Fixing demandimport on py3k
Line 19: Line 111:
=== How to enable the port === Most of the code developed in this project has already been imported into
mercurial's official repository, which means that pulling from it will give you
updated code that is known to mostly work. Additionally, you can clone
[[http://bitbucket.org/trovao/hg-py3k-patches|Renato Cunha's patch queue]] if
you want to test more experimental code and code that hasn't been imported into
mercurial yet.

=== How to run it ===

{{{#!wiki caution
'''Highly experimental'''

This page describes a highly experimental feature that isn't ready yet. It is
most useful for enthusiasts who want to know the status of the port and/or
are willing to help with it.
}}}

From mercurial's source root, you can run:

{{{
python3 contrib/setup3k.py build_ext -i build_py -c -d . build_mo
}}}

This is equivalent to running "make local" in hg's source root, with the
difference that the python3 interpreter will be used and that the python source
code will be preprocessed by 2to3 before exiting. This command takes
approximately three minutes to run on a five-year-old Athlon64 3000+.

== Where to go from now? ==

The bytesformatter should be improved someday.

== Notes ==

Status of the "port" of Mercurial to Py3k

This document describes the current status of mercurial's Py3k port. The work here described was developed as part of the Google Summer of Code 2010 program.

1. Summary

Last milestone: "hg manifest" runs successfully (given the manual edits linked below are applied).

Current development: Documentation & Improvement of the fixers to generalize the manual edits

2. Objective and constraints

This project's objective is quite clear: to "port" mercurial to py3k. "Port" is between quotes because this is not a complete port or a rewrite: we want to make mercurial run in py3k while maintaining compatibility with python 2.x. There is an additional constraint, though: mercurial supports python from 2.4, which means the features introduced in 2.6 to ease the porting process can't really be used in the port. Also, refactoring the code to work in both python 2 and 3 proved to be too much work because:

  1. It would be troublesome to write code that runs on several python versions at once;
  2. It would be a maintenance hell.

Thus, we came to the conclusion that extending 2to3, the python refactoring tool, was the way to go. So, to summarize the port's objective and constraints:

  • We want to make hg run on py3k;
  • 2to3 is being used for that;
  • We must maintain support for python 2.4 and above;

An important aspect of the approach taken is that we stick to a "from the inside out" approach. This means we started by porting the core C modules, then the extension C modules (inotify only, currently), then removed most warnings issued by python 2.6 in "3 mode" (a mode that issues warnings for deprecated modules and other incompatible changes), and only then started fixing the code.

2.1. "Design" of the port

Following the suggestion given by mpm in a message to the development list, the approach used in this port consisted of:

  a. teach 2to3 to change all strings in the source into bytestrings
  b. fix up the annoying b"A"[0] = 65 behavior
  c. make the minimum amount of other source changes to get it working under 3.x

The decision pointed out in a) is ok in mercurial's code because "There are basically no Unicode objects 'in the wild' in Mercurial. Their usage is more or less restricted to a couple transcoding function in encoding.py where they can't hurt anybody." 1

Another aspect of the port is related to how strings as bytes and strings as text interact. Currently, the common cases we'll have are

  a) ui.write(repo[rev].user()) # username is transcoded to local encoding
  b) ui.write(_("abort: can't do that")) # translated and possibly transcoded
  c) ui.write('debug message') # debug messages aren't translated
  d) ui.write(repo[rev][file].data()) # raw file data

We also have many instances of:

  e) ui.write('debug message: %s\n' % somerawdata) # cases c and d
  f) ui.write(_('some message: %s\n') % somerawdata) # cases b and d
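
The simpler cases a), c) and d) can be pictured with the following sketch, in which everything that reaches the output is already bytes. The names `write` and `to_local_bytes` are stand-ins invented for this example (they are not mercurial's ui.write or its encoding functions), and UTF-8 is only assumed as the local encoding.

```python
import io


def to_local_bytes(text, encoding="utf-8"):
    """Hypothetical transcoding step: text -> bytes in the local encoding."""
    return text.encode(encoding, "replace")


def write(out, *chunks):
    """Hypothetical stand-in for ui.write that accepts bytes only."""
    for chunk in chunks:
        if not isinstance(chunk, bytes):
            raise TypeError("write() expects bytes, got %s" % type(chunk).__name__)
        out.write(chunk)


out = io.BytesIO()
write(out, to_local_bytes("user: Jos\u00e9\n"))  # case a): transcoded text
write(out, b"debug message\n")                   # case c): untranslated literal
write(out, b"\x00\xffraw file data\n")           # case d): raw bytes pass through
result = out.getvalue()
```

Rejecting text at the output boundary makes accidental mixing of bytes and unicode fail early instead of producing output such as "b'foo'".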

The trickiest ones are cases e) and f). For those, the decision in the port was to convert every string formatting call that could operate on bytes objects into a call to a function that can properly handle this kind of formatting, since py3k can neither format bytes nor mix bytes and unicode. Some examples follow:

$ python3
Python 3.1.2 (r312:79147, Apr  1 2010, 09:12:21) 
[GCC 4.4.3 20100316 (prerelease)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> "%s" % "foo"
'foo'
>>> "%s" % b"foo"
"b'foo'"
>>> b"%s" % "foo"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for %: 'bytes' and 'str'
>>> b"%s" % b"foo"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for %: 'bytes' and 'bytes'

In particular, case "%s" % b"foo" only worked because b"foo".__repr__() equals "b'foo'". So, to prevent these problems at runtime, we decided to convert the formatting calls to a specialized formatter capable of handling these cases.

3. Status (Milestones)

  1. Port of the core C modules to py3k (./)

  2. Port of inotify's C modules to py3k (./)

  3. Removal of most of the warnings issued by python2.6 run with the -3 switch (./)

  4. Implementation of a setup.py-like script that calls 2to3 with our custom fixers (./)

  5. Implementation of a fixer that translates strings into bytestrings (./)

  6. Implementation of a fixer to handle formatting with bytes (b'%s' % 'foo') (./)

  7. Implementation of a fixer to handle module name changes not reported by 2.6 (implemented, but not applied)
  8. Fixing demandimport on py3k

3.1. Source code

Most of the code developed in this project has already been imported into mercurial's official repository, which means that pulling from it will give you updated code that is known to mostly work. Additionally, you can clone Renato Cunha's patch queue if you want to test more experimental code and code that hasn't been imported into mercurial yet.

3.2. How to run it

Highly experimental

This page describes a highly experimental feature that isn't ready yet. It is most useful for enthusiasts who want to know the status of the port and/or are willing to help with it.

From mercurial's source root, you can run:

python3 contrib/setup3k.py build_ext -i build_py -c -d . build_mo

This is equivalent to running "make local" in hg's source root, with the difference that the python3 interpreter will be used and that the python source code will be preprocessed by 2to3 before exiting. This command takes approximately three minutes to run on a five-year-old Athlon64 3000+.

4. Where to go from now?

The bytesformatter should be improved someday.

5. Notes

Py3kPort (last edited 2012-10-25 20:48:22 by mpm)