Differences between revisions 2 and 12 (spanning 10 versions)
Revision 2 as of 2010-04-09 19:13:02
Size: 13467
Editor: RenatoCunha
Comment: Added some info about me and my application's text
Revision 12 as of 2010-06-25 01:51:23
Size: 3192
Editor: RenatoCunha
Comment:
Line 5: Line 5:
Line 7: Line 6:
Line 17: Line 15:

Currently pursuing a master's degree in Computer Science. Have been working with python for about four years, mostly in hobby projects. Have become a mercurial user since the end of 2009.
Currently pursuing a master's degree in Computer Science. Have been working with python for about four years, mostly in hobby projects. Have become a mercurial user at the end of 2009.
Line 23: Line 20:
Line 26: Line 22:
Here is the version I submitted:
=== Implementation status ===
First of all, my patch queue is located at http://bitbucket.org/trovao/hg-py3k-patches and its corresponding feed is located at http://bitbucket.org/trovao/hg-py3k-patches/rss.
Line 28: Line 25:
{{{#!rst
===============================
 Porting Mercurial to Python 3
===============================
As of 2010-06-08 I've ported the core C modules to python 3. The decisions taken when porting it were based on the idea that hg operates on bytes, and, thus, the new modules operate on and return bytes objects.
Line 33: Line 27:
:author: Renato Cunha
=== Journal (status updates - dates in ISO format) ===
Line 35: Line 29:
.. topic:: Abstract

    Mercurial is growing in popularity with many software developers. Knowing that
    the future of Python development is on the python 3 branch, and that the
    adoption of Python 3.x depends on having available tools and libraries for it,
    this project proposal intends to deal with a mercurial port so that it can run
    on Python 3.x.


Outline of this proposal
========================

As the project described in this proposal is not trivial by any means, this
section provides an outline of what I consider the main parts of the work
needed. The following sections then describe the listed activities and the
time expected for finishing the work.

    0. Deliverables
    1. Studying relevant material for the porting work
    2. Diving into mercurial's code
    3. Python code porting approaches
    4. Porting the core python code
    5. Porting C extension modules
    6. Porting the mercurial extensions
    7. Conclusion


Deliverables
============

As mentioned above, the objective of the project described in this proposal is
to implement a Python 3 compatible mercurial port.

Ideally, the full range of mercurial commands/operations will be ported, along
with the official extensions shipped with mercurial. This ideal might not be
reachable within the three months dedicated to the project, even though it
looks like a plausible task today.

That being said, a project that doesn't fully port mercurial to python 3
shouldn't be considered a failure. It is my belief that the porting process can
be gradual, and that once the most basic commands/operations have been ported,
the others will become more viable. Unfortunately I can't tell in advance which
set of ported features would make the outcome of this project a success, but I
believe that a few interactions with the mentoring organization can shed some
light on this issue.


Studying relevant material for the porting work
===============================================

The Python community has produced various documents describing the changes
between python 2.x and python 3.x [#]_, [#]_, a tool, 2to3 [#]_, to help with
porting the "trivial" changes, documents outlining the porting process for
general python projects [#]_ [#]_ (and extension modules [#]_), reports
describing the hard-earned lessons of porting specific projects [#]_ [#]_ and
projects demonstrating the incompatibilities between python 2 and 3 [#]_.

Given that there are so many resources available for guiding a porting effort,
I would, naturally, begin by reviewing them so I can be fully aware of what to
expect. The insights acquired from this may even help me identify unforeseen
problems in this proposal and fix them before the program actually starts.

The steps described here might take place in the interim period, between April
9 and April 21.


References for this section
~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. [#] http://www.python.org/dev/peps/pep-3100/
.. [#] http://www.python.org/dev/peps/pep-3099/
.. [#] http://docs.python.org/library/2to3.html
.. [#] http://wiki.python.org/moin/PortingPythonToPy3k
.. [#] http://diveintopython3.org/porting-code-to-python-3-with-2to3.html
.. [#] http://wiki.python.org/moin/PortingExtensionModulesToPy3k
.. [#] http://lucumr.pocoo.org/2010/2/11/porting-to-python-3-a-guide
.. [#] http://wiki.python.org/moin/PortingDjangoTo3k
.. [#] http://code.google.com/p/python-incompatibility/

Diving into mercurial's code
============================

Mercurial is composed of more than 200 python modules, some C extension
modules, plus mercurial extensions, test suites, and documentation. This
amounts to approximately 43000 source lines of code (according to cloc [#]_).

Of the 66 python modules in the mercurial package directory, only 8 (of which
two are package bookkeeping modules - __init__.py & __version__.py) are
successfully parsed by python3. Even after a call to 2to3, only 10 modules are
parsed successfully by python3, which reinforces the non-triviality of this
project.
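
A count like the one above can be reproduced with a few lines of Python. The
following is only a minimal sketch, not the script actually used for this
proposal: the helper names ``parses`` and ``count_parsable`` are hypothetical,
and the two throwaway files merely stand in for mercurial's modules.

```python
import os
import tempfile


def parses(path):
    """Return True if the file compiles under the running interpreter."""
    with open(path, "rb") as fp:
        source = fp.read()
    try:
        compile(source, path, "exec")
    except SyntaxError:
        return False
    return True


def count_parsable(directory):
    """Return (parsable module names, all module names) for a directory."""
    names = sorted(n for n in os.listdir(directory) if n.endswith(".py"))
    ok = [n for n in names if parses(os.path.join(directory, n))]
    return ok, names


# Stand-in files: one uses the python 2 print statement, one is neutral.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "old.py"), "w") as fp:
        fp.write('print "hello"\n')
    with open(os.path.join(tmp, "new.py"), "w") as fp:
        fp.write("x = 1\n")
    ok, names = count_parsable(tmp)
    print("%d of %d modules parse" % (len(ok), len(names)))
```

Running it with a python 3 interpreter against a package directory, before and
after applying 2to3, would give the two figures being compared.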

Knowing that mercurial makes heavy use of strings both as byte sequences and
as text, I'd use the knowledge obtained in the task described in the previous
section to identify the most likely sources of porting problems while studying
mercurial's source code.

I'd probably need to talk to developers or dig through the wiki to learn which
parts of the program are the most important, so I can have a basic working
implementation as soon as possible.

.. Note:: "Basic" is a kind of fuzzy definition. But I think this work should
    be done in parts. So, if I can come up with a port that initially only
    creates an empty repository, I'd be happy with it. Then I would
    incrementally improve it until it is able to work with the full range of
    mercurial operations.


Time estimates
~~~~~~~~~~~~~~

This part will probably take place in the "Community Bonding Period" described
in GSoC's timeline page [#]_ and will probably span the whole implementation
period, as each module's nuances might only be discovered while I am working
on it.


References for this section
~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. [#] http://cloc.sourceforge.net
.. [#] http://socghop.appspot.com/document/show/gsoc_program/google/gsoc2010/timeline


Python code porting approaches
==============================

There are a few approaches that can be taken while working on this project.
They are, as described in the Python wiki (all of them involve working in a
separately cloned mercurial development repository):

    1. Implementing a Python 3-only version;
    2. Making a 2.6 (2.7) and 3.x compatible version;
    3. Implementing an abstraction layer that keeps mercurial compatible with
       all the currently supported versions, plus 3.x.

According to the information in the python wiki [#]_, since code written for
python 2.6 (and 2.7) can be made forward-compatible with python 3, options one
and two can be merged. Should that option be selected by the mentors, I'd
probably use the method outlined in PEP 3000 [#]_, which is:

    1. Port mercurial to Python 2.6. (Already done)
    2. Turn on the Py3k warnings mode.
    3. Test and edit until no warnings remain.
    4. Use the 2to3 tool to convert this source code to 3.x syntax.
    5. Test the converted source code under 3.x.
    6. If problems are found, make corrections to the 2.6 version of the source
       code and go back to step 3.
    7. When it's time to release, release separate 2.6 and 3.x tarballs (or
       use distutils' 2to3 integration).
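
As an illustration of what "forward-compatible" 2.6 code can look like, here
is a minimal sketch (the function ``describe`` is hypothetical and not taken
from mercurial). It relies only on constructs that python 2.6 and 3.x both
accept: ``b""`` literals, the ``except ... as`` syntax, and ``print`` called
with a single parenthesized argument.

```python
def describe(data):
    """Summarize a byte string without using 2.x-only syntax."""
    try:
        text = data.decode("ascii")
    except UnicodeDecodeError as exc:  # "as" is valid on 2.6 and on 3.x
        return "undecodable at byte %d" % exc.start
    return "%d ascii characters" % len(text)


# b"" literals are accepted by 2.6 (as str) and by 3.x (as bytes).
good = describe(b"abcde")
bad = describe(b"\xff")

# A single parenthesized argument prints the same text on 2.6 and 3.x.
print(good)
print(bad)
```

Code written this way should need fewer edits in steps 2 to 6 of the loop
above, since less of it depends on constructs that 2to3 has to rewrite.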

For the testing part, the mercurial test suite would be an invaluable tool to
verify that there were no regressions. Logging all the output generated by
mercurial would then expose the warnings that will most likely need to be
fixed.

As described in py3k porting reports, 2to3 makes some semantically incorrect
changes, so for each module I'd try to isolate the affected code from 2to3.

From a user's point of view, option number three is the most attractive, since
it would support the widest range of users. As one would expect, it is also the
hardest to implement, and I'd prefer to discard it as an option. But, as an
exercise, I'll try to describe the work involved in it.

Since python 2.5 and earlier do not support most of the changes introduced in
py3k (which 2.6 is, at least, aware of), some calls in the current mercurial
code base would need to be translated to other calls (like converting uses of
the print statement to calls to sys.stdout.write) or an abstraction layer
would need to be implemented. In some cases both approaches would be needed,
as when python 2.x expects str objects for some operations while python 3.x
requires unicode objects.
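
To make the idea more concrete, a tiny abstraction layer could look like the
sketch below. It is only an assumption of how such a layer might be shaped:
the helpers ``tobytes`` and ``write`` are hypothetical names, not existing
mercurial functions.

```python
import io
import sys


def tobytes(value, encoding="utf-8"):
    """Return value as a byte string on both python 2.x and 3.x."""
    if isinstance(value, bytes):  # on 2.x, bytes is an alias for str
        return value
    return value.encode(encoding)


def write(data, stream=None):
    """Stand-in for the print statement: write bytes to a binary stream."""
    if stream is None:
        # python 3 exposes the underlying byte stream as sys.stdout.buffer;
        # on python 2, sys.stdout already accepts byte strings.
        stream = getattr(sys.stdout, "buffer", sys.stdout)
    stream.write(tobytes(data))


# Usage: both text and bytes end up as bytes in the output stream.
buf = io.BytesIO()
write("changeset: 0\n", buf)
write(b"user: test\n", buf)
```

Callers would then use ``write`` instead of ``print``, leaving the str versus
unicode decision in a single place.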

Some sources of inspiration for this approach can be found in the repository of
the django py3k port [#]_.


References for this section
~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. [#] http://wiki.python.org/moin/PortingPythonToPy3k
.. [#] http://www.python.org/dev/peps/pep-3000/
.. [#] http://bitbucket.org/loewis/django-3k/src/
 

Porting the core python modules
===============================

Porting the core python modules will be the most work-intensive part of this
project, since this is where most of the code lies. As described in the
previous section, one porting approach needs to be chosen before work starts
on this task. As already discussed, I'd prioritize working first on the most
used modules to have a basic working port as soon as possible. For that, I'd
need to analyze the code and get some insights from the core mercurial
developers.

After finishing this basic port (like making a version capable of init'ing a
repository), I'd start work on the other modules (in the same prioritized
scheme) to bring the rest of the functionality to the py3k port.


Time estimates
~~~~~~~~~~~~~~

Given the size of mercurial's code, this part can take an arbitrary amount of
work, as was discussed in the "Deliverables" section.

Given the limited amount of time available in GSoC, I'd set an upper limit of
two and a half months for this part. Should the port of the core be complete
before this time is up, I'd start working on the extensions shipped with
mercurial.


Porting the C extension modules
===============================

Currently, mercurial uses a few (six, in the core, plus one extension, inotify)
C extension modules, written mostly with performance in mind.

Considering that this code is small, two approaches can be taken:

    1. Porting the python interfaces to python 3 using conditional compilation
       to separate API calls from incompatible versions;

    2. Implementing python-only modules to substitute the C versions in the
       py3k port.

Though option 2 is interesting from the point of view that it could be usable
for eventual mercurial ports to other pythons [#]_ [#]_, I'd probably try to
adapt the existing C code to py3k according to the python documentation [#]_,
unless the mentoring organization prefers option 2.
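
If option 2 were chosen, the python-only modules could be selected at import
time. The sketch below shows one possible fallback pattern under stated
assumptions: ``load_module`` and the module name ``_hypothetical_c_bdiff`` are
made up for illustration, and the standard library's ``difflib`` merely stands
in for a pure python replacement.

```python
import importlib


def load_module(candidates):
    """Import and return the first module in candidates that is available."""
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    raise ImportError("none of %r could be imported" % (candidates,))


# Prefer the (hypothetical) C module, fall back to a pure python one.
diffmod = load_module(["_hypothetical_c_bdiff", "difflib"])
print(diffmod.__name__)
```

The same pattern would let an interpreter without C extension support use the
python versions while CPython keeps the faster C ones.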


Time estimates
~~~~~~~~~~~~~~

This task overlaps with the task of porting the python modules to py3k. Given
that the C modules implement basic patching and diff operations, this task
would have to be completed as soon as someone wants diff/patch support in the
py3k port.

In an optimistic scenario, I believe one week should be enough to port and test
this part on the Linux, Windows and Mac OS X operating systems.

.. Note:: I'd probably have a bit of trouble working with Windows, since it
    has been quite some time since I last developed for it. Should I have
    problems with that platform, it is likely that this activity will take more
    than one week, possibly two.

References for this section
~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. [#] http://www.jython.org/
.. [#] http://ironpython.net/
.. [#] http://docs.python.org/py3k/howto/cporting.html


Porting the mercurial extensions
================================

The core mercurial extensions are written in pure python. Even though they
extend mercurial's behavior in many interesting ways, they aren't required for
it to work properly, and, thus, the porting of this part of mercurial would
receive the least priority in my project.

Time allowing for work on the extensions, I'd use the very same approach used
to port mercurial's core (the possible approaches were discussed in the
"Python code porting approaches" section).


Time estimates
~~~~~~~~~~~~~~

The time allocated to this task would be the total time to finish the GSoC
project minus the time needed to finish the porting of mercurial's core (if it
is positive or zero).


Conclusion
==========

This proposal outlined the approach I'd take to port mercurial to run on py3k
while presenting the major tasks involved in said project, some of the possible
approaches that one could take and a brief discussion of what should be
delivered by the end of this project.

It is my belief that this project would be beneficial for both the mercurial
and the python communities, since it would enable more users to try development
with py3k and because it would also prepare mercurial for python's future.
}}}
 * 2010-06-14: Using 2to3 and some hand-editing, I managed to get "hg version" running in py3k. Need to recheck the generated patch and find a strategy to solve those trouble spots.
 * 2010-06-15: Attended the GSoC meeting @ #mercurial; submitted some documentation improvements to py3k (http://bugs.python.org/issue9001 & http://bugs.python.org/issue9002).
 * 2010-06-16: Worked on inotify extension py3k port. Figured out I may be developing RSI. :/
 * 2010-06-17: Continued working on the inotify extension.
 * 2010-06-18: Finished work on inotify's C extension and shelved it until 1.6 is released. Took a look at other projects' approaches to py3k porting (sqlalchemy & django).
 * 2010-06-19: -- Weekend (even though I ''tried'' to work, it didn't quite work as expected) --
 * 2010-06-20: -- Weekend (haven't even tried) --
----
 * 2010-06-21: Wondered how to help solve bugs for the upcoming 1.6 release. Been distracted all day trying to focus on writing a document to describe my approach to handling the "bytes vs. unicode" problem.
 * 2010-06-22: Attended the GSoC meeting @ #mercurial.
 * 2010-06-23: Implemented fixes to some py3k incompatibilities. Namely: tuple argument unpacking in [[http://bitbucket.org/trovao/hg-py3k-patches/src/tip/churn-tuple-unpack.diff|churn.py]] and [[http://bitbucket.org/trovao/hg-py3k-patches/src/tip/convert-tuple-unpack.diff|convert.py]], removed the [[http://bitbucket.org/trovao/hg-py3k-patches/src/tip/record-no-reduce.diff|usage of the reduce function in the record extension]], specifically defined an [[http://bitbucket.org/trovao/hg-py3k-patches/src/tip/revlog-int-division.diff|int division as such in revlog.py]]. More to come...
 * 2010-06-24: Reviewed [[http://mercurial.selenic.com/bts/issue2130|issue 2130]] (I suppose it needs crew intervention) & helped some people at #mercurial. Got hg version to run without the need of HGPLAIN.

Renato Cunha

Contact

Blog: http://valedotrovao.com

Email: <renato AT SPAMFREE renatocunha DOT com>

Homepage: http://renatocunha.com

IRC: trovao @ irc.freenode.net

About me

Currently pursuing a master's degree in Computer Science. Have been working with python for about four years, mostly in hobby projects. Have become a mercurial user at the end of 2009.

I also have some experience with Open Source software development, gained by contributing code and documentation patches to some projects I use and by working on dropline GNOME.

Idea for GSoC 2010

My application's text is located at http://bitbucket.org/trovao/gsoc-2010/src/tip/application.rst.

Implementation status

First of all, my patch queue is located at http://bitbucket.org/trovao/hg-py3k-patches and its corresponding feed is located at http://bitbucket.org/trovao/hg-py3k-patches/rss.

As of 2010-06-08 I've ported the core C modules to python 3. The decisions taken when porting it were based on the idea that hg operates on bytes, and, thus, the new modules operate on and return bytes objects.
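
As a small illustration of the "bytes in, bytes out" convention, consider the sketch below. It is written in Python rather than C, and the function first_line is a hypothetical example, not one of the ported modules.

```python
def first_line(data):
    """Return the first line of a bytes buffer, as bytes."""
    if not isinstance(data, bytes):
        # Refuse text early instead of guessing an encoding.
        raise TypeError("expected bytes, got %s" % type(data).__name__)
    return data.split(b"\n", 1)[0]


summary = first_line(b"fix revlog division\n\nlonger description")
print(summary)
```

Under python 3, passing a str to such a function raises a TypeError instead of being silently encoded, which keeps decisions about text encoding out of the low-level code.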

Journal (status updates - dates in ISO format)

  • 2010-06-14: Using 2to3 and some hand-editing, I managed to get "hg version" running in py3k. Need to recheck the generated patch and find a strategy to solve those trouble spots.
  • 2010-06-15: Attended the GSoC meeting @ #mercurial; submitted some documentation improvements to py3k (http://bugs.python.org/issue9001 & http://bugs.python.org/issue9002).

  • 2010-06-16: Worked on inotify extension py3k port. Figured out I may be developing RSI. :/
  • 2010-06-17: Continued working on the inotify extension.
  • 2010-06-18: Finished work on inotify's C extension and shelved it until 1.6 is released. Took a look at other projects' approaches to py3k porting (sqlalchemy & django).

  • 2010-06-19: -- Weekend (even though I tried to work, it didn't quite work as expected) --

  • 2010-06-20: -- Weekend (haven't even tried) --


  • 2010-06-21: Wondered how to help solve bugs for the upcoming 1.6 release. Been distracted all day trying to focus on writing a document to describe my approach to handling the "bytes vs. unicode" problem.
  • 2010-06-22: Attended the GSoC meeting @ #mercurial.
  • 2010-06-23: Implemented fixes to some py3k incompatibilities. Namely: tuple argument unpacking in churn.py and convert.py, removed the usage of the reduce function in the record extension, specifically defined an int division as such in revlog.py. More to come...

  • 2010-06-24: Reviewed issue 2130 (I suppose it needs crew intervention) & helped some people at #mercurial. Got hg version to run without the need of HGPLAIN.
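
The three kinds of incompatibility fixed on 2010-06-23 can be illustrated with a short sketch. It is not the content of the actual patches: the names span, pairs, total and midpoint are made up for the example.

```python
# python 2 accepted "def span((start, end)):"; python 3 removed tuple
# parameter unpacking (PEP 3113), so the tuple is unpacked in the body.
def span(pair):
    start, end = pair
    return end - start


pairs = [(0, 4), (10, 13)]

# reduce() is no longer a builtin in python 3; for a simple sum it can be
# replaced by sum() over a generator expression.
total = sum(span(pair) for pair in pairs)

# "/" returns a float in python 3, so integer division is spelled "//".
midpoint = total // 2

print(total, midpoint)
```

Each of these rewrites is also valid on python 2.6, so the resulting code behaves the same under both interpreters.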


CategoryHomepage

RenatoCunha (last edited 2010-10-22 18:16:55 by mpm)