Google Summer of Code 2009

For the last few years, Google has offered a fantastic opportunity for students to help out Open Source software projects in the summer while getting paid for it. It's called Google Summer of Code™, and it provides free software projects a great way of attracting development effort while providing software developers who are still in university with some interesting and useful experiences. Find out more about the Summer of Code (SoC) from [http://code.google.com/summerofcode.html their site].

Mercurial is a Distributed Version Control System (DVCS), and we would like to participate in 2009's Summer of Code. In recent years (notably since the Linux kernel project started looking for a new VCS), DVCSs have rapidly started to gain traction. For open source projects, especially, the use of a distributed system can be a great enabler, in that it's much easier to keep track of your own changes and publish them back to the community in a structured way. With centralized systems, you have a central repository that only a happy few have (write) access to, and people spend a lot of time mailing around diffs. With distributed systems, everyone has their own branch, and it's very easy to publish it on the web, compare the changes to those on some other branch or merge two branches back together.

We believe distributed version control is the future, and it looks like many other technologists believe the same. There's some heavy competition between open source VCS systems: git, Bazaar-NG and Mercurial are constantly fighting over users, while many more conservative projects are still on Subversion or even CVS. Competition is fierce, stimulating fast-paced development in the three DVCS contenders. It's therefore important (and good for the competition) that projects like ours gain more developers, to keep up with the race for robustness, performance and features. We view the Summer of Code as a great opportunity to help us in this competition.

Written largely in Python (with the exceptions of a few performance-critical core modules), it is easy to develop for and allows for good code organization and clean abstractions. Moreover, it has been optimized from the start for good performance, using an append-only datastore, minimizing random disk access and smart uses of compression. Finally, it has extensive features for sharing changesets, over HTTP, SSH and in files over email and has an innovative extension (Mercurial Queues) to help building and organizing changesets into a thoroughly useful source history.

TableOfContents

1. Notes on applying

Here are some tips on what you might want to include in your application.

2. Getting things done

Some tips on how you can get to project completion within the scheduled timeline.

3. Project Ideas

Here are a bunch of project ideas you might like to apply for. Of course, if you have a different idea of something in Mercurial that badly needs fixing or some feature you think would make a difference, go ahead and apply with it! Some more project ideas can be found via NewFeatureDiscussions, CategoryNewFeatures and NewIdeas.

3.1. Lightweight copies/renames

Copies and renames currently are not too efficient. Mercurial copies the copied/renamed source file to the new initial revision of the target file in its internal history store. For renames, this is especially counter-intuitive, as renaming a large file grows the store by the file's size. It would be better if Mercurial had some way of referring to the existing revision from the new file, while preserving backwards compatbility and bounded I/O guarantees for retrieving revisions. See [http://www.selenic.com/mercurial/bts/issue883 issue883] for discussion.

3.2. Partial cloning

Currently, it's only possible to clone one whole repository at a time. PartialClone and TrimmingHistory could help make cloning more efficient by limiting the cloning process in either of two dimensions: time or space. For time, we could maybe clone the last few [:ChangeSet:changesets] and lazily fetch the rest as needed. For space, it would be nice if it was possible to clone just a subtree of any repositories. For these features, any number of thorny issues can arise because of current assumptions in Mercurial code. These are hard projects, but the result will be worth it to many Mercurial users (in terms of developers and in terms of projects).

3.3. Subrepositories

The ForestExtension implements a solution for repositories that want to include a number of subrepositories (similar to svn:externals). For large projects and because DVCS systems in general advocate smaller repositories, it can be helpful to implement a coherent set of repositories. A [:NestedRepositories:subrepositories implementation] is in the works but it needs further polish, integration into the Mercurial core, as well as extended functionality, as the capability to use other VCS sources.

3.4. Instantaneous status on Linux, Windows, OS X

The InotifyExtension makes a huge difference to performance on moderate to large [:Repository:repositories] on Linux, but it still has some difficult bugs. Windows and Mac OS X provide file status notification APIs, so it should be possible to port the inotify extension to one or other (or both) of these platforms, providing the same kinds of speed improvements as on Linux. (Don't try to do all of these in one project; instead, pick one platform you are comfortable with, read about the relevant APIs, then come up with a coherent proposal.)

3.5. TortoiseHg

TortoiseHg is a GUI front-end, similar to TortoiseCVS and TortoiseSVN. For many people, this makes interacting with Mercurial much easier. There's a lot of room for improvement, adding graphical UI to interesting extensions like pbranch and others. The project keeps a [http://bitbucket.org/tortoisehg/crew/wiki/TODO TODO list] of suitable tasks for a Summer of Code.

3.6. Interactive patch selection for commit/Mercurial Queues/record/import

Being able to select parts of the existing changes, with hunk or greater granularity, in an interactive way, can improve the use of commands and extensions that take changes, such as commit, MercurialQueues and import. The RecordExtension currently allows patch hunk selection, but sometimes a better granularity is desired, as when a set of adjacent function definitions should go in different commits. This feature could be added as an --interactive mode for many of Mercurial's core commands.

3.7. Mercurial on Jython

Some Sun projects have taken to using Mercurial as their VCS of choice (OpenJDK, NetBeans; see ProjectsUsingMercurial for a full list). Recently, Mercurial has re-gained pure-Python implementations of the modules that are now in C. It would be nice if someone could take the effort to get this code to run on Jython, opening up Mercurial for use from within the JDK.

3.8. Mercurial on Py3k

In 2008, the Python project has finally released their 3.0 branch of Python (affectionately known as Python 3000). This is a major release of Python, with some large changes. Porting to Py3k would be an interesting effort, operating on the current edge of the software, with nice long-term viability results for Mercurial.

3.9. GUI integration

Mercurial is currently lacking features to support integration into a GUI, like TortoiseHg or Mercurial Eclipse. One example of a feature that would benefit GUI is [http://www.nabble.com/Progress-bar-on-network-actions-tt20596124.html#a20596124 a progress indicator] on operations that may take some time.

3.10. Web interface work

The hgwebdir repository interface is one of the successful parts of Mercurial. It allows projects to quickly setup a web interface with access to multiple repositories, and it has some decent ACL features. It's a Python WSGI application, meaning it's relatively easy to run on many platforms and behind differing web servers. It can still be improved, though: there are still some holes in the WSGI implementation (notably, it writes to stdout/stderr in some error conditions) and the access control features aren't always functional enough for complex setups.

3.11. Conversion tools

Mercurial is a relatively new entrant in the VCS market, and many projects are still using older VCSs such as CVS and SVN. While we currently have some tools to help migrate to Mercurial in the form of the ConvertExtension, these tools could certainly use more improvements. Specifically, enabling the use of Mercurial as a client for SVN or even git and/or Bazaar-NG repositories would be very nice, as it enables developers to make their own choice regarding the use of their VCS client, thereby drastically enlarging our userbase.

The hgsubversion extension already works quite well, but better integration with the existing Mercurial user interface would be nice.

3.12. C version of some lower-level operations

It should be possible to extract a large performance improvement from writing a C version of some basic operations. The main candidate for this would be the code that handles the index of a [:RevlogNG:revlog]; other parts of the revlog implementation (like the heads operation that is used in the WireProtocol and when calculating tags) could also benefit quite a bit.

4. Mentors

A list of potential mentors.

5. Students Considering Application

6. Applications Received From

7. Project status


CategoryNewFeatures

"Google Summer of Code" is a trademark by Google Inc.