GSoC Ideas 2010
This page archives the ideas collected for GSoC 2010.
Project Ideas
Here are a bunch of project ideas you might like to apply for. Of course, if you have a different idea of something in Mercurial that badly needs fixing or some feature you think would make a difference, go ahead and apply with it! Some more project ideas can be found via NewFeatureDiscussions, CategoryNewFeatures and NewIdeas.
Lightweight copies/renames
(very difficult - a successful student will become an expert in Mercurial's storage format and transmission protocol)
Copies and renames are currently not very efficient. Mercurial stores the full contents of the copied/renamed source file as the initial revision of the target file in its internal history store. For renames, this is especially counter-intuitive, as renaming a large file grows the store by the file's size. It would be better if Mercurial had some way of referring to the existing revision from the new file, while preserving backwards compatibility and bounded I/O guarantees for retrieving revisions. See issue883 for discussion. There's an mq from an old attempt at this located here.
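The idea can be illustrated with a toy in-memory store. This is only a sketch under loud assumptions: ToyStore and its methods are hypothetical names invented for this example, and nothing here reflects Mercurial's real revlog format or API. A lightweight copy records a reference to the original entry, and references are flattened so that a read never follows more than one hop, which is one simple way to keep retrieval I/O bounded.

```python
class ToyStore:
    """Hypothetical store: file name -> list of ("text", bytes) or ("ref", (name, rev))."""

    def __init__(self):
        self.files = {}

    def add(self, name, data):
        """Store a full text and return its revision number."""
        self.files.setdefault(name, []).append(("text", data))
        return len(self.files[name]) - 1

    def copy_full(self, src, rev, dst):
        """Copy by duplicating the source text into the target's history."""
        return self.add(dst, self.read(src, rev))

    def copy_light(self, src, rev, dst):
        """Copy by storing a reference to the original text entry."""
        kind, payload = self.files[src][rev]
        # If the source is itself a reference, point at its target directly,
        # so a read never has to follow more than one reference.
        target = payload if kind == "ref" else (src, rev)
        self.files.setdefault(dst, []).append(("ref", target))
        return len(self.files[dst]) - 1

    def read(self, name, rev):
        kind, payload = self.files[name][rev]
        if kind == "ref":
            src, src_rev = payload
            return self.files[src][src_rev][1]
        return payload

    def stored_bytes(self):
        """Total size of the full texts held in the store."""
        return sum(len(payload)
                   for entries in self.files.values()
                   for kind, payload in entries
                   if kind == "text")


big = b"x" * 1000

full = ToyStore()
full.add("big.bin", big)
full.copy_full("big.bin", 0, "renamed.bin")

light = ToyStore()
light.add("big.bin", big)
light.copy_light("big.bin", 0, "renamed.bin")
light.copy_light("renamed.bin", 0, "again.bin")

print(full.stored_bytes(), light.stored_bytes())
```

The sketch ignores everything that makes the real project hard: on-disk layout, old clients that do not understand references, and sending such revisions over the wire.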
Contact: mpm, tonfa, cyanite
Parent Delta
(very difficult - a successful student will become an expert in Mercurial's storage format and transmission protocol)
Revlogs such as the manifest that have significant amounts of branching can suffer from revlog's linear delta model. Parent delta would allow storing deltas against parent revisions, greatly improving compression. Like lightweight copies, it will require extending the wire protocol to allow backwards compatibility.
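A toy model shows why branching hurts a linear delta chain. This is a sketch, not Mercurial's revlog: revisions are lists of lines, the "delta size" is simply the number of lines that must be stored to rebuild a revision from its base, and the helper names (delta_size, build_toy_revlog, total_delta) are made up for this example. Two branches are interleaved in storage order, so the previous revision in the log usually belongs to the other branch.

```python
import difflib


def delta_size(base, target):
    """Count the lines that must be stored to rebuild target from base."""
    matcher = difflib.SequenceMatcher(a=base, b=target, autojunk=False)
    return sum(j2 - j1
               for tag, i1, i2, j1, j2 in matcher.get_opcodes()
               if tag in ("replace", "insert"))


def build_toy_revlog(rounds=5):
    """Return a list of (parent_rev, lines) with two interleaved branches."""
    base = ["file%03d" % i for i in range(100)]
    revs = [(None, base)]
    tip_a = tip_b = 0
    for n in range(rounds):
        revs.append((tip_a, revs[tip_a][1] + ["branch-a-%d" % n]))
        tip_a = len(revs) - 1
        revs.append((tip_b, revs[tip_b][1] + ["branch-b-%d" % n]))
        tip_b = len(revs) - 1
    return revs


def total_delta(revs, choose_base):
    """Sum delta sizes when each revision is stored against choose_base(rev, parent)."""
    total = 0
    for rev in range(1, len(revs)):
        base_rev = choose_base(rev, revs[rev][0])
        total += delta_size(revs[base_rev][1], revs[rev][1])
    return total


revs = build_toy_revlog()
linear_total = total_delta(revs, lambda rev, parent: rev - 1)  # previous revision in the log
parent_total = total_delta(revs, lambda rev, parent: parent)   # actual parent revision
print(linear_total, parent_total)
```

In this model every parent delta is a single added line, while each linear delta has to restate everything the two branches disagree on, so the linear total grows with the length of the branches.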
Contact: mpm, tonfa
Better Changeset Discovery
(difficult - a successful student will become an expert in Mercurial's protocol and will experiment with difficult graph theory)
Mercurial currently uses a simple graph discovery protocol to discover what new changesets need to be pushed or pulled without transmitting the entire list available on either side. We suspect that a significantly better protocol exists that has fewer round trips and less data transfer, but finding it still requires research.
Some details can be found at DiscoveryPlan; you can also ask parren or tonfa on IRC.
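To make "fewer round trips" concrete, here is a deliberately simplified sketch. It assumes a purely linear remote history and treats each membership question as one round trip. It is not the protocol Mercurial uses, and linear_walk and bisect_walk are hypothetical names. Both functions look for the newest remote changeset that the local side already has, relying on the fact that a repository that has a changeset also has all of its ancestors.

```python
def linear_walk(remote_chain, local_known):
    """Ask about one changeset per round trip, newest first."""
    round_trips = 0
    for index in range(len(remote_chain) - 1, -1, -1):
        round_trips += 1
        if remote_chain[index] in local_known:
            return index, round_trips
    return -1, round_trips


def bisect_walk(remote_chain, local_known):
    """Binary-search for the newest common changeset on a linear chain."""
    low, high = -1, len(remote_chain) - 1  # the answer lies in low..high
    round_trips = 0
    while low < high:
        mid = (low + high + 1) // 2
        round_trips += 1
        if remote_chain[mid] in local_known:
            low = mid       # mid and all of its ancestors are common
        else:
            high = mid - 1  # mid and all of its descendants are missing
    return low, round_trips


chain = ["c%d" % i for i in range(1000)]
local_known = set(chain[:600])  # the local side is 400 changesets behind

print(linear_walk(chain, local_known))
print(bisect_walk(chain, local_known))
```

Real histories are DAGs with many heads, which is where the graph theory mentioned above comes in; the sketch only shows how the number of round trips can be used as a cost measure when comparing strategies.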
Contact: parren, tonfa
Partial cloning
(difficult - a successful student will become familiar with most of Mercurial's core algorithms)
(existing work in progress - contact Peter Arrenbrecht, parren on IRC)
Currently, it's only possible to clone one whole repository at a time. PartialClone and TrimmingHistory could help make cloning more efficient by limiting the cloning process in either of two dimensions: time or space. For time, we could clone only the last few changesets and lazily fetch the rest as needed. For space, it would be nice to be able to clone just a subtree of a repository. For these features, any number of thorny issues can arise because of current assumptions in Mercurial code. These are hard projects, but the result will be worth it to many Mercurial users (in terms of developers and in terms of projects).
Contact: parren
Instantaneous status on Windows, OS X
(difficult - a successful student will need to debug complex race conditions and master Mercurial's dirstate algorithms)
The InotifyExtension makes a huge difference to performance on moderate to large repositories on Linux (though it still has some difficult bugs). Windows and Mac OS X provide file status notification APIs, so it should be possible to port the inotify extension to one or the other (or both) of these platforms, providing the same kinds of speed improvements as on Linux. (Don't try to do both in one project; instead, pick one platform you are comfortable with, read about the relevant APIs, then come up with a coherent proposal.)
Contact: nicdumz
TortoiseHg
TortoiseHg is a GUI front-end, similar to TortoiseCVS and TortoiseSVN. For many people, this makes interacting with Mercurial much easier. There's a lot of room for improvement. An applicant could pick one or more of:
- add graphical UI to interesting extensions like bfiles and others
- add a repository monitoring mechanism, to detect when a GUI application requires a refresh
- add support to Mercurial for paramiko as an ssh agent, hooking password prompts through ui.password
- integrate Meld, which is also written in PyGTK and just recently added support for Windows
- tight integration of hgtk.exe into Visual Studio
- graphical hunk splitting, as discussed in the next task
or their own ideas.
Contact: muggs
Interactive patch selection for commit/Mercurial Queues/record/import
Being able to interactively select parts of the existing changes, at hunk granularity or finer, would improve commands and extensions that take changes, such as commit, MqExtension (Mercurial Queues) and import. The RecordExtension currently allows patch hunk selection, but sometimes finer granularity is desired, as when a set of adjacent function definitions should go in different commits. This feature could be added as an --interactive mode for many of Mercurial's core commands.
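The core of hunk selection can be sketched with the standard library alone. This is an illustration, not the RecordExtension's code: split_hunks and select_hunks are hypothetical helpers, and the wanted callback stands in for whatever interactive prompt a real --interactive mode would present.

```python
import difflib


def split_hunks(diff_lines):
    """Split unified-diff lines into (header_lines, hunks); a hunk starts at '@@'."""
    header, hunks = [], []
    for line in diff_lines:
        if line.startswith("@@"):
            hunks.append([line])
        elif hunks:
            hunks[-1].append(line)
        else:
            header.append(line)
    return header, hunks


def select_hunks(diff_lines, wanted):
    """Return a smaller patch containing only the hunks for which wanted() is true."""
    header, hunks = split_hunks(diff_lines)
    chosen = [hunk for index, hunk in enumerate(hunks) if wanted(index, hunk)]
    return header + [line for hunk in chosen for line in hunk]


old = ["line %d" % i for i in range(20)]
new = list(old)
new[1] = "first change"
new[17] = "second change"

diff = list(difflib.unified_diff(old, new, "a/f", "b/f", n=1, lineterm=""))
header, hunks = split_hunks(diff)

# Keep only the first hunk, as a user might when splitting work into two commits.
partial = select_hunks(diff, lambda index, hunk: index == 0)
print("\n".join(partial))
```

Selecting below hunk granularity, as asked for above, would additionally require rewriting the line counts in the '@@' headers, which this sketch does not attempt.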
Mercurial on Jython
Some Sun projects have taken to using Mercurial as their VCS of choice (OpenJDK, NetBeans; see ProjectsUsingMercurial for a full list). Recently, Mercurial has regained pure-Python implementations of the modules that are now in C. It would be nice if someone could make the effort to get this code to run on Jython, opening up Mercurial for use from within the JDK.
Mercurial on Py3k
(difficult - a successful student will have to master Mercurial's pervasive approach to character set handling and reconcile it with Py3k's)
In 2008, the Python project finally released its 3.0 branch of Python (affectionately known as Python 3000). This is a major release of Python, with some large changes. Porting to Py3k would be an interesting effort, working at the leading edge of the language, with good long-term viability benefits for Mercurial.
GUI integration
Mercurial currently lacks features to support integration into a GUI, such as TortoiseHg or Mercurial Eclipse. One example of a feature that would benefit GUIs is a progress indicator for operations that may take some time.
Web interface work
The hgwebdir repository interface is one of the successful parts of Mercurial. It allows projects to quickly set up a web interface with access to multiple repositories, and it has some decent ACL features. It's a Python WSGI application, meaning it's relatively easy to run on many platforms and behind differing web servers. It can still be improved, though: there are still some holes in the WSGI implementation (notably, it writes to stdout/stderr in some error conditions) and the access control features aren't always functional enough for complex setups.
Also, improved support for named branches and filtering (like hg log -f and the like) is an often-requested feature.
Also, improving/updating the Django application "FreeHg" would be an option, with:
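For the stdout/stderr problem, the WSGI specification gives every request an error stream in environ["wsgi.errors"], which is where a well-behaved application should report problems. The sketch below is a made-up minimal application, not hgwebdir's code; app and call_app are hypothetical names, and the LookupError is just a stand-in failure.

```python
import io
from wsgiref.util import setup_testing_defaults


def app(environ, start_response):
    """Hypothetical WSGI application that logs through wsgi.errors."""
    try:
        raise LookupError("repository not found")  # stand-in for a real failure
    except LookupError as exc:
        # Report through the stream the server provided, never sys.stdout/sys.stderr.
        environ["wsgi.errors"].write("error: %s\n" % exc)
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"not found\n"]


def call_app(application):
    """Call a WSGI application once and return (status, body, error log)."""
    environ = {}
    setup_testing_defaults(environ)
    errors = io.StringIO()
    environ["wsgi.errors"] = errors
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status

    body = b"".join(application(environ, start_response))
    return captured["status"], body, errors.getvalue()


status, body, log = call_app(app)
print(status, body, log)
```

A small harness like call_app also makes it possible to test that nothing leaks to the process-wide streams, since the server, not the application, decides where wsgi.errors ends up.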
- server side clone
- access rights, with users/groups
Contact: djc
Conversion tools
Mercurial is a relatively new entrant in the VCS market, and many projects are still using older VCSs such as CVS and SVN. While we currently have some tools to help migrate to Mercurial in the form of the ConvertExtension, these tools could certainly use more improvements. Specifically, enabling the use of Mercurial as a client for SVN or even git and/or Bazaar-NG repositories would be very nice, as it enables developers to make their own choice regarding the use of their VCS client, thereby drastically enlarging our userbase.
The HgSubversion extension already works quite well, but better integration with the existing Mercurial user interface would be nice.
Contact: durin42, pmezard
Performance tuning
(difficult - a successful student will master performance benchmarking, tuning Python algorithms, and possibly writing C extensions for Python)
There are numerous opportunities for additional performance tuning. Possibilities include:
- faster startup (can we optimize Python load time further? can we minimize loading of unused code further?)
- faster status (can we further improve directory walking? ignore logic? dirstate parsing?)
- faster checkout (can we reduce the overhead of our file I/O paths?)
A project in this area would probably involve picking an important benchmark (e.g. checkout of a large repository) and tuning multiple areas to meet a performance goal (e.g. a 2x overall performance increase).
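A benchmark-plus-goal workflow can be set up with a few lines around the standard timeit module. This is only a sketch: best_time and meets_goal are hypothetical helpers, and the two unique_* functions are stand-in workloads, not Mercurial code. A real project would time actual hg operations on a large repository instead.

```python
import timeit


def best_time(func, repeat=5, number=10):
    """Best per-call time in seconds; the minimum is the least noisy sample."""
    return min(timeit.repeat(func, repeat=repeat, number=number)) / number


def meets_goal(baseline_seconds, candidate_seconds, factor=2.0):
    """True if the candidate is at least `factor` times faster than the baseline."""
    return baseline_seconds >= factor * candidate_seconds


def unique_slow(items):
    """Stand-in baseline: order-preserving de-duplication with a list lookup."""
    seen = []
    for item in items:
        if item not in seen:
            seen.append(item)
    return seen


def unique_fast(items):
    """Stand-in candidate: the same result using a set for membership tests."""
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


data = ["path/%d" % (i % 500) for i in range(2000)]

baseline = best_time(lambda: unique_slow(data))
candidate = best_time(lambda: unique_fast(data))
print("speedup: %.1fx, goal met: %s" % (baseline / candidate,
                                        meets_goal(baseline, candidate)))
```

Checking that the tuned version still produces the same result as the baseline, as the two stand-in functions do here, is as much a part of the benchmark as the timing itself.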