Note:
This page is primarily intended for developers of Mercurial.
Note:
This page appears to contain material that is no longer relevant. Please help improve this page by updating its content.
As of Mercurial 1.2, a rename stores a complete copy of the file revision in the history. For large binary files, this is expensive. It would be better to store a pointer to the source revision plus a delta against it.
This is tracked in issue883.
Storing a pointer plus a delta raises two problems:
- old versions of Mercurial won't understand the new format
- new versions will need to look at multiple revlogs to calculate a revision hash
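The second problem follows from how a revision hash is computed: it is the SHA-1 of the two parent IDs (sorted) followed by the full revision text, so the full text has to be rebuilt before the hash can be checked. The sketch below is a standalone illustration, not Mercurial's code. The names `node_hash` and `apply_delta` and the single-hunk `(start, end, data)` delta format are assumptions made for the example.

```python
import hashlib

NULLID = b"\0" * 20  # ID used for a missing parent: 20 zero bytes


def node_hash(text, p1=NULLID, p2=NULLID):
    """SHA-1 over the sorted parent IDs followed by the full revision text."""
    a, b = sorted([p1, p2])
    return hashlib.sha1(a + b + text).digest()


def apply_delta(base, delta):
    """Replace base[start:end] with data (hypothetical single-hunk delta)."""
    start, end, data = delta
    return base[:start] + data + base[end:]


# Full text of the rename source, which lives in one revlog.
source_text = b"large binary payload v1"

# The renamed file's revlog would hold only a pointer plus this delta.
delta = (21, 23, b"v2")

# The hash cannot be computed from the delta alone: the source revlog
# must be read first to rebuild the full text.
full_text = apply_delta(source_text, delta)
node = node_hash(full_text)
```

With full copies, `full_text` is available from a single revlog. With pointers, computing `node` always involves the source revlog as well.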
So a fix requires:
- adding a repository layout change flag to the requires file
- restructuring hash calculation to cross revlogs
- adding code to allow renames to use pointers
- extending the wire protocol to allow compatibility with old clients
- generating deltas compatible with old clients when required
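The first requirement relies on the existing behaviour of the requires file: it lists one requirement per line, and a client refuses to open a repository that lists a requirement it does not know. A minimal sketch of that check follows. The requirement name `copydelta` is hypothetical, and `check_requirements` is an illustrative helper rather than a Mercurial function.

```python
# Requirements an old client might understand (illustrative set).
OLD_CLIENT = {"revlogv1", "store", "fncache"}
# A new client also understands the hypothetical "copydelta" entry.
NEW_CLIENT = OLD_CLIENT | {"copydelta"}


def check_requirements(requires_text, supported):
    """Return the set of requirements, or raise if any is unsupported."""
    requirements = {line.strip() for line in requires_text.splitlines() if line.strip()}
    unknown = requirements - supported
    if unknown:
        raise RuntimeError(
            "repository requires unknown features: " + ", ".join(sorted(unknown))
        )
    return requirements


# Contents of the requires file in a repository using pointer-based renames.
requires_text = "revlogv1\nstore\nfncache\ncopydelta\n"
```

Here `check_requirements(requires_text, NEW_CLIENT)` succeeds, while the same call with `OLD_CLIENT` raises, so an old client fails cleanly instead of misreading the new format.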
Currently, revlogs are (by design) insulated from other revlogs. Knowledge of how and where other revlogs are stored is only available at a higher level (filelog). A revlog is a black box: higher-level code hands it a revision number and receives a complete revision in return, with its hash checked.
To remain backwards-compatible with clients over the wire, we must violate this abstraction. Here's a proposed approach:
- In filelog, override revlog.revision by adding metadata that says "the revision returned by revlog is not a full revision as promised but a revision of file x@rev + the body here treated as a delta."
- Then filelog.revision can instantiate a temporary filelog object for x, get the specified revision, and apply the delta.
- Add the appropriate steps in filelog.add to make this work.
- Now with a little luck, getting the -next- revision from the filelog will just work. Otherwise, we'll need to hack revlog.revision to call itself (and thereby filelog.revision) to grab the base revision.
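The steps above can be sketched with a toy model. This is not Mercurial's revlog or filelog API. `ToyRevlog`, `ToyFilelog`, the `store` dictionary (a stand-in for opening a revlog by pathname), and the prefix/suffix delta format are all assumptions made to keep the example self-contained.

```python
def make_delta(base, text):
    """Single-hunk delta: strip the common prefix and suffix (toy format)."""
    limit = min(len(base), len(text))
    prefix = 0
    while prefix < limit and base[prefix] == text[prefix]:
        prefix += 1
    suffix = 0
    while (suffix < limit - prefix
           and base[len(base) - 1 - suffix] == text[len(text) - 1 - suffix]):
        suffix += 1
    return (prefix, len(base) - suffix, text[prefix:len(text) - suffix])


def apply_delta(base, delta):
    start, end, data = delta
    return base[:start] + data + base[end:]


class ToyRevlog:
    """Black box: give it a revision number, get the stored body back."""

    def __init__(self):
        self._bodies = []

    def add(self, body):
        self._bodies.append(body)
        return len(self._bodies) - 1

    def revision(self, rev):
        return self._bodies[rev]


class ToyFilelog(ToyRevlog):
    """Knows about other filelogs; the revlog layer underneath does not."""

    def __init__(self, store, path):
        super().__init__()
        self._store = store
        store[path] = self

    def add(self, text, copy_source=None):
        if copy_source is None:
            return super().add((None, text))
        src_path, src_rev = copy_source
        base = self._store[src_path].revision(src_rev)
        # Store the pointer as metadata and only a delta as the body.
        return super().add((copy_source, make_delta(base, text)))

    def revision(self, rev):
        copy_source, body = super().revision(rev)
        if copy_source is None:
            return body
        src_path, src_rev = copy_source
        # Recursing through the source filelog resolves chains of copies.
        base = self._store[src_path].revision(src_rev)
        return apply_delta(base, body)
```

In this model, a file `c` copied from `b@0`, itself copied from `a@0`, is reconstructed by a single `revision` call on `c`, because each filelog asks its source for a full text.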
So now we've got a scheme that mostly does away with the layering violations: revlog needs no special knowledge about other revlogs, because it's all in the filelog class, which already knows how to find and open a revlog from a pathname. It even gets the case where c@z is a copy of b@y, which is a copy of a@x, right automatically.
But we've also got a huge compatibility problem. An old client can't just pull this data and expect it to work. Instead, we've got to add a new version of the wire protocol that allows us to send these sorts of deltas to new clients, but sends full revisions to old clients using the old wire protocol. And a new client would like to take old client data and deltify the copies, which may not be possible at pull time (for instance, if the destination revlog is sent before the source revlog).
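The sending side of that compatibility rule might look like the sketch below. It assumes the client advertises a capability for the new format. The capability name `copydelta`, the dictionary-based entries, and the `resolve_full` callback are hypothetical and stand in for whatever the real protocol extension would define.

```python
def outgoing_revisions(entries, client_caps, resolve_full):
    """Yield entries for the wire, expanding copy deltas for old clients.

    entries: dicts with "copy_source" (None or a (path, rev) pair) and "body".
    client_caps: set of capability names advertised by the client.
    resolve_full: callback that returns the full text for a copy-delta entry.
    """
    for entry in entries:
        if entry["copy_source"] is None or "copydelta" in client_caps:
            # Plain revisions, or a client that understands pointers.
            yield entry
        else:
            # Old client: send a full revision, as the old protocol expects.
            yield {"copy_source": None, "body": resolve_full(entry)}


# Toy data: one ordinary revision and one stored as a pointer plus delta.
full_texts = {("b", 0): b"full text of b"}
entries = [
    {"copy_source": None, "body": b"full text of a"},
    {"copy_source": ("a", 0), "body": b"<delta>", "key": ("b", 0)},
]


def resolve_full(entry):
    return full_texts[entry["key"]]


to_new_client = list(outgoing_revisions(entries, {"copydelta"}, resolve_full))
to_old_client = list(outgoing_revisions(entries, set(), resolve_full))
```

The receiving side is harder, as noted above: a new client that gets full revisions from an old server can only deltify a copy once the source revlog has arrived, which may mean a separate pass after the pull.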
Discussion
PeterArrenbrecht notes that it would be good if the new wire protocol also supported sending deltas against a specific revision (usually either of the parents) instead of the prior revision. This could come in handy for shallow clones and maybe slimmer manifests. It would also help the protocol part of ParentDeltaPlan.
An implementation could split the data files into read-only chunks, using hardlinks to share those chunks between filelogs; that could also save space on local clones and incremental backups. See http://www.selenic.com/pipermail/mercurial-devel/2008-February/004983.html for details.