Note:

This page is primarily intended for developers of Mercurial.

Caching

This page describes the various caches that Mercurial uses. They fall into two groups: persistent (on-disk) and in-memory.

1. On-disk caches

We have a few on-disk caches, which are stored in .hg/cache. These include the branch cache and the tag cache.

Notably, these caches are designed to be 'pure', in the sense that Mercurial can operate without them. If Mercurial is unable to read or write them for some reason, it will continue to operate but more slowly.

Also, these caches are intentionally designed to be unversioned. If for some reason we need to change the format, we will pick a new name for the cache and drop all support for reading the older format. This will allow different versions of a client to access a repo with no maintenance burden.
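The best-effort behaviour described in this section can be sketched as follows. This is only an illustration, not Mercurial's code: the file name `examplecache`, the line format, and the helpers `compute_heads`, `read_cache`, `write_cache` and `heads` are all made up for the example.

```python
import os
import tempfile

# Hypothetical cache file name. Under the scheme described above, a format
# change would ship under a new name instead of adding a version field.
CACHE_NAME = "examplecache"


def compute_heads():
    # Stand-in for an expensive computation over the repository.
    return {"default": "abc123"}


def read_cache(path):
    """Return the cached mapping, or None if it is missing or unreadable."""
    try:
        with open(path) as fp:
            # Each line is "<node> <name>"; a malformed line raises ValueError.
            pairs = [line.rstrip("\n").split(" ", 1) for line in fp]
            return dict((name, node) for node, name in pairs)
    except (IOError, OSError, ValueError):
        return None


def write_cache(path, data):
    """Try to save the cache; a failure only costs us the speedup."""
    try:
        with open(path, "w") as fp:
            for name, node in sorted(data.items()):
                fp.write("%s %s\n" % (node, name))
    except (IOError, OSError):
        pass


def heads(cachedir):
    path = os.path.join(cachedir, CACHE_NAME)
    data = read_cache(path)
    if data is None:
        data = compute_heads()  # slow path, always available
        write_cache(path, data)
    return data


cachedir = tempfile.mkdtemp()
first = heads(cachedir)   # computed, then written to the cache file
second = heads(cachedir)  # served from the cache file
# A directory that does not exist can be neither read nor written,
# yet the caller still gets a correct result.
third = heads(os.path.join(cachedir, "missing"))
```

Because every failure path falls back to recomputation, deleting the cache directory is always safe in this sketch.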

2. In-memory caching

Mercurial has two classes (decorators) for caching expensive computations: propertycache and filecache.

The process of forgetting a cached value is called invalidation. All in-memory caches are invalidated when locks (see LockingDesign) are acquired to make sure the value we use for a property is the most recent one (using the old value could lead to loss of data).
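A minimal sketch of invalidation on lock acquisition, under the assumption of a toy object with a single cached value. The class `repo_sketch` and its methods are hypothetical and much simpler than Mercurial's repository and lock classes.

```python
class repo_sketch(object):
    """Hypothetical object with one cached value; not Mercurial's repo class."""

    def __init__(self):
        self._cache = {}
        self.reads = 0

    def _read_from_disk(self):
        # Stand-in for reading state that another process may have changed.
        self.reads += 1
        return "data v%d" % self.reads

    @property
    def data(self):
        if "data" not in self._cache:
            self._cache["data"] = self._read_from_disk()
        return self._cache["data"]

    def invalidate(self):
        # Forget every cached value; the next access recomputes it.
        self._cache.clear()

    def lock(self):
        # Someone else may have written while we did not hold the lock,
        # so drop cached values before working under the lock.
        self.invalidate()
        return object()  # placeholder for a real lock object


r = repo_sketch()
before = r.data        # read once
same = r.data          # served from the cache
r.lock()               # acquiring the lock invalidates the cache
after = r.data         # read again
```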

2.1. propertycache

This is the simpler of the two. When applied to a method, it turns it into a property. On the first access it executes the function and saves the result in the containing object's __dict__. Subsequent accesses return the saved result rather than running the function again.

Usage:

    from mercurial.util import propertycache

    class foo(object):
        @propertycache
        def expensive(self):
            print('created')
            return [1, 2, 3]

    f = foo()
    f.expensive  # first access runs the function and prints 'created'
    f.expensive  # later accesses simply return the saved [1, 2, 3]

    # to forget the saved value and calculate it again, use delattr
    delattr(f, 'expensive')

See http://selenic.com/repo/hg/file/8ceabb34f1cb/mercurial/util.py#l241 for the implementation.
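The mechanism described above can be reproduced with a small descriptor. The class below is a simplified sketch named `propertycache_sketch` to make clear it is not the real class; the linked implementation may differ in detail. It relies on the fact that a descriptor defining only `__get__` is skipped once the instance's __dict__ holds an entry of the same name.

```python
class propertycache_sketch(object):
    """Simplified stand-in for propertycache (illustrative only)."""

    def __init__(self, func):
        self.func = func
        self.name = func.__name__

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        value = self.func(obj)
        # From now on attribute lookup finds this entry first and never
        # reaches __get__ again, until the entry is deleted.
        obj.__dict__[self.name] = value
        return value


class foo(object):
    calls = 0

    @propertycache_sketch
    def expensive(self):
        foo.calls += 1
        return [1, 2, 3]


f = foo()
f.expensive              # runs the function
f.expensive              # found in f.__dict__, function not run
delattr(f, 'expensive')  # drops the saved value from f.__dict__
f.expensive              # runs the function again
```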

2.2. filecache

filecache is similar to propertycache, but is meant to be used on properties whose data is based on files.

2.2.1. Motivation

Some properties are more expensive to construct than others because they involve reading and parsing a file. Using propertycache on those means the file is read again after every invalidation, even if it hasn't changed since the last time it was read.

We could instead remember what the file looked like (by saving its stat info) when the property is created. Then, when we want to create it again, we compare the new stat info with the old one and only recreate the property when they differ.

<!> We don't miss out on changes by merely looking at the stat info because Mercurial has two modes for changing internal files: 1) appending to a file, thereby changing its size, and 2) atomically replacing it, causing the inode to change.

{i} filecache was created during the development of the CommandServer.
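The stat comparison described in this section can be tried out directly. The helper `statkey` and the choice of fields (inode, size, mtime) are assumptions made for this sketch; the fields Mercurial actually compares may differ. `os.replace` requires Python 3.3 or later.

```python
import os
import tempfile


def statkey(path):
    """Return the stat fields compared in this sketch, or None if missing."""
    try:
        st = os.stat(path)
    except OSError:
        return None
    return (st.st_ino, st.st_size, st.st_mtime)


workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "data")
with open(path, "w") as fp:
    fp.write("one\n")
before = statkey(path)

# Mode 1: appending changes the size.
with open(path, "a") as fp:
    fp.write("two\n")
after_append = statkey(path)

# Mode 2: write a new file next to the old one and rename it into place.
# The new file has its own inode even though its size is identical.
tmp = path + ".tmp"
with open(tmp, "w") as fp:
    fp.write("ONE\nTWO\n")
os.replace(tmp, path)
after_replace = statkey(path)
```

In the second case the size alone would not reveal the change, which is why the inode is part of the comparison here.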

2.2.2. Implementation

The decorator accepts a single parameter: the path of the underlying file to watch for changes. When the property is first accessed, it records the stat info of the file along with the value of the property in _filecache, a dict on the object that contains the property.

Accessing the property afterwards returns the saved value from __dict__, like propertycache. When invalidated, the saved value is removed from __dict__, and the next access stats the underlying file to check for changes. If nothing changed, we simply return the saved result. Otherwise, we recreate it and update the stat info (called refresh in the code).
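The following sketch follows the description above but is not the real filecache. The names `filecache_sketch`, `repo_sketch`, `statkey`, the `root` attribute used to resolve the path, and the dict layout of each `_filecache` entry are assumptions made for illustration.

```python
import os
import tempfile


def statkey(path):
    try:
        st = os.stat(path)
    except OSError:
        return None
    return (st.st_ino, st.st_size, st.st_mtime)


class filecache_sketch(object):
    def __init__(self, path):
        self.path = path

    def __call__(self, func):
        self.func = func
        self.name = func.__name__
        return self

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        # Reached only when the value is absent from obj.__dict__.
        current = statkey(os.path.join(obj.root, self.path))
        entry = obj._filecache.get(self.name)
        if entry is None or entry["stat"] != current:
            # First access, or the file changed: rebuild and remember stat.
            entry = {"stat": current, "value": self.func(obj)}
            obj._filecache[self.name] = entry
        obj.__dict__[self.name] = entry["value"]
        return entry["value"]


class repo_sketch(object):
    def __init__(self, root):
        self.root = root
        self._filecache = {}
        self.loads = 0

    @filecache_sketch("bookmarks")
    def bookmarks(self):
        self.loads += 1
        with open(os.path.join(self.root, "bookmarks")) as fp:
            return fp.read().split()

    def invalidate(self):
        # Remove saved values from __dict__ but keep the _filecache entries.
        for name in self._filecache:
            self.__dict__.pop(name, None)


root = tempfile.mkdtemp()
with open(os.path.join(root, "bookmarks"), "w") as fp:
    fp.write("a\n")

r = repo_sketch(root)
r.bookmarks      # parses the file
r.invalidate()
r.bookmarks      # file unchanged: one stat call, no parsing
with open(os.path.join(root, "bookmarks"), "a") as fp:
    fp.write("b\n")
r.invalidate()
r.bookmarks      # size changed: parsed again
```

Invalidation only clears __dict__; the entry in `_filecache` survives, which is what allows the unchanged file to be skipped on the next access.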

Changes to a property are made when a lock is held (to ensure a single writer, see LockingDesign). Therefore when the lock is released we know we have the most recent value and we need to refresh its stat info. This is done in the unlock functions of lock and wlock.
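As a rough sketch of refreshing on unlock: the lock below runs a callback when it is released, and the callback records the current stat info for every tracked file. `lock_sketch`, `repo_sketch`, `statkey` and the `_filestat` dict are hypothetical names for this example and are far simpler than Mercurial's lock and wlock.

```python
import os
import tempfile


def statkey(path):
    st = os.stat(path)
    return (st.st_ino, st.st_size, st.st_mtime)


class lock_sketch(object):
    """Hypothetical lock that runs a callback when released."""

    def __init__(self, releasefn):
        self.releasefn = releasefn
        self.held = True

    def release(self):
        if self.held:
            self.held = False
            self.releasefn()


class repo_sketch(object):
    def __init__(self, path):
        self.path = path
        # name -> remembered stat info of the file backing that property
        self._filestat = {"data": statkey(path)}

    def lock(self):
        def unlock():
            # We were the only writer, so what we hold in memory matches
            # the file; record the file's current stat info to say so.
            for name in self._filestat:
                self._filestat[name] = statkey(self.path)
        return lock_sketch(unlock)

    def write(self, text):
        with open(self.path, "a") as fp:
            fp.write(text)


path = os.path.join(tempfile.mkdtemp(), "data")
with open(path, "w") as fp:
    fp.write("one\n")

r = repo_sketch(path)
l = r.lock()
r.write("two\n")
stale = r._filestat["data"] != statkey(path)   # remembered stat is outdated
l.release()
fresh = r._filestat["data"] == statkey(path)   # refreshed on release
```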

Sometimes the contents of a file change without the property that looks at that file being updated (this happens during rollback and strip, when files are truncated or renamed). Since our in-memory view of that file is stale, we need to invalidate the property so that it is reconstructed the next time it is accessed (there used to be a bug that caused invalidated properties to refresh their stat info even if they weren't reconstructed, fixed by XXX).


CategoryDeveloper

Caching (last edited 2012-12-24 00:08:51 by rcl)