Differences between revisions 1 and 4 (spanning 3 versions)
Revision 1 as of 2016-03-19 18:47:22
Size: 6347
Comment:
Revision 4 as of 2016-05-10 23:32:03
Size: 8797
Comment: revise "outline of issue" and "steps to make cache validation exact"
Deletions are marked like this. Additions are marked like this.
Line 35: Line 35:
Base idea of the solution for this issue was suggested by Matt in
[[http://thread.gmane.org/gmane.comp.version-control.mercurial.devel/85394/focus=85437]]

But simple "advance mtime only if size, ctime and mtime are same"
might cause "fluttering for mtime" like below:

  1. create FILE with (SIZE, CTIME, MTIME)
  2. modify FILE with (SIZE, CTIME, MTIME)
  3. change mtime of FILE into MTIME+1, and
  4. stat of FILE is now (SIZE, CTIME, MTIME+1)
  5. modify FILE with (SIZE, CTIME, MTIME) again, but
  6. (SIZE, CTIME, MTIME) stat of FILE is kept, because it is different from stat at (4)
  7. (2) - (6) might be repeated many times while same CTIME

In addition to it, "fluttering for size" like below might occur.

  1. crete FILE with (SIZE1, CTIME, MTIME)
  2. modify FILE with (SIZE2, CTIME, MTIME)
  3. modify FILE again with (SIZE1, CTIME, MTIME)
  4. (2) - (3) might be repeated many times while same CTIME

Root cause of this issue is that timestamp in second doesn't have
enough resolution to detect multiple changes "at same second".

Therefore, we should advance mtime of changed file, if changes occur
"at same second". If timestamps below are same, changes should occur
"at same second".

  * ctime of original file
  * ctime of changed file
  * mtime of changed file
Line 40: Line 72:
  * bookmarks
Line 42: Line 75:
  * obsstore
  * bookmarks
Line 88: Line 119:

Base idea of this solution was suggested by Matt in [[http://thread.gmane.org/gmane.comp.version-control.mercurial.devel/85394/focus=85437]]
Line 110: Line 139:
   (self.stat.st_size == oldstat.stat.st_size and
    
self.stat.st_ctime == oldstat.stat.st_ctime and
    # "self.st_mtime > oldstat.st_mtime" is included to avoid
    # fluttering between N and N + 1 second
   
self.stat.st_mtime <= oldstat.stat.st_mtime)
   (self.stat.st_ctime == oldstat.stat.st_ctime and
    self.stat.st_ctime == self.stat.st_mtime)
Line 117: Line 143:
This is used to examine whether stat of changed file has ambiguity
against one of original file.

"fluttering" above means the case like below:

  1. create FILE with (SIZE, MTIME)
  2. modify FILE with (SIZE, MTIME)
  3. change mtime of FILE into MTIME+1, and
  4. stat of FILE is now (SIZE, MTIME+1)
  5. modify FILE with (SIZE, MTIME) again, but
  6. (SIZE, MTIME) stat of FILE is kept , because it is different from stat at (4)
  7. (2) - (6) might be repeated many times while MTIME

So, {{{if NEW_MTIME < OLD_MTIME}}}, it should be treated as "ambiguous", too.
This examines whether changes occur "at same second". If so, stat of
changed file has "ambiguity" against one of original file, and we
should advance mtime of changed file.
Line 150: Line 165:
=== make classes, of which instance is cached from file, aware ambiguity === === change file generation of transaction for ambiguity ===

Add "checkambig" optional argument to {{{transaction.addfilegenerator()}}},
and it will be used at opening file to write data out.

This is needed, because below files are written out also via file
generation of transaction.

  * .hg/bookmarks
  * .hg/dirstate
  * .hg/phaseroots

Fortunately or unfortunately, {{{transaction.addfilegenerator()}}} is
currently used only by classes related to files above.

For simplicity, should we make transaction generate files with
{{{vfs.open(FILENAME, "w", checkambig=True)}}} always ?

=== make classes, of which instance is cached from file, aware of ambiguity ===
Line 153: Line 186:
be ambiguous at change, use {{{vfs.__call__()}}} with
{{{checkambig=True}}}

  * dirstate.dirstate
  * bookmarks.bmstore
be ambiguous at change, should be aware of ambiguity.

  * bookmarks.bmstore (also for .hg/bookmarks.current)
  * dirstate.dirstate (also for .hg/branch)
Line 159: Line 191:
  * obsolete.obsstore
We should make them use:

  * {{{vfs.__call__()}}} with {{{checkambig=True}}}
  * {{{transaction.addfilegenerator()}}} with {{{checkambig=True}}}

=== make restoring from backup file aware of ambiguity ===

Renaming from backup of cached file overwrites original file in cases
below, and this renaming might cause ambiguity, too.

  * restoring from dirstate backup file at failure in scopes below
    * transaction scope
    * dirstateguard scope
  * restoring from "undo." files at transaction rollback
    * bookmarks
    * dirstate
    * phaseroots

Therefore, we should:

  * add (optional) ambiguity check logic to {{{vfs.rename()}}}
  * make restoring from backup file aware of ambiguity
    * restoring dirstate at failure (transaction or dirstateguard scope)
    * restoring files at {{{localrepository._rollback()}}}

In addition to it, {{{transaction}}} restores contents of files, which
are registered via {{{addfilegenerator()}}}, from backup
({{{journal.backup.*}}} at failure, {{{undo.backup.*}}} at rollback)
by {{{util.copyfile()}}}.

Therefore, we should also:

  * add (optional) ambiguity check logic to {{{util.copyfile()}}}
  * make {{{transaction._playback()}}} aware of ambiguity

Note:

This page is primarily intended for developers of Mercurial.

Exact Cache Validation Plan

Status: need discussion

Main proponents: KatsunoriFujiwara

This page mainly focuses on the way to validate cache exactly even if stat information of file isn't changed as expected at changing file itself.

1. Outline of issue

This comes from issue4368 https://bz.mercurial-scm.org/show_bug.cgi?id=4368#c10 , and was once RFC-ed in http://thread.gmane.org/gmane.comp.version-control.mercurial.devel/85394

On POSIX, comparing i-node number for file cache validation causes unexpected overlooking changes of file, because i-node number is reused rapidly in many cases. For example, it is assumed that steps below occur in same CTIME/MTIME:

  1. create FILE with (SIZE, INO1)
  2. modify FILE with (SIZE, INO2)
  3. modify FILE with (SIZE, reused INO1)

Then, file base property cached with stat at (1) overlooks changes at (3).

There are some files, of which size might be kept even at changing (for example, dirstate, bookmarks and so on). This causes inconsistent result.

Base idea of the solution for this issue was suggested by Matt in http://thread.gmane.org/gmane.comp.version-control.mercurial.devel/85394/focus=85437

But simple "advance mtime only if size, ctime and mtime are same" might cause "fluttering for mtime" like below:

  1. create FILE with (SIZE, CTIME, MTIME)
  2. modify FILE with (SIZE, CTIME, MTIME)
  3. change mtime of FILE into MTIME+1, and
  4. stat of FILE is now (SIZE, CTIME, MTIME+1)
  5. modify FILE with (SIZE, CTIME, MTIME) again, but
  6. (SIZE, CTIME, MTIME) stat of FILE is kept, because it is different from stat at (4)
  7. (2) - (6) might be repeated many times while same CTIME

In addition to it, "fluttering for size" like below might occur.

  1. crete FILE with (SIZE1, CTIME, MTIME)
  2. modify FILE with (SIZE2, CTIME, MTIME)
  3. modify FILE again with (SIZE1, CTIME, MTIME)
  4. (2) - (3) might be repeated many times while same CTIME

Root cause of this issue is that timestamp in second doesn't have enough resolution to detect multiple changes "at same second".

Therefore, we should advance mtime of changed file, if changes occur "at same second". If timestamps below are same, changes should occur "at same second".

  • ctime of original file
  • ctime of changed file
  • mtime of changed file

2. Caching for web UI

hg.cachedlocalrepo uses st_mtime and st_size of stat of files below to validate repo object on ALL platforms.

  • bookmarks
  • changelog
  • phaseroots

There is no trigger to invoke repo.invalidate() explicitly at just referring, for hgweb. Therefore, changes might be shadowed by ambiguity of files above (other than changelog), if repo object is reused.

3. Performance problem

If we simply make changing file "aware of ambiguity" always for atomictemp=True, it might cause performance problem.

For example, revlog code opens file in write mode with atomictemp=True. This means that all files derived from revlog is "aware of ambiguity".

But such files aren't ambiguous, because of "append only" revlog policy. In addition to it, a repository might write many filelog files at once (= require cost to get stat of them), but it doesn't cache them (= check of ambiguity isn't needed).

In conclusion, we should make util.atomictempfile aware of ambiguity, only if such awareness is required.

4. Cachability of files on Windows

On Windows, file isn't cachable since 2.0 (or 2aa3e07b2f07).

But on the other hand, hg.cachedlocalrepo uses st_mtime and st_size of stat to validate repo object on ALL platforms, as described above.

According to 2aa3e07b2f07:

"the path is uncacheable" above seems to mean "there is no reliable file ID on Windows".

Therefore, "using st_mtime, st_size and so on for cache validation" itself seems not problematic also on Windows.

5. Steps to make cache validation exact

5.1. introduce "new cachestat" class

Introduce portable "new cachestat" class.

It mainly provides methods below.

__eq__(self, oldstat) returns:

   (self.stat.st_size == oldstat.stat.st_size and
    self.stat.st_ctime == oldstat.stat.st_ctime and
    self.stat.st_mtime == oldstat.stat.st_mtime)

This is used to examine whether file is changed or not.

On the other hand, isambig(self, oldstat) returns:

   (self.stat.st_ctime == oldstat.stat.st_ctime and
    self.stat.st_ctime == self.stat.st_mtime)

This examines whether changes occur "at same second". If so, stat of changed file has "ambiguity" against one of original file, and we should advance mtime of changed file.

5.2. make util.atomictempfile aware of ambiguity

Make close() of util.atomictempfile examine whether stat of changed file is ambiguous or not in steps below.

  1. invoke original close() of file object for temporary file

  2. get "new cachestat" of original file ("oldstat")
  3. get "new cachestat" of changed file ("newstat")
  4. change mtime of changed file into "mtime of oldstat + 1", if newstat.isambig(oldstat)

For performance reason described above, close() of util.atomictempfile examines ambiguity, only if "checkambig" optional argument is True at contruction time.

5.3. change vfs.__call__() for ambiguity

Add "checkambig" optional argument to vfs.__call__(), and it will be passed to util.atomictempfile.

5.4. change file generation of transaction for ambiguity

Add "checkambig" optional argument to transaction.addfilegenerator(), and it will be used at opening file to write data out.

This is needed, because below files are written out also via file generation of transaction.

  • .hg/bookmarks
  • .hg/dirstate
  • .hg/phaseroots

Fortunately or unfortunately, transaction.addfilegenerator() is currently used only by classes related to files above.

For simplicity, should we make transaction generate files with vfs.open(FILENAME, "w", checkambig=True) always ?

5.5. make classes, of which instance is cached from file, aware of ambiguity

Classes below, which is cached via scmutil.filecache and might be ambiguous at change, should be aware of ambiguity.

  • bookmarks.bmstore (also for .hg/bookmarks.current)
  • dirstate.dirstate (also for .hg/branch)
  • phases.phasecache

We should make them use:

  • vfs.__call__() with checkambig=True

  • transaction.addfilegenerator() with checkambig=True

5.6. make restoring from backup file aware of ambiguity

Renaming from backup of cached file overwrites original file in cases below, and this renaming might cause ambiguity, too.

  • restoring from dirstate backup file at failure in scopes below
    • transaction scope
    • dirstateguard scope
  • restoring from "undo." files at transaction rollback
    • bookmarks
    • dirstate
    • phaseroots

Therefore, we should:

  • add (optional) ambiguity check logic to vfs.rename()

  • make restoring from backup file aware of ambiguity
    • restoring dirstate at failure (transaction or dirstateguard scope)
    • restoring files at localrepository._rollback()

In addition to it, transaction restores contents of files, which are registered via addfilegenerator(), from backup (journal.backup.* at failure, undo.backup.* at rollback) by util.copyfile().

Therefore, we should also:

  • add (optional) ambiguity check logic to util.copyfile()

  • make transaction._playback() aware of ambiguity

6. Optional tasks

6.1. make scmutil.filecache use "new cachestat"

To invalidate changed cache certainly, make scmutil.filecache use "new cachestat".

This also improves performance on Windows, because of reusing cached information :-)

6.2. make hg.cachedlocalrepo use "new cachestat"

BTW, after "make cache validation exact", files are cached on all platforms, and cache validity is examined exactly (I hope so :-)).

Then, can repo.invalidate() replace recreation of out-of-date repo object ?

6.3. make util.atomictempfile ready for context manager

Now, file opened with atomictemp=True can't be used for with Python statement.


CategoryDeveloper CategoryNewFeatures

ExactCacheValidationPlan (last edited 2017-06-09 02:07:09 by KatsunoriFujiwara)