Note:
This page is primarily intended for developers of Mercurial.
Dirstate Transaction Plan
Status: Completed
Main proponents: KatsunoriFujiwara
This page mainly focuses on how dirstate should be treated for consistency, especially around scope boundary of transaction and wlock at failure.
Contents
1. Summary
Tasks that we should do:
fix ambiguous timestamp issue
fix issue in ordinary cases (see #Ambiguous timestamp issue)
fix issue in hg commit -i case (see #Issue at `hg_commit_-i`)
make in-memory dirstate changes visible to external process (see #Conclusion about failure in transaction scope)
make dirstate support '.pending' mechanism
make in-memory changes visible to commit log editor process
make in-memory changes visible to all external hook process
wrap "critical region" with dirstateguard correctly (see #`merge.update()` users)
wrap "critical region" in commands.backout
wrap "critical region" in merge.graft
wrap "critical region" in rebase.rebasenode
detect aborting in critical region (see #Detection of aborting in critical region)
make localrepo.wwrite() find buggy invocation (see #Finding buggy `localrepo.wwrite()` invocation)
2. Failure in transaction scope
from:
http://thread.gmane.org/gmane.comp.version-control.mercurial.devel/80003/focus=80271
http://thread.gmane.org/gmane.comp.version-control.mercurial.devel/80342/focus=81376
2.1. `hg commit` case
Let's see typical case `hg commit` with pseudo-code. Lines ending with "#N" means "dirstate is changed here". Others may cause aborting. Scope of wlock/transaction is described in "with xxxx as yyyy:" style for convenience:: commands.commit(): if opts['addremove']: scmutil.addremove(): dirstate.update('files {added|removed|copied}') #1 # involving dirstate.write() at wlock.release() repo.commit(): with repo.wlock() as wlock: repo.status('to get current status') #2 editor('for commit message') wctx.sub(s).commit('recursively') for subs hook('precommit') with repo.transaction('commit') as tr: node = createcommit() hook('pretxncommit') dirstate.change('to mark as committed') #3 hook('pretxnclose') Changes at #3 above should be discarded at failure after it for consistency of subsequent `hg status` and so on, because: - `dirstate.parents` refers rolled back (= not-existing) revision - files are already `dirstate.normal`-ed But in some cases, changes at #1 and/or #2 should be kept.
2.2. `hg rebase` case
Let's see `hg rebase` case:: rebase(): with repo.wlock() as wlock: merge.update('to rebase destination'): dirstate.update(destination) #1 for each revs: merge.update('to merge from rebased one'): dirstate.change('all new file and content') #2 if conflict: raise InterventionRequired # expecting dirstate.write repo.commit(): (snip) with repo.transaction() as tr: node = createcommit() hook('pretxncommit') dirstate.change('to mark as committed') #3 hook('pretxnclose') To make `--continue` work as expected, dirstate changes should be kept, if transaction is aborted at (or before) pretxncommit hook above.
2.3. Categorize cases
Then, let me categorize cases. "touching file status" and "updating to new parents" below are corresponded to #2 and #3 in rebase case. (strictly speaking, #3 in rebase case implies also "touching file status", but its main purpose should be "updating to new parents") - both of "touching file status" and "updating to new parents" occur inside transaction - amend commit - import - qpush - transplant - unshelve In this case, all changes on dirstate should be discarded at failure of transaction, because discarding newly added revisions makes recent dirstate (especially, parents of it) meaningless. Aborting by InterventionRequired or TransplantError is treated as "intentional failure" (= a kind of success), because transaction is manually `close`-ed at that time. - "updating to new parents" occurs inside transaction, but "touching file status" occurs outside - backout - graft - histedit - rebase In this case, for subsequent `--continue` (or manually committing for backout), "touching file status" must be kept, even at failure of transaction, but "updating to new parents" must not. If writing dirstate changes out at other than `wlock.release` is reasonable enough, this can be simplified as: write dirstate changes out at opening and closing the transaction, but restore it from saved one at unexpected failure of transaction
2.4. Conclusion about failure in transaction scope
1. write in-memory changes at the beginning of the transaction, to restore it at failure of transaction easily for visibility to external hooks, just before "pretxnopen" seems reasonable. 2. use `addfilegenerator()` to make in-memory changes inside transaction visible to external hooks 3. in-memory changes inside transaction should be written out into `dirstate.pending`, even if `dirstate.write()` is explicitly invoked While transaction, `.hg/dirstate` can be kept as one at (1) above by (2) and (3). 4. `dirstate.pending` (= changes inside transaction) should be: - renamed to `dirstate` at success of transaction - removed at failure of transaction (this is equivalent to restoring dirstate to one at (1) above) This policy can also: - avoid that inconsistent dirstate is written out at failure e.g. aborted by pretxnclose hook at #3 in rebase case - make `dirstateguard` for qpush, amend and import useless restoring dirstate at failure should be done as a part of aborting transaction
3. Ambiguous timestamp issue
from:
http://thread.gmane.org/gmane.comp.version-control.mercurial.devel/80342/focus=81475
http://thread.gmane.org/gmane.comp.version-control.mercurial.devel/80342/focus=81549
Tasks other than #Finding buggy `localrepo.wwrite()` invocation were done by the series below:
3.1. Outline of issue
To detect change of a file without redundant comparison of file content, dirstate recognizes a file as certainly clean, if: 1. it is already known as "normal", 2. dirstate entry for it has valid (= not "-1") timestamp, and 3. mode, size and timestamp of it on the filesystem are as same as ones expected in dirstate This works as expected in many cases, but doesn't in the corner case that changing a file keeps mode, size and timestamp of it on the filesystem. The timetable below shows steps in typical issue case:: ---- ----------------------------------- ---------------- timestamp of "f" ---------------- dirstate file time action mem file system ---- ----------------------------------- ---- ----- ----- N -1 *** - change file "f" N - execute `hg foobar` - get wlock - instanciate dirstate -1 -1 - `dirstate.normal("f")` N -1 (e.g. via dirty check) - overwrite "f", but keep size N N+1 - release wlock - `dirstate.write()` N N - `hg status` shows "f" as "clean" N N N ---- ----------------------------------- ---- ----- ----- The most important point is that `dirstate.write()` is executed at N+1 or later. This causes writing dirstate timestamp N of "f" out successfully. If it is executed at N, `parsers.pack_dirstate()` replaces timestamp N with "-1" before actual writing dirstate out. Occasional test failure for unexpected file status is typical example of this corner case. Batch execution with small working directory is finished in no time, and rarely satisfies condition (2) above. If `dirstate.normallookup()` or so is certainly applied on "f" before releasing wlock, this issue doesn't occur. But there are some cases omitting `dirstate.normallookup` for changed files: - `hg revert --rev REV` (intentionally) - `hg import` using `patch.patch()` (intentionally) - `merge.update()` with explicit `partial` (intentionally) - failure of `merge.update()` before `merge.rrecordupdates()` (unintentionally)
3.2. Fixing issue
The root cause of this issue seems that nested wlock scopes prevents `wlock.release()` in `workingctx._checklookup()` from writing dirstate changes out:: # update dirstate for files that are actually clean if fixup: try: # updating the dirstate is optional # so we don't wait on the lock # wlock can invalidate the dirstate, so cache normal _after_ # taking the lock wlock = self._repo.wlock(False) normal = self._repo.dirstate.normal try: for f in fixup: normal(f) finally: wlock.release() except error.LockError: pass `workingctx._checklookup()` is invoked via `repo.status()` or so. Then, putting explicit `dirstate.write()` at the end of wlock scope above can fix this issue:: ---- ----------------------------------- ---------------- timestamp of "f" ---------------- dirstate file time action mem file system ---- ----------------------------------- ---- ----- ----- N -1 *** - change file "f" N - execute `hg foobar` - get wlock - instanciate dirstate -1 -1 - `dirstate.normal("f")` N -1 (e.g. via dirty check) ----------------------------------- ---- ----- ----- - `dirstate.write()` -1 -1 ----------------------------------- ---- ----- ----- - overwrite "f", but keep size N N+1 - release wlock - `dirstate.write()` -1 -1 - `hg status` shows "f" as "clean" -1 -1 N ---- ----------------------------------- ---- ----- ----- When `dirstate.write()` is executed at N+1, overwriting "f" causes timestamp N+1 of "f" on the filesystem, and timestamp N of "f" in dirstate doesn't cause this issue.
3.3. Additional topics
3.3.1. Issue at `hg commit -i`
(newly added at writing this page, as a memo)
`cmdutil.dorecord()` used via `hg commit -i` cuases similar issue, even if `dirstate.write()` is explicitly invoked in `workingctx._checklookup()`, because `cmdutil.dorecord()` does: 0. (it is assumed that timestamp of file on the filesystem is N) 1. backup file, which may be committed partially, with keeping timestamp 2. select diff hunks of current changes 3. revert file to be partially committed by `merge.update` 4. apply selected hunks on reverted file 5. commit changes (= mark committed file as "clean") 6. write dirstate changes into .hg/dirstate 6. restore file from backup at (1) 7. restore timestamp of file from backup at (1) Subsequent `hg status` doesn't recognize partially committed file as modified, if: - mode and size of the file isn't changed at (0) and (4) - (1), (2), (3) and (4) are executed at timestamp N, but - (6) is executed at N+1 or later
3.3.2. Finding buggy `localrepo.wwrite()` invocation
from:
Adding the following two lines to the top of `localrepo.wwrite()` seems helpful for finding bugs:: if self.dirstate._dirty: util.develwarn("writing file to disk while dirstate is dirty")
4. Failure in wlock scope
from:
4.1. `hg commit` case
Let's see typical `hg commit` case with pseudo-code. commands.commit(): if opts['addremove']: scmutil.addremove(): dirstate.update('files {added|removed|copied}') #1 # involving dirstate.write() at wlock.release() repo.commit(): with repo.wlock() as wlock: repo.status('to get current status') #2 editor('for commit message') wctx.sub(s).commit('recursively') for subs hook('precommit') with repo.transaction('commit') as tr: node = createcommit() hook('pretxncommit') dirstate.update(node) #3 hook('pretxnclose') "Changes at #3 above" was already discussed in "Failure in transaction scope". Then, how should we treat #1 and #2 ? 1. restore dirstate at failure of command This policy isn't suitable for `hg commit --addremove` case, because dirstate changes at #1 above should be kept for backward compatibility, even if `hg commit` itself fails. At least, the way to avoid restoring dirstate at failure (a kind of the inverse of `dirstateguard` ?) is required for this policy. For example, commands below expect that dirstate changes are kept even if command itself is terminated with non-0 exit code (detailed flows are shown in subsequent section): - aborting by exception (`hg graft`, `hg rebase` and so on) - terminating normally with non-0 exit code (conflicts at `hg backout` without --merge) 2. restore dirstate at failure in wlock scope Changes at #2 (just `normal`-ing) can be discarded at failure in wlock scope. This policy is suitable at least for `hg commit` case. But some other commands expect that dirstate changes are kept even if command itself is aborted by exception, as described above. So, the way to avoid restoring dirstate at failure is required for this policy, too. In addition to it, code path enclosed by `repo.wlock()` + `wlock.release` may be nested at runtime, and this may increase complexity. 3. keep dirstate changes (and write them out) even at failure This is also suitable for `hg commit` case, because this is as same as current behavior :-) Let's see other cases, too.
4.2. `merge.update()` users
There are some cases using `merge.update()` directly or indirectly (histedit and so on, too):: commands.backout(): with repo.wlock() as wlock: if linearmerging: # critical region >>>> merge.update('for merging linearly') #1 repo.setparents() #2 # critical region <<<< if conflicted: return 1 # expecting dirstate.write elif not commit: return 0 # expecting dirstate.write else: merge.update('to the revision to be backed out') #3 cmduit.revert('to backout') #4 repo.commit('the revision backing out') #5 if merging: merge.update('to the original parent') #6 merge.update('for merging with the revision backing out') #7 commands.graft(): with repo.wlock() as wlock: for rev in revs: merge.graft('to graft rev'): # critical region >>>> merge.update('for pseudo merging with graft source') #1 repo.setparents() #2 dirstate.copy('to duplicate copy information') #3 # critical region <<<< if conflict: throw abort() # expecting dirstate.write repo.commit('the revision grafted') #4 (BTW, `merge.update()` isn't yet correctly included into "critical region" of `hg graft`) rebase.rebase(): with repo.wlock() as wlock: merge.update('to destination') #1 for rev in revs: # critical region >>>> merge.update('for merging the revision to be rebased') #2 dirstate.copy('to duplicate copy information') #3 # critical region <<<< if conflict: raise InterventionRequired # expecting dirstate.write with dirstateguard('rebase'): repo.setparents('to drop 2nd parent of pseudo merge') #4 repo.commit('the revision rebased') #5 (BTW, "critical region" of `hg rebase` isn't yet correctly guarded at all) On the other hand, `merge.update()` itself has many points which may change dirstate and cause aborting at runtime:: merge.update(): with repo.wlock() as wlock: wctx.files() / wctx.dirty() #1 hook('preupdate') calculateupdates() repo.wwrite('all files to be updated') wctx.sub(s).get('update recursively') for subs dirstate.change('all updated files') #2 hook('postupdate') Aborting in "critical region" above causes the issue that inconsistent dirstate is written out. There are some solutions to fix this issue: 1. restore dirstate at failure of command 2. restore dirstate at failure in wlock scope 3. enclose "critical region" by `dirstateguard` to restore dirstate at failure (but keep other dirstate changes even at failure) According to `merge.update()` user cases, we can find something below out: a. `merge.update()` may be invoked multiple times in a wlock scope This also means that "critical region" may be also invoked multiple times in a wlock scope, and increases complexity of "a kind of the inverse of `dirstateguard` or so". b. additional processing may be needed after `merge.update()` to use result of it correctly Result of each commands isn't valid, if a command is aborted in "critical region". For example, lack of `repo.setparents()` at #2 of `hg graft` may cause committing invalid revision at subsequent `hg graft --continue`. So, dirstate changes in "critical region" should be discarded at failure. Then, enclosing "critical region" by `dirstateguard` seems easier than "a kind of the inverse of `dirstateguard` or so" to restore dirstate at failure.
4.3. Conclusion about failure in wlock scope
According to cases above, "restore dirstate at failure of command (or in wlock scope)" policy doesn't seem suitable for current Mercurial implementation. Then, what about the policy below ? 1. keep dirstate changes outside transaction (and write them out) even at failure 2. enclose "critical region" by `dirstateguard` to discard dirstate changes at failure
4.4. Additional topics
4.4.1. Detection of aborting in critical region
(newly added at writing this page, as a memo)
`merge.update()` keeps `.hg/updatestate` while "critical region" below:: merge.update(): with repo.wlock() as wlock: wctx.files() / wctx.dirty() #1 hook('preupdate') calculateupdates() # critical region >>>> repo.wwrite('all files to be updated') wctx.sub(s).get('update recursively') for subs dirstate.change('all updated files') #2 # critical region <<<< hook('postupdate') This prevents `hg graft --continue` and so on from resuming on inconsistent status, when previous `merge.update()` was aborted in critical region of it. But in fact, actual "critical region" of graft is wider than one of `merege.update()`:: commands.graft(): with repo.wlock() as wlock: for rev in revs: merge.graft('to graft rev'): # critical region >>>> merge.update('for pseudo merging with graft source') #1 repo.setparents() #2 dirstate.copy('to duplicate copy information') #3 # critical region <<<< if conflict: throw abort() # expecting dirstate.write repo.commit('the revision grafted') #4 For example, if previous `merge.update()` is aborted at `hook('postupdate')`, `hg graft --continue` can resume grafting, even though #2 and #3 in critical region of `hg graft` aren't executed in this case. How about them to prevent such situation ? - introduce status information to indicate whether critical region is completed successfully or not (`graftmergingstate` or so) - invoke `cmdutil.checkunfinished()` also at `hg graft --continue` - extend `cmdutil.checkunfinished()` to omit own in-progress state for example, `cmdutil.checkunfinished(expected='graftstate')` at `hg graft --continue` is aborted by not `graftstate` but `graftmergingstate` (in addition to it, absence of `graftstate` may have to cause aborting) backout (only `backoutmergingstate` or so), rebase, histedit and tranplant should need these, too.
5. Detail about HG_PENDING mechanism
from:
See also TransactionPlan
This series is poking in the right direction, but it is failing to acknowledge the following list of principle and will not result in a working behavior. 1) '.pending' files are made necessary by the existence of a transaction, 2) HG_PENDING (and .pending files usage) must -only- occured in the context of a transaction. 3) The transaction object is the one handle the '.pending' files and associated non-pending file. This includes: 3.1) The transaction must purge any existing '.pending' when it open/close 3.2) generation of .pending files is done through the transaction 3.3) writing of the final files is also done through the transaction (*) We now have a developer warning feature, we should use it more. Let me details this more.
5.1. Transaction make '.pending' necessary.
The goal of transaction is to prevent any external viewer to see its content before it is committed. So anything happening during the transaction is actually written to disk only at `tr.close()` time. To lets hooks and tools see the content of the transaction, the 'pending' mechanism is used. An environment variable let sub-invocation of hg know that it needs to looks for extra data in different file.
5.2. pending happen during transaction only
In the general case, we do not use the same 'pending' mechanism for dirstate related changes for the following reason: - dirstate is related to the working copy. Most external viewer like clone/pull/push do not care about the working copy so we are not at risk of uncommited data escaping the repository. - dirstate is related to the content of working copy (parents, but also file content). We cannot have an atomic update of all the file in the working copy so we'll have transient invalid state during the update anyway, therefor we are not even trying to ensure we have an atomic transaction for external viewer. However, there is a case where we need dirstate to use a '.pending' mechanism: When the dirstate change makes it refer to content contained in a not-yet committed transaction. - we do not want to make such dirstate visible until the transaction content is visible. - sub-invocation of mercurial in hook/tool need to be able to see this content anyway. To simplify things. I'm assuming:: "dirstate refer to content in the transaction." is the same as:: "dirstate change happen during a transaction." So, if your dirstate change happens inside a transaction, it should be collaborating with it. Your series have to make use of the`repo.currenttransaction()` method at least once somewhere.
5.3. The transaction object is the one handling file
To simplify entry points and ensure consistency, the transaction is handling all files involved in the transaction from end to end. So when dirstate collaborate with the transaction, it has to let the transaction handle the dirstate file related to point 3.2: We are generating `pending` file on demand, because if no subprocess is to be call, we do not need to generated them. Every subcall that can need pending file will use call `tr.pending()` to trigger this generation. This is a single entry point that the code know and use. The function will also handle the case where there is no pending changes and not special logic is needed. To get the transaction able to include your pending file, you have to register a "file generator" with `tr.addfilegenerator` the documentation is fairly extensive so I won't repeat it here. (if document is unclear we have to patch it). I expect a call like:: tr.addfilegenerator('dirstate', ('dirstate', self._write, location='plain') Where self._write is a method accepting an (already open) file object and writing dirstate content in it. addfilegenerator can be called multiple time, each call replacing the other. related to point 3.3: You have to let the transaction handle the final write too. The reason you have dirstate being handled by the transaction logic (if present) is that you never want to write the dirstate to disk (except pending) until the transaction data are visible too. And the dirstate/guard object have no way to know when this will happen. The transaction object is the one who know about that. Lets looks at the following case:: with tr: commit = newcommit() with ds: ds.setparent() ds.close() # confirm the dirstate change othercrap() tr.close() # commit the transaction In this case the "ds.close()" is not the right time to write the file down, it is too early. It has to be written down at 'tr.close()' time. By chance, the transaction will automatically take care of that since you used `addfilegenerator`. summary: your .close() must not touch disk in the in-transaction case. Will handle it. The final writing of the dirstate file will not be a simple move of the '.pending' file. This is necessary because the pending are lazily generated. Lets looks at the following scenario:: with tr: commit = newcommit() with ds: ds.setparent() ds.close() # confirm the move tr.pending() with ds: ds.setparent() ds.close() # confirm the move tr.close() # commit the transaction In this case, the .pending file is outdated compared to the in-memory state. So we need to regenerate the file anyway. summary: transaction won't use simple rename. related to point 3.1: You also have to be very careful of not leaving any '.pending' file around. Lets look at the following scenario:: $ hg commit # write dirstate.pending, leave it behind $ hg up 42 # does not need to write a pending file $ hg pull # transaction call hook on pending change, dirstate is read # from .pending becaues it exists So we have to make sure we do not leave the file behind. By chance, the transaction will take care of that for you. It keep track of all temporary file it generated and make sure they are cleaned up when the transaction is closed or rollbacked. (We should probably aggressively nuke all known '.pending; type file at transaction open time but I'm not sure it is done yet.) other nice thing: the transaction logic will also backup your file (and restore in rollback case) your dirstate file as soon as your register a generator for it.
5.4. Developer Warning:
To work your code (will) make multiple assumption: - wlock is alway taken when we touch the dirstate, - write are always done in a dirstate guard context, - we do not do any direct writing if a transaction exists, It is easy to check this is true, but we crashing in these case is suboptimal since it will affect human user of unsafe, but already existing code. By chance we now have a "developer warning concept" (that you get from `devel.all=true` config) and the default behavior for tests is now to run them. So adding code to detect and issue warning in this case will let the code base be cleaned up over time.