Note:
This page is primarily intended for developers of Mercurial.
Error reporting cleanup
Status: Early draft
Main proponents: RodrigoDamazio
This is a speculative project and does not represent any firm decisions on future behavior.
This proposes cleaning up how Mercurial failures are reported, both to the user and for any extensions that gather/upload failure information.
1. Goal
Currently, Mercurial has a small set of exceptions that are used to report errors. Notably, a large number of failure cases indiscriminately use error.Abort, without regard to the nature of the failure. As long as error messages are descriptive, the end user may not care about the distinction, but deployments that aggregate this data (such as real-time monitoring/alerting) have no reliable way to tell one kind of failure from another.
The goal is to classify errors by category and by likely culprit.
2. Detailed description
2.1. Categorization
I propose these categories of errors:
Category | Description | Example | Return code | HTTP equivalents |
Input | The failure was caused by invalid user input | Tried to check out non-existent revision | 10 | 400, 404 |
State | The failure was caused by the repository or environment being in an invalid state for the requested operation | There are unresolved conflicts | 20 | 402, 409, 412, 418, 422, 423, 428, 451 |
Configuration | The failure was caused by configuration | Failure to parse config file, unmet requires | 30 | N/A (always local) |
Storage | When the failure is caused by interacting with storage | IOError, file corruption | 50 | N/A (always local) |
Remote | When the remote (e.g. server) causes a failure | Disconnected, invalid payload received, required capability missing | 100 | other 5xx, other 4xx |
Security | When the failure is caused by some aspect of security | Bad server credentials, expired local credentials for network filesystem, mismatched GPG signature, DoS protection | 150 | 401, 403, 407, 420, 425, 429, 450, 463, 495-498, 511, 525, 526 |
Intervention required | Not a final failure, but rather indicative that user intervention is required before hg can continue | Merge conflict resolution required | 240 | N/A (always local) |
Cancelled | When the user cancels the operation | KeyboardInterrupt, aborting the editor, etc. | 250 | 499? |
Internal | Unexpected crashes or unexpected internal state | Any unexpected Python exception | 254 | 500, 520 |
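As an illustration only, the category-to-return-code column of the table could be captured in a single enumeration. The name ErrorCategory and its member names are hypothetical and not part of Mercurial; only the numeric values come from the table.

```python
import enum


class ErrorCategory(enum.IntEnum):
    """Hypothetical enumeration of the proposed categories.

    The values are the return codes from the table; the names are
    illustrative and do not exist in Mercurial.
    """

    INPUT = 10
    STATE = 20
    CONFIGURATION = 30
    STORAGE = 50
    REMOTE = 100
    SECURITY = 150
    INTERVENTION_REQUIRED = 240
    CANCELLED = 250
    INTERNAL = 254


def describe(category):
    """Return a short 'NAME (code)' label, e.g. for log aggregation."""
    return "%s (%d)" % (category.name, int(category))
```

An aggregation system could then group failures by category.name instead of parsing error messages.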
2.2. Remote error categorization
Remote errors can be caused by most of the above issues. The Remote category will only be applied when the underlying cause is not known, or the remote is clearly at fault.
For the HTTP transport, HTTP status codes will be mapped to categories per the "HTTP equivalents" column of the table above. For the SSH transport, the mapping is TBD.
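A minimal sketch of the HTTP mapping, assuming the "HTTP equivalents" column is applied literally. The function name categorize_http_status is hypothetical. Returning None for statuses outside 4xx/5xx is an assumption of this sketch, since the table does not cover them, and the 499 entry is as tentative here as it is in the table.

```python
# Status sets copied from the "HTTP equivalents" column of the table.
_INPUT = frozenset({400, 404})
_STATE = frozenset({402, 409, 412, 418, 422, 423, 428, 451})
_SECURITY = frozenset(
    {401, 403, 407, 420, 425, 429, 450, 463, 511, 525, 526}
    | set(range(495, 499))  # 495-498 inclusive
)
_CANCELLED = frozenset({499})  # marked "499?" in the table, so tentative
_INTERNAL = frozenset({500, 520})


def categorize_http_status(status):
    """Map an HTTP status code to a category name (hypothetical helper).

    Returns None for statuses outside 4xx/5xx; that choice is an
    assumption of this sketch, not something the proposal specifies.
    """
    if status in _INPUT:
        return "Input"
    if status in _STATE:
        return "State"
    if status in _SECURITY:
        return "Security"
    if status in _CANCELLED:
        return "Cancelled"
    if status in _INTERNAL:
        return "Internal"
    if 400 <= status <= 599:
        # "other 4xx, other 5xx" fall back to the Remote category.
        return "Remote"
    return None
```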
2.3. Status codes
There will be a static mapping from the basic categories above to return codes, per the "Return code" column of the table.
Specializations of the categories in extensions (usually as exception subclasses), when required, may override the error code. The spacing of the return codes in the table is meant to allow grouping of related codes.
The status codes 1 and 255 are explicitly avoided so that reporting or monitoring systems can distinguish an older version of Mercurial (which will use one of those) from a version that uses these new codes.
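One possible shape for this, sketched with hypothetical class names: each category exception carries its return code as a class attribute, and an extension subclass overrides it with a nearby value. The class LfsServerError and the code 101 are invented for illustration and are not proposed values.

```python
class CategorizedError(Exception):
    """Hypothetical base class; defaults to the Internal return code."""

    exit_code = 254


class InputError(CategorizedError):
    exit_code = 10


class RemoteError(CategorizedError):
    exit_code = 100


class LfsServerError(RemoteError):
    """Hypothetical extension-specific specialization of Remote.

    It overrides the code but stays in the gap after 100, so it still
    groups with the Remote category.
    """

    exit_code = 101


def exit_code_for(exc):
    """Pick the process return code for an exception instance."""
    if isinstance(exc, KeyboardInterrupt):
        return 250  # Cancelled
    # Anything without an exit_code attribute is an unexpected
    # Python exception, i.e. the Internal category.
    return getattr(exc, "exit_code", 254)
```

Because no class in this sketch uses 1 or 255, a monitoring system that sees either value can assume it came from an older Mercurial.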
2.4. High-level code changes
Most high-level code, such as the implementation of commands, will be updated to use the appropriate exception classes according to the categorization above.
2.5. Low-level library changes (e.g. revlog)
In many cases, the code generating the error is not at the high level where the user's intentions are better known, but rather inside low-level libraries.
For instance, one can use scmutil.revsingle() to fetch a revision, and that will raise an exception if the revision is not found, without knowledge of whether that revision was requested by the user or calculated/expected by some internal logic. These instances will be updated to use internal-only exceptions, and call sites will be required to catch them and translate them to the appropriate type, with more context. This also gives the higher-level code the opportunity to implement any additional logic that helps determine the cause of the failure, and hopefully to provide more helpful messages to users.
If any such internal exception ever reaches the user, a develwarn will also be printed, and it will otherwise be treated as the Internal type from the table.
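The translation pattern described above could look roughly like the following. All names are hypothetical: lookup_revision stands in for a low-level helper such as scmutil.revsingle(), and Python's warnings module stands in for develwarn.

```python
import warnings


class InternalLookupError(Exception):
    """Hypothetical internal-only exception raised by low-level code."""


class InputError(Exception):
    """Hypothetical user-facing exception for the Input category."""

    exit_code = 10


INTERNAL_EXIT_CODE = 254


def lookup_revision(known_revs, rev):
    """Low-level stand-in: it cannot know who asked for `rev`."""
    try:
        return known_revs[rev]
    except KeyError:
        raise InternalLookupError(rev) from None


def checkout(known_revs, user_rev):
    """High-level command: `user_rev` came from the user, so blame input."""
    try:
        return lookup_revision(known_revs, user_rev)
    except InternalLookupError as exc:
        raise InputError("unknown revision '%s'" % user_rev) from exc


def dispatch(func, *args):
    """Top-level handler: returns the process return code."""
    try:
        func(*args)
        return 0
    except InputError as exc:
        return exc.exit_code
    except InternalLookupError as exc:
        # An internal-only exception leaked: warn developers and
        # treat the failure as the Internal category.
        warnings.warn("internal exception reached the user: %r" % (exc,))
        return INTERNAL_EXIT_CODE
```

Calling dispatch(checkout, revs, "nope") yields the Input code, while calling the low-level function directly through dispatch triggers the warning and the Internal code.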
This is likely to be a large and repetitive code change, but one that promotes good code health in error handling.
2.6. Cleanup
error.Abort will be mercilessly deleted once all its uses are gone. Its memory will not be honored.
2.7. Future improvements
- Identifying whether certain errors are caused by the deployment (e.g. configuration in /etc) vs by the user's customizations
- Identifying whether certain errors are caused by extensions vs not (e.g. was this inside extension code, other than orig() calls?)
- Extending the wire protocols so that remotes can specify the error type to the client, and the plain "remote error" would only be reported when that's not received