Note:
This page is primarily intended for developers of Mercurial.
cHg Porting Plan
Steps to merge cHg into the Mercurial tree, plus possible future improvements.
Currently the cHg server is in core, while the client lives in contrib/chg.
1. Quick compatibility notes for extension developers
Here are important things developers should care about, without digging into chg internals:
- uisetup and extsetup are executed only once per server process, currently. This means you cannot expect the ui object to have the up-to-date config and should not rely on ui.config in these functions. In the future, chg will run uisetup and extsetup per request.
- sys.argv is not what you want. We will likely end up with something like “ui.args” etc. as an alternative in the future. Meanwhile you need to wrap dispatch.runcommand.
- Be prepared for an unnecessary reposetup call. cHg runs "hg serve". The "serve" command used by chgserver is optionalrepo and will call reposetup unnecessarily.
- Developers of large (multi-file) extensions: chg may not detect changes to your source files. chgserver only checks a single file per extension, so only "__init__.py" is covered for large extensions.
2. Internals
Currently, chg is implemented as an extended "command server" that tries to minimize the behavior difference from the vanilla hg command.
Some of the details below are subject to change. See "Future Improvements" for ideas.
2.1. Terminology
- chg: the client, implemented in C
- chgserver: the server, from ps output it's “hg serve --cmdserver chgunix”
- sensitive environment variables: environment variables whose change requires a server restart to guarantee correctness. Currently: anything starting with HG, LD_, LC_, or PYTHON, plus PATH, TZ, TERM(INFO), LANG(UAGE), and CHGHG
- sensitive config options: config options whose change requires a server restart to guarantee correctness. Currently: [extensions], [alias], [extdiff]
- confighash: the hash of sensitive config options + sensitive environment variables
- mtimehash: the hash of the mtime and file size of extension source files + mercurial/__version__.py + the python binary
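The two hashes defined above can be sketched as follows. This is only an illustration, not the chgserver code: the choice of SHA-1, the 12-digit truncation, the plain-dict config representation, and the function names are all assumptions made for this example.

```python
import hashlib
import os
import sys

# Taken from the terminology list above; the authoritative lists live in chgserver.
SENSITIVE_SECTIONS = ("alias", "extdiff", "extensions")
SENSITIVE_ENV_PREFIXES = ("HG", "LD_", "LC_", "PYTHON")
SENSITIVE_ENV_NAMES = ("PATH", "TZ", "TERM", "TERMINFO", "LANG", "LANGUAGE", "CHGHG")


def _digest(items):
    """Return a short hex digest of repr(items) (hash choice is an assumption)."""
    return hashlib.sha1(repr(items).encode("utf-8")).hexdigest()[:12]


def confighash(config, environ):
    """Hash the sensitive config sections plus the sensitive environment.

    `config` is a plain dict of {section: {name: value}} used as a stand-in
    for a real ui object.
    """
    sections = [
        (s, sorted(config.get(s, {}).items())) for s in SENSITIVE_SECTIONS
    ]
    env = sorted(
        (k, v) for k, v in environ.items()
        if k.startswith(SENSITIVE_ENV_PREFIXES) or k in SENSITIVE_ENV_NAMES
    )
    return _digest((sections, env))


def mtimehash(paths):
    """Hash the (mtime, size) pair of every given file."""
    stats = []
    for p in paths:
        st = os.stat(p)
        stats.append((p, st.st_mtime_ns, st.st_size))
    return _digest(stats)
```

With this sketch, changing [ui] or $HOME leaves the confighash untouched, while changing [extensions] or any HG* variable produces a different one.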
2.2. chgservers' unix sockets
There can be multiple chgserver processes: one per confighash, independent of the repo.
If you don't customize extensions in reporc (.hg/hgrc), one chgserver can serve all repos of one user. If you change sensitive config, for example by running a command with “--config extensions.blackbox=!”, you change the confighash, which may result in a new chgserver.
Each chgserver has one unix socket that can be connected by the client. The sockets can be found in $TMP/chg${UID}/:
    % ls -l /tmp/chg${UID}
    -rw------- lock                          (lock used by chg when spawning a new server)
    lrwxrwxrwx server -> server-c01b98ab25cf (symlink, to the latest server)
    srwxr-xr-x server-3d0bbfab25cf           (3d0bbfab25cf is confighash)
    srwxr-xr-x server-c01b98ab25cf           (c01b98ab25cf is confighash)
If a socket file is removed, the corresponding chgserver will exit.
If a chgserver has been idle for (chgserver.idletimeout) seconds, it will exit and unlink its socket file.
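As a rough sketch of the socket layout and the two exit conditions just described, assuming hypothetical helper names (socket_paths, should_exit) that do not exist in chgserver:

```python
import os
import time


def socket_paths(tmpdir, uid, confighash):
    """Return (symlink path, per-confighash socket path) for one user."""
    sockdir = os.path.join(tmpdir, "chg%d" % uid)
    return (
        os.path.join(sockdir, "server"),
        os.path.join(sockdir, "server-%s" % confighash),
    )


def should_exit(sockpath, last_request, idletimeout, now=None):
    """Return True if the server owning `sockpath` should shut down.

    The server exits when its socket file has been removed, or when it has
    been idle for at least `idletimeout` seconds.
    """
    if now is None:
        now = time.time()
    if not os.path.lexists(sockpath):
        return True
    return now - last_request >= idletimeout
```

A server loop could call should_exit() between accept() timeouts and unlink its own socket file before exiting on the idle path.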
2.3. What happens when running a chg command
- chg connects to "/tmp/chg${UID}/server", spawns a new server on demand
- chgserver forks per request [i]
- chg sends and replaces chgserver's fd 0, 1, 2, umask, cwd, env vars
- chg sends argv, asking the server if it can serve
- chgserver calculates new confighash and mtimehash
- if mtimehash changes, tell the client to unlink the server's socket path and spawn a new server
- if confighash changes, redirect the client to the new confighash path
- if nothing changes, tell the client it's okay to send the actual command
- chg does everything the server tells it to do
- chg sends the actual command (argv) and asks the server for the pager command
- if a pager is needed, chg starts the pager, like what the pager extension does, but implemented in C
- chgserver runs the actual command
- chg waits for the exit code and exits accordingly
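The server-side decision in the handshake above (compare the new confighash and mtimehash with the ones the server started with) can be sketched like this. The instruction names "unlink", "reconnect" and "redirect" and the tuple format are illustrative assumptions, not the wire protocol.

```python
import os


def validate(current, requested, sockdir):
    """Decide what the server tells the client during the handshake.

    `current` and `requested` are (confighash, mtimehash) pairs: the values
    the server started with and the values computed for this request.
    Returns a list of (instruction, argument) tuples; an empty list means
    the client may send the actual command.
    """
    cur_config, cur_mtime = current
    new_config, new_mtime = requested
    if new_mtime != cur_mtime:
        # Source files or the interpreter changed: this server is outdated.
        stale = os.path.join(sockdir, "server-%s" % cur_config)
        return [("unlink", stale), ("reconnect", None)]
    if new_config != cur_config:
        # Another server (possibly not running yet) matches this request.
        target = os.path.join(sockdir, "server-%s" % new_config)
        return [("redirect", target)]
    return []
```

The client then simply executes the returned instructions in order and repeats the handshake until it receives an empty list.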
2.4. What happens when spawning a new server
- chg unlinks "/tmp/chg${UID}/server" and takes a server creation lock [ii]
- chg starts $CHGHG, or $HG, or hg.real with arguments like "serve --cmdserver chgunix ..."
- chgserver loads all extensions, configs and calculates confighash
- chgserver creates "/tmp/chg$UID/server-${confighash}"
- chgserver symlinks "/tmp/chg$UID/server" to "server-${confighash}"
- chg connects to "/tmp/chg$UID/server" and releases the lock
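The symlink step can be sketched as below. Creating the link under a temporary name and renaming it into place is an assumption of this example (it keeps "server" pointing at some server at all times); the function name publish_server is hypothetical.

```python
import os


def publish_server(sockdir, confighash):
    """Point <sockdir>/server at server-<confighash>, replacing any old link."""
    target = "server-%s" % confighash  # relative, as in the listing of the directory
    link = os.path.join(sockdir, "server")
    tmplink = "%s.%d.tmp" % (link, os.getpid())
    try:
        os.unlink(tmplink)  # clean up a leftover from a crashed run
    except FileNotFoundError:
        pass
    os.symlink(target, tmplink)
    # rename(2) replaces the old symlink itself, not the socket it points to.
    os.replace(tmplink, link)
    return link
```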
2.5. Some smaller details
For interactive ui.system,
- chgserver sends the command to chg
- chg executes the command
- chg reports the exit code to chgserver
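The three ui.system steps above amount to a small request/response exchange. The sketch below uses a made-up length-prefixed framing over a socket pair; the real command server uses its own channel protocol, and all function names here are hypothetical.

```python
import socket
import struct
import subprocess


def _recvall(sock, n):
    """Read exactly n bytes from sock."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise EOFError("peer closed the connection")
        buf += chunk
    return buf


def request_system(sock, cmd):
    """Server side: send a shell command to the client."""
    data = cmd.encode("utf-8")
    sock.sendall(struct.pack(">I", len(data)) + data)


def handle_system(sock):
    """Client side: run one requested command and report its exit code."""
    (length,) = struct.unpack(">I", _recvall(sock, 4))
    cmd = _recvall(sock, length).decode("utf-8")
    rc = subprocess.call(cmd, shell=True)
    sock.sendall(struct.pack(">i", rc))


def read_exit_code(sock):
    """Server side: read the exit code reported by the client."""
    (rc,) = struct.unpack(">i", _recvall(sock, 4))
    return rc
```

Running the command in the client rather than the server means it inherits the client's terminal and process group, which is what an interactive editor needs.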
For Ctrl+C, Ctrl+Z and signals in general,
- If the signal is sent to the interactive command started by ui.system (for example, the commit message editor), it just works.
- If the signal is sent to chg (for example, the ncurses interface of “chg commit -i”):
  - (the forked) chgserver sends its pid to chg when connected
  - chg forwards the signal to that pid (there is some extra handling for SIGTSTP to make sure ncurses works)
3. Known Issues
- Interactive ssh: see https://bz.mercurial-scm.org/5272
- serve --daemon does not work
- --time is a no-op
- Infinite loop with some bundle or shared repo setup
see also CommandServer
4. Future Improvements
- Run uisetup per request. See www.mercurial-scm.org/pipermail/mercurial-devel/2016-July/085965.html for details.
- Better, less hacky pager integration.
- Lock-less server creation.
- Support long unix socket path. (To deal with very long TMPDIR)
- Use XDG_RUNTIME_DIR
5. The old discussion
The following content is kept for archaeological purposes. It may not reflect the current design. Some issues mentioned below are already solved.
5.1. Handling config change
To behave exactly like the original hg, the server needs to do something when the "config" (config files, environment variables etc.) changes.
This can be tricky with config items like extensions.*, since it's impossible to undo the side effects caused by pre-loading an extension.
5.1.1. Places affecting config
- Config files (Note: $HGRCPATH can override where we look up configs)
- Command line arguments like --config, --repo, --cwd (--repo and --cwd can affect where we read .hg/hgrc)
- Current directory (like --cwd)
- Environment variables: LD*, TZ, HG*, LANG, LC_*
- Python source files, like __version__, or even extension files (used by developers)
Be careful with config files: if they are read in different processes or at different times, there can be race conditions.
5.1.2. What to do on config change
Restart? Reload? Multiple processes?
On config change, it isn't always necessary to restart the server. For example, ui.style can be reloaded by recreating ui object. On the other hand, extensions can't be unloaded. Anyway, it won't be a problem to restart the server aggressively assuming that the config files won't change too often.
Environment variables are also a problem. The two things above change on a "global event" basis: at some point in time, the config or version changes and all running servers need to be restarted. However, environment variables can change more often and can differ between two shells using Mercurial at the same time. We probably need multiple servers based on the environment here.
Considering that a user can disable or change extensions in $REPO/.hg/hgrc, multiple processes are the way to go. Otherwise, if the user is working on multiple repos with different configs at the same time, the server can restart frequently.
The basic idea is to have different processes listening on different socket files. However, with filesystem race conditions considered, reload or restart still makes sense. Otherwise we need to verify that two processes see the same files.
Putting it all together, multiple processes + restart / reload are probably all necessary.
- Multiple processes: To handle the situation where a user is working on multiple repos with different [extensions] in $REPO/.hg/hgrc at the same time, or running hg commands with different $HGRCPATH values at the same time.
- Restart: To handle file changes (especially *.py), or anything that cannot be done with reload, such as [extensions] changes. Also helps to reduce the total number of server processes.
- Reload: (Optionally) Faster restart. Performance win for easy cases.
In order to avoid spawning too many servers (e.g. server per $REPO/.hg/hgrc would be heavy), we could narrow config sections that are included in the hash. For example, select the socket path by hash(.. + config.extensions) and reload the other sections by server.
See "Server model", "How to detect config change?", "How to handle Environment Variables difference", "Who restarts the server?" below for possible implementation details.
5.1.3. Where to detect the change
Client or Server? Whichever we choose, we may want to do it completely on one side, so the logic is easier to follow and more maintainable.
The client has difficulty detecting the hg installation location. This makes __version__ and extension source code change detection extremely hard.
Do it on server after receiving all the information (environ, command line) just before executing the actual command.
5.1.4. Server model
The current plan (subject to change, currently being worked on by Jun), which tries to cover all edge cases, is like this:
- One master process per user
- Master process starts workers on demand
- Workers listen on different socket files
- The socket paths used by worker process are decided by everything in "Places affecting config" except for files to avoid dealing with filesystem race (master does not read repo config)
- Worker processes are responsible for restarting themselves (or reloading if full restart is not necessary) when config or source files change. See "Transparent restart" below.
- Master process also needs to restart itself when source files change
    client      master      workers
    :           |listen()   :           # master socket (/tmp/chg${UID}/master)
    |-connect()>|accept()   :
    |-send()--->|calc_hash():           # send env, argv and calculate non-files hash
    :           |-fork()--->|exec()     # spawn if no worker for that non-files hash (by checking filesystem and pid)
    :           :           |listen()   # worker socket (/tmp/chg${UID}/worker-${HASH})
    |<---send()-|           :           # tell worker socket name
    |-close()-->|           :           # client no longer needs master
    |-connect()------------>|accept()   # (*)
    |           :           |fork()     # fork per connection
    |-send()--------------->|...        # send env, argv and run the actual command
    ...
(*) We may also want to do connect() and some fd magic at this point from master, this will make the complex scene transparent to the client. The client only needs to connect once.
For the first step, workers are forked per connection. See "Forking model" below for possible future improvements.
5.1.5. Server model with no master/worker, no restart
Proposed by Yuya based on Jun's idea.
- No master process
- Client starts servers
- All servers listen at /tmp/chg${UID}/server-${HASH}, and the first server creates a symlink /tmp/chg${UID}/server -> server-${HASH}
- Client first connects to /tmp/chg${UID}/server
- Client asks the server if it can handle the request; if it can't, the server calculates another socket path, say /tmp/chg${UID}/server-${HASH}
- Client reconnects to /tmp/chg${UID}/server-${HASH} and repeats the handshake steps
- Servers never restart, they just die if inactive for long
- Socket path includes hash(__version__ + pathto(__version__.py)), hash(user_config.unloadable_sections), hash(unloadable_env), and hash(repo_config.unloadable_sections) if any
    client      server#0    server#1
    |-connect()>x           :             # no server available
    |-spawn()-->|           :             # start server
    :           |listen()   :             # at server-${HASH#0}
    :           |symlink()  :             # server -> server-${HASH#}
    |-connect()>|accept()   :
    |           |fork()     :
    |-send()--->|calc_hash():             # send env+argv, and calculate hash
    |<---send()-|           :             # if hash differs, suggest socket path
    :           :           :
    |-connect()-|---------->x
    |-spawn()---:---------->|
    :           :           |listen()     # at server-${HASH#1}
    |-connect()-:---------->|accept()
    |           :           |fork()
    |-send()----:---------->|calc_hash()  # send env+argv, and calculate hash
    |<----------:----send()-|             # if hash matches, tell ok
    |-send()----:---------->|runcommand() # request run command
    ...
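The client side of this handshake can be modelled without real sockets. In the sketch below, `listening` is a plain dict standing in for the socket directory (path -> hash served there), and spawning a server is reduced to adding entries to it; the function name and the hop limit are assumptions of the example.

```python
def locate_server(request_hash, listening, sockdir="/tmp/chg1000", max_hops=4):
    """Return the socket path that can serve a request with `request_hash`.

    `listening` maps socket paths to the hash of the server bound there and
    is updated in place when a (simulated) server is spawned.
    """
    default = "%s/server" % sockdir
    wanted = "%s/server-%s" % (sockdir, request_hash)
    path = default
    for _ in range(max_hops):
        if path not in listening:
            # connect() failed: spawn a server configured like this request.
            listening[wanted] = request_hash
            # Only the first server creates the "server" symlink.
            listening.setdefault(default, request_hash)
        if listening[path] == request_hash:
            return path  # hash matches: the server tells the client "ok"
        path = wanted  # hash differs: the server suggests another path
    raise RuntimeError("too many redirects")
```

For example, a first request with hash "aaa" ends up on the default path, while a later request with hash "bbb" is redirected to server-bbb, which is spawned on demand.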
5.1.6. Transparent restart
A restart that is visible to the outside is error prone: consider multiple restarts happening in parallel, the downtime, client-side wait and retry, locks, etc.
We want to continue serving chg clients during a restart. Ideally the restart is transparent to the client. This seems possible if we transfer file descriptors from the old process to the new one.
The restart process looks like:
(1) Server process A is listening. Forked process B, handling the request, detects file changes.
(2) B forks and execs a new server with the correct environ and command line arguments, then passes two fds: the listening socket and the one returned by accept.
(3) The new server initializes itself (preloading extensions) and forks immediately to handle the request.
(4) The new server takes over the unix socket.
(5) The old server ends itself (see below).
To make the unix socket file always available to clients, and work correctly even when multiple restarts happen at the same time, do something like:
(1) Instead of bind(server_address), bind to a temporary, unique address such as server_address + pid and then do a rename. Therefore there is no downtime. This also allows us to do (2).
(2) Periodically check if the socket file is owned by the current process (by checking the inode). If not, exit.
This handles parallel restarts cleanly and without locks, at a small cost in CPU and filesystem access.
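The bind-then-rename step and the periodic inode check could look roughly like this. It is a minimal sketch under the assumptions stated in the comments; the function names are hypothetical and error handling is omitted.

```python
import os
import socket


def bind_and_take_over(address):
    """Bind to a unique temporary path, then rename it over `address`.

    Returns the listening socket and the inode of its socket file.
    """
    tmpaddress = "%s.%d" % (address, os.getpid())  # server_address + pid
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.bind(tmpaddress)
    sock.listen(1)
    inode = os.stat(tmpaddress).st_ino
    # rename(2) is atomic, so clients always find some socket at `address`.
    os.rename(tmpaddress, address)
    return sock, inode


def owns_socket(address, inode):
    """Return True if `address` is still the socket file we created."""
    try:
        return os.stat(address).st_ino == inode
    except FileNotFoundError:
        return False
```

A server that finds owns_socket() returning False knows a newer server has taken over the address and can exit once its in-flight requests finish.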
Parent process of new worker:
Once the old server ends at (5), the parent of the new server will be init (pid=1). This means "transparent restart" only works for a daemon.
5.1.7. How to detect config change?
A. keep hashes of all config files and compare them:
(1) hook ui.readconfig -> config.parse to know all involved files
(2) keep full text or hash of these files
(3) read all config files and compare them with (2) per connection
https://bitbucket.org/yuja/chg/pull-requests/3/483b35203d92/diff#comment-8188548
We can't know if chg server is about to start when config files are loaded, so it would have to be always enabled.
B. (another idea)
- always recreate ui
- request to restart the server if extensions are changed
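Option A (keep hashes of all config files and compare them per connection) can be sketched as follows. How the list of involved files is collected (the ui.readconfig hook) is out of scope here; the function names and the use of SHA-1 are assumptions of this example.

```python
import hashlib


def snapshot(paths):
    """Map each config file path to a hash of its content (None if missing)."""
    result = {}
    for p in paths:
        try:
            with open(p, "rb") as f:
                result[p] = hashlib.sha1(f.read()).hexdigest()
        except FileNotFoundError:
            result[p] = None
    return result


def config_changed(old):
    """Re-read every file recorded in `old` and report whether any differs."""
    return snapshot(list(old)) != old
```

Recording missing files as None means that a config file appearing later (for example a newly created .hg/hgrc) is also detected as a change.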
5.1.8. How to handle Environment Variables difference
Some of them are sent by the client and updated for each worker (e.g. HGEDITOR).
Some others have a global effect and require dispatching to a different server (e.g. HGPLAIN*, HGENCODING*, HGRCPATH, LANG, LANGUAGE, LC_*).
5.1.9. Who restarts the server? (previous discussion)
- this would be deeply linked to the server model:
fork per connection, or pre-forked worker pool (for PyPy JIT)
round-robin on accept(), or more intelligent dispatcher (e.g. dispatch per repo.root for better caching of repo instance)
- but we'll need a simple implementation first. otherwise we can't run tests!
(from previous discussion)
- marmoute, lcharignon: server tells dirty and dies, client starts the server ?
- yuya: server tells dirty, client kills and restarts the server
(from Dec. 18, 2015 with marmoute, lcharignon, junw)
- start multiple servers per hash (see below)
- the hash is calculated at the client
- server listens to a unix socket, whose path includes the hash
- server does gc: kills itself after being idle for long
- therefore no restart needed
things like filesystem races and __version__.py make it actually not trivial
5.1.10. See also
https://bitbucket.org/yuja/chg/pull-requests/3/483b35203d92/diff#comment-8219394
http://thread.gmane.org/gmane.comp.version-control.mercurial.devel/81763/focus=82134
5.2. Forking Model
Need long-lived workers for better JIT optimization and caching of repo objects.
cHg was originally a pre-forking server, which worked as follows:
- master bind() and listen() on shared socket
- master fork() pre-configured number of workers
- each worker accept() connection (round-robin)
https://bitbucket.org/yuja/chg/src/prefork/hgext/chgsupport.py
There were two major issues:
(a) client stuck if no idle worker available
(b) global space could be tainted by running a command, for example:
  - hg unknown-command loads all extension modules and breaks things (iirc, "factotum" or "inotify" changed the global socket timeout value and made chg fail)
  - commands or extensions could be loaded from .hg/hgrc
(a) can be solved by master-worker channel:
worker tells master it is busy on accept()
master fork() one more worker if no idle worker available
(b) was solved by terminating worker if unwanted changes were detected.
5.3. Random Thoughts
- eliminate copy-pasted code
- pager
_requesthandler
util.system
testing: ./run-tests.py --with-hg=chg ?
- environment variables
- shell alias
- debian package