Differences between revisions 1 and 14 (spanning 13 versions)
Revision 1 as of 2015-10-31 08:57:05
Size: 1940
Comment:
Revision 14 as of 2016-01-16 10:52:42
Size: 7550
Comment: add known issues
Deletions are marked like this. Additions are marked like this.
Line 6: Line 6:
Steps to merge [[https://bitbucket.org/yuja/chg/|cHg]] into the Mercurial tree. Steps to merge [[https://bitbucket.org/yuja/chg/|cHg]] into the Mercurial tree,
plus possible future improvements
.
Line 10: Line 11:
== How to Start ==

Perhaps (2) is the best.

 1. single big commit
== How to Merge ==

Perhaps (B).

 A. single big commit
Line 16: Line 17:
 2. reorganize as 10+ patches (base implementation, pager, setenv, sendfds, ...)   * {o} useless history for digging
 A
. reorganize as a few patches (base implementation, pager, setenv, sendfds, ...)
Line 18: Line 20:
 3. pull, rename and merge
  * {o} messy
 4. convert with filemap and rebase
  * {o} messy
  * {o} less useful history for digging
 A. pull, rename and merge
  * {o} having more than one roots
 A. convert with filemap and rebase
  * {*} full history is useful when digging into bugs
  * {o} 300+ uninteresting revisions
Line 25: Line 29:
Perhaps (A) is better because we don't have to worry about the extension path. Perhaps (A) or (B) because we don't have to worry about the extension path.
Line 39: Line 43:
A. Merge extension part: A. Merge extension part into hgext:
Line 51: Line 55:
B. Put everything under contrib/chg: B. Merge extension part into core:

{{{
contrib/chg/Makefile
            README
            chg.c
            hgclient.[ch]
            util.[ch]
mercurial/chgserver.py
mercurial/osutil.c <- chgutil.c
}}}

C. Put everything under contrib/chg:
Line 73: Line 89:
 * move variable declarations to top (really?)

== Random Thoughts ==
 * --(move variable declarations to top (really?))-- <<BR>>
 this appears non-trivial, I won't do for contrib/chg sources

== Future Improvements ==

=== Server Life-cycle ===

Want to restart the server (or reload the config cleanly) if `config`
or `__version__` changed.

''On config change'', it isn't always necessary to restart the server. For
example, `ui.style` can be reloaded by recreating `ui` object. On the other
hand, extensions can't be unloaded. Anyway, it won't be a problem to restart
the server aggressively assuming that the config files won't change too often.

''Environment variables'' are also a problem. The two things above change in a "global event" basis. At some point of time, the config/version changes and all running servers need to be restarted. However, Environement variable can change more often and can be different in two different shell, using Mercurial at the same time. We probably need multiple server based on environment here.

==== How to detect config change? ====

A. keep hashes of all config files and compare them:

 1. hook `ui.readconfig` -> `config.parse` to know all involved files
 2. keep full text or hash of these files
 3. read all config files and compare them with (2) per connection

https://bitbucket.org/yuja/chg/pull-requests/3/483b35203d92/diff#comment-8188548

/!\ We can't know if chg server is about to start when config files are loaded,
so it would have to be always enabled.

B. (another idea)

 1. always recreate `ui`
 2. request to restart the server if `extensions` are changed

C. select socket per `hash(configs + environ)`

{{{
<junw> so can we do a server per config?
<marmoute> If we use a hash per "env"
<junw> the config here means the hash of config files, environment variables
<marmoute> (env contains config)
<junw> yes. i suggest one hash for all
<marmoute> When "env" changes
<marmoute> client will look for a different socker, found none, spin a new server
<marmoute> The old server is still running
<marmoute> We want it to run
<junw> the server can gc itself.
<junw> say if it is idle too long, it kills itself
<marmoute> because two clients with different "env"
<marmoute> We want GC it eventually.
}}}

==== How to handle Environment Variables difference ====

Some of them are sent by the client and updated for each worker.
(eg: `HGEDITOR`)

Some other have global effect and requires dispatch to different server.
(eg: `HGPLAIN*, HGENCODING*, HGRCPATH, LANG, LANGUAGE, LC_*`)

==== Who restarts the server? ====

 * this would be deeply linked to the server model:
  * fork per connection, or pre-forked worker pool (for PyPy JIT)
  * round-robin on `accept()`, or more intelligent dispatcher (e.g. dispatch
  per `repo.root` for better caching of `repo` instance)
 * but we'll need a simple implementation first. otherwise we can't run tests!

(from previous discussion)

 * marmoute, lcharignon: server tells dirty and dies, client starts the server ?
 * yuya: server tells dirty, client kills and restarts the server

(from Dec. 18 with marmoute, lchrignon, junw)

 * start multiple servers per hash (see below)
 * the hash is calculated at the client
 * server listens to a unix socket, whose path includes the hash
 * server does gc: kills itself after being idle for long
 * therefore no restart needed :)

===== The hash =====

 * hash(config files + environment variables)
   * the user can use different configs mixed with different environment variables in parallel
   * the server does not to read config files per request
   * need to implement lighweight config parsing in C
   * filesystem race can occur between client and server starts
     * fix: the server need to re-verify the hash. probably just re-use the C code first, config files are read twice but only on start-up.

==== See also ====

 * https://bitbucket.org/yuja/chg/pull-requests/3/483b35203d92/diff#comment-8219394
 * http://thread.gmane.org/gmane.comp.version-control.mercurial.devel/81763/focus=82134

=== Forking Model ===

Need long-lived workers for better JIT optimization and caching of `repo`
objects.

cHg was originally a pre-forking server, which worked as follows:

 1. master `bind()` and `listen()` on shared socket
 2. master `fork()` pre-configured number of workers
 3. each worker `accept()` connection (round-robin)

https://bitbucket.org/yuja/chg/src/prefork/hgext/chgsupport.py

There were two major issues:

 a. client stuck if no idle worker available
 a. global space could be tainted by running command, for example:
  * `hg unknown-command` loads all extension modules and break things (iirc,
  "factotum" or "inotify" changed the global socket timeout value and made chg
  failing)
  * commands or extensions could be loaded from `.hg/hgrc`

(a) can be solved by master-worker channel:

 1. worker tells master it is busy on `accept()`
 2. master `fork()` one more worker if no idle worker available

(b) was solved by terminating worker if unwanted changes were detected.

=== Known Issues ===

 * `serve --daemon` not work
 * `--time` is noop
 * extensions loaded by `{repo}/.hg/hgrc` can't be found by extensions loaded
   globally
 * `^Z` does not stop the server (needs handling of `SIGTSTP` and `SIGCONT` ?)
 * see also [[CommandServer]]

=== Random Thoughts ===
Line 83: Line 231:
  * `HGPLAIN`, `HGENCODING`, `LC_*`, etc.
  * `ui.environ` ?
  * [[https://selenic.com/pipermail/mercurial-devel/2009-November/016754.html|ui.environ]] ?
Line 87: Line 234:
 * better forking model for
  * JIT optimization provided by pypy
  * cached `repo` object (avoid parsing obsstore, etc.)
 * restart server process if `config` or `__version__` changed

Note:

This page is primarily intended for developers of Mercurial.

cHg Porting Plan

Steps to merge cHg into the Mercurial tree, plus possible future improvements.

1. How to Merge

Perhaps (B).

  1. single big commit
    • {o} hard to review

    • {o} useless history for digging

  2. reorganize as a few patches (base implementation, pager, setenv, sendfds, ...)
    • {*} can improve the code incrementally

    • {o} less useful history for digging

  3. pull, rename and merge
    • {o} having more than one roots

  4. convert with filemap and rebase
    • {*} full history is useful when digging into bugs

    • {o} 300+ uninteresting revisions

2. Source Layout

Perhaps (A) or (B) because we don't have to worry about the extension path.

Original:

README
hgext/chgserver.py
      chgutil.c
src/Makefile
    chg.c
    hgclient.[ch]
    util.[ch]

A. Merge extension part into hgext:

contrib/chg/Makefile
            README
            chg.c
            hgclient.[ch]
            util.[ch]
hgext/chgserver.py
mercurial/osutil.c <- chgutil.c

B. Merge extension part into core:

contrib/chg/Makefile
            README
            chg.c
            hgclient.[ch]
            util.[ch]
mercurial/chgserver.py
mercurial/osutil.c <- chgutil.c

C. Put everything under contrib/chg:

contrib/chg/hgext/...
            src/...

3. Coding Style

Current state:

  • .py mostly follows the Mercurial style

  • .c is similar to the Mercurial style, but

    • uses 4 spaces instead of tab
    • uses C99 comment
    • uses C99 variable declaration

What to do:

  • update import lines (easy)

  • replace 4 spaces by tab (easy)
  • replace C99 comments (easy)
  • move variable declarations to top (really?)
    this appears non-trivial, I won't do for contrib/chg sources

4. Future Improvements

4.1. Server Life-cycle

Want to restart the server (or reload the config cleanly) if config or __version__ changed.

On config change, it isn't always necessary to restart the server. For example, ui.style can be reloaded by recreating ui object. On the other hand, extensions can't be unloaded. Anyway, it won't be a problem to restart the server aggressively assuming that the config files won't change too often.

Environment variables are also a problem. The two things above change in a "global event" basis. At some point of time, the config/version changes and all running servers need to be restarted. However, Environement variable can change more often and can be different in two different shell, using Mercurial at the same time. We probably need multiple server based on environment here.

4.1.1. How to detect config change?

A. keep hashes of all config files and compare them:

  1. hook ui.readconfig -> config.parse to know all involved files

  2. keep full text or hash of these files
  3. read all config files and compare them with (2) per connection

https://bitbucket.org/yuja/chg/pull-requests/3/483b35203d92/diff#comment-8188548

/!\ We can't know if chg server is about to start when config files are loaded, so it would have to be always enabled.

B. (another idea)

  1. always recreate ui

  2. request to restart the server if extensions are changed

C. select socket per hash(configs + environ)

<junw> so can we do a server per config?
<marmoute> If we use a hash per "env"
<junw> the config here means the hash of config files, environment variables
<marmoute> (env contains config)
<junw> yes. i suggest one hash for all
<marmoute> When "env" changes
<marmoute> client will look for a different socker, found none, spin a new server
<marmoute> The old server is still running
<marmoute> We want it to run
<junw> the server can gc itself. 
<junw> say if it is idle too long, it kills itself
<marmoute> because two clients with different "env"
<marmoute> We want GC it eventually.

4.1.2. How to handle Environment Variables difference

Some of them are sent by the client and updated for each worker. (eg: HGEDITOR)

Some other have global effect and requires dispatch to different server. (eg: HGPLAIN*, HGENCODING*, HGRCPATH, LANG, LANGUAGE, LC_*)

4.1.3. Who restarts the server?

  • this would be deeply linked to the server model:
    • fork per connection, or pre-forked worker pool (for PyPy JIT)

    • round-robin on accept(), or more intelligent dispatcher (e.g. dispatch per repo.root for better caching of repo instance)

  • but we'll need a simple implementation first. otherwise we can't run tests!

(from previous discussion)

  • marmoute, lcharignon: server tells dirty and dies, client starts the server ?
  • yuya: server tells dirty, client kills and restarts the server

(from Dec. 18 with marmoute, lchrignon, junw)

  • start multiple servers per hash (see below)
  • the hash is calculated at the client
  • server listens to a unix socket, whose path includes the hash
  • server does gc: kills itself after being idle for long
  • therefore no restart needed :)

4.1.3.1. The hash
  • hash(config files + environment variables)
    • the user can use different configs mixed with different environment variables in parallel
    • the server does not to read config files per request
    • need to implement lighweight config parsing in C
    • filesystem race can occur between client and server starts
      • fix: the server need to re-verify the hash. probably just re-use the C code first, config files are read twice but only on start-up.

4.1.4. See also

4.2. Forking Model

Need long-lived workers for better JIT optimization and caching of repo objects.

cHg was originally a pre-forking server, which worked as follows:

  1. master bind() and listen() on shared socket

  2. master fork() pre-configured number of workers

  3. each worker accept() connection (round-robin)

https://bitbucket.org/yuja/chg/src/prefork/hgext/chgsupport.py

There were two major issues:

  1. client stuck if no idle worker available
  2. global space could be tainted by running command, for example:
    • hg unknown-command loads all extension modules and break things (iirc, "factotum" or "inotify" changed the global socket timeout value and made chg failing)

    • commands or extensions could be loaded from .hg/hgrc

(a) can be solved by master-worker channel:

  1. worker tells master it is busy on accept()

  2. master fork() one more worker if no idle worker available

(b) was solved by terminating worker if unwanted changes were detected.

4.3. Known Issues

  • serve --daemon not work

  • --time is noop

  • extensions loaded by {repo}/.hg/hgrc can't be found by extensions loaded

    • globally
  • ^Z does not stop the server (needs handling of SIGTSTP and SIGCONT ?)

  • see also CommandServer

4.4. Random Thoughts

  • eliminate copy-paste codes
    • pager
    • _requesthandler

    • util.system

  • testing: ./run-tests.py --with-hg=chg ?

  • environment variables
  • debian package


CategoryDeveloper CategoryNewFeatures

ChgPortingPlan (last edited 2016-12-07 23:26:33 by JunWu)