Note:
This page is primarily intended for developers of Mercurial.
Performance tracking infrastructure
Status: Project
Main proponents: Pierre-YvesDavid PhilippePepiot
This is a speculative project and does not represent any firm decisions on future behavior.
Provide a continuous integration infrastructure to measure and prevent performance regressions in Mercurial.
1. Goal
Mercurial code changes fast, and we must detect and prevent performance regressions as soon as possible. To that end, we want to:
- Automatic execution of performance tests on a given Mercurial revision
- Store the performance results in a database (a possible record layout is sketched after this list)
- Expose the performance results in a web application (with graphs, reports, dashboards etc.)
- Provide some regression detection alarms with email notifications
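As a rough illustration of what one stored result could look like, here is a minimal sketch; the field names are purely hypothetical and only meant to fix ideas, not a decided schema:

```python
# Hypothetical sketch of one stored benchmark result; the fields are
# illustrative only, not a decided schema.
from dataclasses import dataclass


@dataclass
class BenchmarkResult:
    node: str         # changeset hash of the benchmarked Mercurial revision
    branch: str       # "default" or "stable"
    benchmark: str    # e.g. "perftags" or a revset benchmark identifier
    value: float      # measured value
    unit: str         # unit of the value (seconds, bytes, ...)
    machine: str      # machine that ran the benchmark
    timestamp: float  # when the benchmark was run (Unix time)
```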
2. Metrics
We already have code that produces performance metrics:
- Commands from the perf extension in contrib/perf.py
- Revset performance tests in contrib/revsetbenchmarks.py
- Unit test execution time
Another idea is to produce metrics from the execution time of annotated portions of the unit tests.
These metrics will be reused (after some refactoring of the tools that produce them), but we may also need new ones written specifically for performance regression detection.
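For example (a minimal sketch, not a decided design), a collection harness could run one of the perf extension commands on a target repository and parse the reported wall time. The paths below are hypothetical, and the parsing assumes perf.py's usual "! wall ..." output line:

```python
# Illustrative harness: run one perf extension command and extract the wall time.
import subprocess

PERF_EXT = "/path/to/mercurial/contrib/perf.py"   # hypothetical Mercurial checkout
REPO = "/path/to/benchmark/repo"                  # hypothetical target repository


def run_perf(command):
    """Run an hg perf* command on REPO and return the wall time in seconds."""
    out = subprocess.check_output(
        ["hg", "--config", "extensions.perf=" + PERF_EXT, "-R", REPO, command],
        universal_newlines=True,
    )
    for line in out.splitlines():
        if line.startswith("! wall"):
            return float(line.split()[2])
    raise ValueError("no timing line found in output:\n" + out)


if __name__ == "__main__":
    print("perftags wall time: %.6f s" % run_perf("perftags"))
```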
3. Expected Results
Expected results are still to be discussed. For now, we aim at having a simple tool to track performance regressions on the two branches of the main Mercurial repository (stable and default).
However, there are some open questions for mid-term objectives:
- What revisions of the Mercurial source code should we run the performance regression tool on? (public changesets on the main branch only? Which branches? ...)
- How do we manage the non-linear structure of a Mercurial history?
- What kinds of aggregations / comparisons do we want to be able to do? Should these be available through a "query language", or can they be hard-coded in the performance regression tool?
4. Existing tools
Airspeed velocity
- Used by the http://www.astropy.org/ project
- Presentation (2014): https://www.youtube.com/watch?v=OsxJ5O6h8s0
- Written in Python and JavaScript (http://www.flotcharts.org/)
This tool aims at benchmarking Python packages over their lifetime. It is mainly a command line tool, asv, that runs a series of benchmarks (described in a JSON configuration file) and produces a static HTML/JS report.
When running a benchmark suite, ASV takes care of cloning/pulling the source repository into a virtualenv and running the configured tasks in that virtualenv.
Results of each benchmark execution are stored in a "database" (consisting of JSON files). This database is used to produce evolution plots of the time required to run a test (or of any other metric; out of the box, asv supports four benchmark types: timing, memory, peak memory and tracking), and to run the regression detection algorithms.
One key feature of this tool is that it is very easy for every developer to use it in their own development environment. For example, it provides an asv compare command that compares the results of any two revisions.
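To give an idea of what this could look like for Mercurial, here is a minimal sketch of an ASV benchmark module. The time_* / track_* prefixes are ASV's own naming conventions; the repository path, the perf.py location and the way its output is parsed are assumptions of ours:

```python
# Minimal sketch of an ASV benchmark module (e.g. benchmarks/benchmarks.py).
# ASV discovers benchmarks by name prefix: time_* functions are timed by ASV
# itself, track_* functions return an arbitrary value that ASV records as-is.
import subprocess

REPO = "/path/to/benchmark/repo"                  # hypothetical target repository
PERF_EXT = "/path/to/mercurial/contrib/perf.py"   # hypothetical perf.py location


def time_hg_version():
    """Wall time of a trivial hg invocation, measured by ASV itself."""
    subprocess.check_call(["hg", "version", "--quiet"])


def track_perf_perftags():
    """Report the wall time measured by contrib/perf.py's perftags command."""
    out = subprocess.check_output(
        ["hg", "--config", "extensions.perf=" + PERF_EXT, "-R", REPO, "perftags"],
        universal_newlines=True,
    )
    for line in out.splitlines():
        if line.startswith("! wall"):
            return float(line.split()[2])


track_perf_perftags.unit = "seconds"
```

Running asv run over a range of revisions then populates the JSON result database from which the HTML report and asv compare work.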
However, ASV will require some work to fit our needs:
- The main drawback of ASV is that it is designed with commit date as the X axis. We must adapt the asv code to properly handle the "non-linearity" of dates in Mercurial history (see https://github.com/spacetelescope/asv/issues/390)
- Tags are displayed in the graphs as secondary X-axis labels tied to the commit date of the tag; they should be displayed as annotations on the data points instead.
- Implement a notification system
Demo built with a patched ASV to work around the dates, branch and tags issues:
https://hg.logilab.org/review/hgperf/raw-file/5aee29f2aee0/index.html
Regression on the perftags benchmark from contrib/perf.py: https://hg.logilab.org/review/hgperf/raw-file/5aee29f2aee0/index.html#benchmarks.track_perf_perftags?branch=default&time=28264-28265
Other tools were evaluated; a complete report is available at https://hg.logilab.org/review/hgperf/raw-file/tip/docs/tools.html