hg 4.4 sprint notes
Taken from: https://public.etherpad-mozilla.org/p/sprint-hg4.4-NOSPAMREMOVETHATLASTPART (drop the anti spam part)
State of the community:
  Things have not fallen apart without Matt
  Need more reviewers
  Give feedback about Phabricator

State of the tooling:
  experiment with Phabricator for a couple of months
  we have a dashboard for stacks
  buildbot now runs Python 3
  Windows builds are running as well, not green yet
  idea: letting buildbot automatically run tests on patches

State of the code:
  charts
  >100,000 LOC Python
  ~20,000 LOC C

State of Hg at Facebook:
  Emphasis: Perf
  treemanifest: 2x faster commits, 5x faster show; huge win for large repos
  doubled commit throughput by minimizing critical sections
  restack workflow fully deployed, everyone has obsmarkers on (restack, next --rebase, hide, unhide, split, fold, uncommit, unamend - commands fully deployed)
  Mercurial for Windows rolled out to some users
  shipped LFS extension, which does not change the hashes in production
  already upstreamed extensions (thanks Pulkit!): copytrace, directaccess, uncommit, unamend, pushvars, commitextras, morestatus
  upcoming: want to upstream remotenames, pushrebase, infinitepush
  demos for later: "hg undo", "hg workspaces"
  big projects in progress:
    Mononoke, talk to Sid, written in Rust
    Eden, FUSE FS, talk to Adam, begun to integrate with Mercurial
  Future:
    reaching the bottom of the CPU perf pile
    IO
    dirstate, lazy changelog
    in-memory operation
    last-mile perf (1s -> 0s)
    Python is a bottleneck
    stateful chg
    integrations with Eden/Mononoke
    ease of use in native code
    need well defined APIs
  Comments: a lot of upstreamed stuff is not extensions, but an opt-in part of the core

State of Hg at Unity:
  still use hg at Unity
  Mads does not work directly on hg
  main repo, WC = 6 GB of large files (largefiles extension), 1 GB normal files, .hg ~6 GB
  also using generaldelta, aggressive deltas
  Problems:
    huge manifest, because of merge workflows
    discovery is very expensive because of that (>4K named branches)
  push to using Git, a lot of things are moving into smaller repos
  evolve is tried, but not yet deployed

State of hg at Mozilla:
  repo has grown: 400,000 files
  problems:
    scaling a monorepo
    number of files in a checkout
    clone size
    making monorepo feel small
    hg configs
    Mercurial's default configs are not suitable for our scale
    difficult to tell people to install different extensions (fsmonitor)
    Windows performance
  recent developments
    sparse checkouts in CI for ~2 months (using core sparse)
    devs don't use sparse because UX is not perfect
    full clones starting to take too long
  tensions between monorepo w/ Hg and microrepos w/ GitHub. Not getting GitHub contributors is a bad side of a monorepo.

State of hg at Google:
  2.5% of commits at Google are done via Mercurial
  centralized configuration
  external extensions: narrowhg, remotefilelog, full evolve
  internal extensions: trainingwheels (smoothing piper cmd -> hg cmd), bugreport (copy problems to someone's server), fix (running tools at commits before you run them), srcfs/citc - virtual FS, codereview extension
  hg xl - similar to smartlog
  model: linear history
    hg push starts a review, does not land a commit
    no merges
    not encouraging bookmarks/named branches/topics
  hg is run on top of a virtual FS which knows nothing about it
  automatic update of the narrowspec

Rust session:
  Run some Rust in the default codebase as a week-end objective
  Port the EXE wrapper to Rust
  Rust has 2 compilers on Windows (we need MSVC)
  Problem with Rust and C compiler (Rust is built with modern MSVC, Python 2.7 is built with ancient MSVC 2008, they wouldn't link), EXE wrapper is not a good idea
  Ensure that Mercurial can still be packaged on Debian, no need to use a specific Rust version
  Rust-Cpython uses the CPython API (project is dead)
  Bridging code between cffi and Rust, not cpython, would be a good start
  If we use Rust like we used C before
  Core of Rust code independent from the bridge
  Fuzz all the Rust code
  Push a Rust extension? Or translate a C part into Rust and compare the perf?
  Having Rust doesn't remove the pure-Python code.
  CFFI could help the Python 3 migration
  The issue with CFFI is that it slows down data passing
  Zero-copy might be possible but not on very old Python 2.7 versions
  Write our own layer between Python and Rust? (instead of using pyo3 https://github.com/PyO3/pyo3)
  Obs-store parser in Rust is 100x faster than Python C (but 2-3x slower once interacting with rust-cpython)
  Rewrite dirstate in Rust?
  Buildbot builders need a Rust compiler, a modern Rust compiler.
  Rust could call C code if it's decoupled from CPython code
  Shipping some hooks with the distribution is a good idea!
  mpatch.c should be easy to rewrite in Rust, might be a good start
  How to link libhg with other tools would require a less strong GPL, would the community agree?
  - Contributor License Agreement for specific directories (Facebook has its own ConfigParser)?
  Maybe rewrite the ConfigParser in Rust
  Also the CLI option parser might be a good candidate
  bdiff.c is also a good candidate
  => Send example hooks
  =>

Code Review Session:
  https://patchwork.mercurial-scm.org/project/hg/list/
  https://patchwork-demo.mercurial-scm.org/project/hg/list/
  https://phab.mercurial-scm.org/yadda/
  Phabricator enjoyed by people at large companies
  Yuya: emails from Phabricator are hard to read
  Hard to keep track of unread state on Phabricator
  No context in Yadda
  Augie: Phabricator would require too many UX modifications and it wouldn't be Phabricator anymore
  Ryan: email doesn't make it obvious what next actions are; Augie: that's on reviewer/comment
  Augie is ~20% less efficient with Phabricator
  If we had more reviewers, this becomes less of a problem in Augie's opinion.
  Kevin: it would be helpful to have a list of criteria we need in a review tool; he has a list and will turn it into a wiki page
  Augie wants threaded emails and quoted context inline; comment/context goes all the way back to the original line that was commented on
  Patchwork 2.0 has support for custom keywords on reviews!
  Feed Phabricator improvement ideas to Ryan; prioritize them
  Some criteria we've discussed before:
    Side-by-side comparison vs. unified diff
    Good-quality, terse e-mails
    Publicly archived
    Read state - catch up to prior state on existing discussion
    Single source of truth - clear status feedback from reviewer
    Syntax highlighting
    Word-enhanced diffs
    Context of full series vs. context of old versions of series
  Other nice things:
    Automatic linting
    Automatic test runs/CI feedback
  Suggestions for Phabricator upstream:
    Remove the repository name
    Maybe remove to/cc part also from the bottom
  Kevin: we should commit to maintaining the bug tracker more actively
  Add donotreap tag on Bugzilla
  Daily report generated at https://hgbzstats.octobus.net/, could be configured to send email (weekly?)
  gregory/ryan: Proposal to use Phabricator issue tracker
  https://phabricator.wikimedia.org/T259 contains script to migrate from Bugzilla to Phabricator
  redundant personal and mailing list emails are annoying
  => Activate hgbzstats email weekly

Restack session:
  - evolve / restack
  - hg rebase --restack == hg evolve --all, but uses a core command
  - hide commits
  - strip by default is bad
  - like prune but with unhide
  - unhide with same hash
  - exchange
  - uncommit / unamend / split / fold / next --rebase
  - understanding obs history
  => shipping obs-markers by default
  => ship hide / unhide commands
  => ship obs cycle for unhide

Dirstate session:
  Well defined API:
  - Ability to experiment
  - Scalability / perf
  - file status
  - working directory parent
  - current branch
  narrowspec gets updated every time a file / directory is touched, which "solves" the perf issue
  hg grep to search the whole codebase
  durham: tree dirstate data structure, make the dirstate more incremental

------------------ Saturday -----------------

----- Demos ------

Interactive smartlog:
  - Nuclide plugin that lets one drag/drop nodes in the Nuclide IDE's GUI to do rebases
  - shows current status (if watchman is enabled)
  - shows 'status' kind of results, lets you create a commit from them
  - shows you the changes you're going to make, lets you confirm (instead of doing it automatically)
  - Conflict resolution - uses Nuclide's conflict resolution if automerge fails
  - Pull, etc.

Benchmarks:
  Able to bisect to find regression, has very pretty graphs :)
  http://perf.octobus.net/
  Slides (in French, sorry) we gave about the tool: https://octobus.net/presentations/perf_test.html#/nos-rsultats
  bitbucket.org/octobus/bighgperf/src
  Can we get it into the review tooling?
  Does it make sense to merge with the existing performance/benchmark suite?
  Existing benchmarks are available at https://buildbot.mercurial-scm.org/speed/#/

------ Sessions ------

Discussion session 1:

Company self-review:
  It would be nice if companies that are doing lots of changes can do a first-pass review (not landing!) of code from coworkers at the same company.
  durin42: "obvious bugfix" - author and pusher can be from the same company. New features, use judgement
  Reviews are always good, it's mostly about pushing/landing.
  Anything can be backed out as long as it is not in a release.
  There have been concerns about big companies "hijacking" the Mercurial community
  Steering committee has a limit on the number of members from a single company (35% max)
  => Durham: The current status quo is fine.
  What about putting Facebook extensions and Google extensions in a core subdirectory?
  Concerns about adding maintenance burden to contributors - changing something in mercurial/foo.py that breaks contrib/some_extension - who is supposed to fix that?

Backup file madness:
  Better solution for the way that hg "backs up" files during merges
  Currently appends ".orig" to the file, which can conflict with legitimate files in the tree, can overwrite the file that's already there!
  There is an option to move these files to a different directory, can we enable it by default?
  Concerns: makes the .orig files less discoverable.
  Maybe print out "backed up to <foo>"
  FB's tweakdefaults extension: switch to updatecheck=noconflict? (or was this a request to have this in some "upstream" tweakdefaults?)
  - does it make sense to base clobbering .orig on something like this?
  Action item: should move experimental.updatecheck -> commands.updatecheck
  Action item: Change the Mercurial default from 'merge' to 'noconflict' or 'abort' (abort is maximally safe, but might be more annoying; noconflict is closest to git?)

Skip blame/annotate:
  Experimental flag to "skip" certain revisions, takes a revset. Currently in hg today
  At Mozilla, some people are really negative on the idea of code-reformatting-only commits because it makes blame less useful/accurate
  They think reformatting code is hostile to code archeology
  Idea came from Chromium/git hyperblame (https://commondatastorage.googleapis.com/chrome-infra-docs/flat/depot_tools/docs/html/git-hyper-blame.html)
  Perhaps should allow a file that has a list of revsets that describe revisions that should be skipped, instead of requiring it to be specified on the command line by all users - maybe not automatic, but using some syntax to specify a file (see below)
  Possible security concerns - could put a tag in the description, and hide who actually put a security flaw in there
  Could put an indicator in the blame output that the author/node shown is not the *real* author/node
  hg annotate <path> --skip 'file(.hgskipblame)'

Mononoke presentation:
  infinitepush automatically pushes any commit
  Needs:
  - Local commit backups
  - API for services, give me revisions between X and Y
  - Support for Eden (virtual file-system)
  - Geographic redundancy, automatic failover
  Mononoke is the response to these needs
  It is not:
  - A replacement for Mercurial
  - A reimplementation of Mercurial
  - A reimplementation of Mercurial Client
  It is a reimplementation of Mercurial Server
  Design:
  - Immutable data => Stored in blob store
  - Append-only (metadata) => MySQL
  - Mutable data (bookmark) => MySQL
  Mononoke servers can pull from the central source of truth and cache a lot of things
  Mononoke has both a Mercurial frontend and an API frontend
  Backend is pluggable
  Mononoke source code is available on GitHub https://github.com/facebookexperimental/mononoke
  Mercurial-bundles/src, mercurial-types, mercurial/src could be imported into core
  Tested with fuzzing (quickcheck)
  Mononoke has code to convert a repo to Mononoke format (blobimport.rs), which is multithreaded
  Mononoke would not allow serving revlog-based repos
  Only ssh, Facebook wants to provide an HTTP interface
  Initial API, and maybe a GraphQL API later
  No server-side extension in Python would work, but Mononoke would provide server-side hooks written in Lua
  Pub-Sub extension for Mononoke
  Abstract the server (hg serve) run in tests so Mononoke could be tested against the core test suite

In-repo config break-out session
  [Greg] Problem space: hg out of the box is not very useful to power users
  Generally per-repo best practices you want people to follow
  Large companies can deploy configs to client machines, but that doesn't work in open source. Problem at Mozilla for example.
  Ideally: define config requirements in the repo itself
  No programmatic way to upgrade user's .hgrc file
  Matt Mackall OK with e.g. .hg/hgrc-auto that was under control of the repo itself
  There's a server config extension that does some of this; was demoed at the sprint in Paris 1-2 sprints ago
  We think the extension might still be around, but not part of core: https://www.mercurial-scm.org/wiki/ConfigExpressExtension
  Key deliverable of this session: way for repo to define desired/required user client config
  [Sid] Trust and security issues: current stance is, Mercurial itself shouldn't be able to run arbitrary code without user's permission. Maybe a bit too cautious? Hg kind-of has this permission anyway - at least, it has permission to dump files on your disk which you may then execute (e.g. ./configure) without first inspecting the files. You already trust Mercurial.
  [Mark] So if pulling over a secure connection, UX could be: "this server has suggested config, would you like to apply it? [y/N]"
  Revset aliases: today they can override built-ins (all, parents, etc). Something to be aware of.
  client should reject built-in revsets if provided from server/repo
  [Kyle] Do you want configs to be versioned? e.g. do you want to lose aliases every time you update to an old revision? Maybe not. .hg/hgrc is not versioned, so if we're talking about something like .hg/hgrc-server it would be weird/unexpected if that was versioned.
  [Mark] Were talking last night about having a separate meta-repo. Doesn't have the same commit history as the working copy.
  [Sid] Maybe a separate branch
  [Greg] if crazytown: have it in the repo, have a sparse checkout of it in .hg (?)
  [Mark] Have a special file in .hg that is essentially its own filelog, not referenced in the manifest, maintains its own history.
  [Greg] Generally lack a mechanism for associating random metadata with a repo
  [Mads] Unity does this with a global config store (repo?), can nag if someone's not on a recent-enough version
  [Boris] did you look into what the config express extension does? It handles several of these problems already
  [Boris] https://www.mercurial-scm.org/wiki/ConfigExpressExtension
  [Boris] Used at Nokia for 9 months, Unity testing it.
  [Sid] Smallest thing we could do: have a set of safe revset aliases shipped with the repo, composed of safe primitives (where safe is TBD)
  [Sid] Shouldn't be sent only at clone time, should be sent also at pull time
  Having a meta-repo that stores all this makes sense. Probably don't want it versioned as such (don't want to go back to earlier config on updating to earlier commit) - [Greg] but probably do want minimum version (prompt to update the meta-repo if current version is older than current low water mark)
  Inside meta-repo we have a single head (@), gets cloned/pulled automatically whenever you interact with the canonical repo.
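
The revset alias concern raised in this session (server-provided aliases must not override built-ins such as all or parents) could be handled by a client-side filter. The following is only a minimal sketch: BUILTIN_REVSETS, alias_name and filter_server_aliases are hypothetical names, not Mercurial APIs, and a real client would take the built-in names from its own revset predicate table.

```python
# Hypothetical sketch: none of these names are Mercurial APIs.
# The set of built-in names below is a small illustrative sample.
BUILTIN_REVSETS = frozenset({"all", "parents", "ancestors", "heads", "draft"})


def alias_name(declaration):
    """Return the bare name of an alias declaration such as 'mine($1)'."""
    return declaration.split("(", 1)[0].strip()


def filter_server_aliases(aliases, builtins=BUILTIN_REVSETS):
    """Split server-suggested aliases into (accepted, rejected).

    An alias is rejected when its name would shadow a built-in revset.
    """
    accepted, rejected = {}, {}
    for declaration, definition in aliases.items():
        target = rejected if alias_name(declaration) in builtins else accepted
        target[declaration] = definition
    return accepted, rejected


suggested = {
    "wip": "draft() and not obsolete()",
    "all": "none()",       # tries to override a built-in
    "parents($1)": "$1",   # same, with an argument
}
accepted, rejected = filter_server_aliases(suggested)
print(sorted(accepted))
print(sorted(rejected))
```

Rejected entries are kept rather than silently dropped so that a client could report them to the user.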
  What's in the actual repo? i.e. in its working copy?
  [Mads] A single hgrc file?
  [Greg] Should probably be one config per feature.
  [Mads] single file makes it easier to extend / add new features
  [Sid] Should we also have a path for users to customise these configs? Should a user be able to tweak these, or should they just override in their own hgrc?
  [Kyle] I imagine this as basically being implemented as a %include as the first line of .hg/hgrc, so anything later in the .hg/hgrc would override.
  [Sid] that's how FB does things today
  [Sid] So how do users make changes?
  [Greg] It's just a plain old repo, so people can just cd into it and change stuff
  [Greg] What if people don't want certain bits of the metarepo config?
  [Greg] should we load up configs in the order global -> metaserver -> user -> repo?
  [Mads] Should ensure we figure out the state of prior art first.
  [Sid] vim modelines have had security vulns; need to make sure we don't reproduce this problem
  [Mads] Worth pointing out: all the good/most useful use cases we can think of are also the most scary ones.
  [Sid] Need a security review / input from people who understand how to find security vulns
  [Kyle] Could detect if proposed config varies from existing config, prompt the user to accept
  [Sid] People are generally bad at understanding security stuff. This doesn't sound like the right defence.
  [Rodrigo] Is there any kind of sandboxing that would make sense and still make this useful?
  [Sid] Probably this belongs in the build tool. Maybe we can produce a standard tool that people can incorporate into their builds.
  [?] How does this make it more secure? Instead of having hg incorporate the config automatically you get the user to run make and *that* updates the config. Not more secure.
  [Sid] But running ./configure you fully understand that it could own your system. It makes the security risk more explicit.
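
The loading order floated earlier in this session (global -> metaserver -> user -> repo) amounts to a last-writer-wins merge, the same effect as putting a %include on the first line of .hg/hgrc. A minimal sketch of that precedence, assuming each layer is a plain dict keyed by (section, key); the layer contents and the resolve helper are made up for illustration and are not how Mercurial stores its config:

```python
# Layers in the order proposed in the notes; later layers override earlier ones.
LAYER_ORDER = ("global", "metaserver", "user", "repo")


def resolve(layers, order=LAYER_ORDER):
    """Merge {(section, key): value} dicts and record which layer won."""
    merged = {}
    for name in order:
        for item, value in layers.get(name, {}).items():
            merged[item] = (value, name)
    return merged


# Illustrative values only.
layers = {
    "global": {("ui", "merge"): "internal:merge"},
    "metaserver": {("extensions", "fsmonitor"): "", ("ui", "merge"): "meld"},
    "user": {("ui", "merge"): "vimdiff"},
}
resolved = resolve(layers)
print(resolved[("ui", "merge")])
print(resolved[("extensions", "fsmonitor")])
```

Recording the winning layer next to each value is one way to answer "where did this setting come from?", which matters if users are meant to override meta-repo settings in their own hgrc.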
  [Rodrigo] Simpler suggestion: Having a way for the server to send an output part at the beginning of clone to give the user instructions/suggestions. Maybe from a .hgwelcome file?
  [Greg] Will try to hack that together.

Remote name session:
  Augie: Bookmark exchange is awful; in the whole history, 50 repositories have at least one bookmark, in Bitbucket 0.5% have at least one bookmark
  They are useful locally but painful when exchanged
  Durham: remotenames is now slightly too big, it's not reasonable to get everything into the core.
  Proposal for upstreaming:
  - remote bookmarks, but not branches
  - disabling traditional bookmark exchange
  - a bunch of revsets for accessing remotenames
  - remote name hoisting (type master and have it resolved to "default/master")
  - a bunch of 'hg push' improvements [controversial, Augie does not agree with push upstreaming, it has a low priority]
    -- hg push --to: only push to one remote bookmark name
    -- hg push --create, --non-forward-move, --delete - all useful instead of generic --force
    - it's not just UX, has perf wins, because it allows speeding up heads discovery
  High level plan:
  - clients keep track of remotenames
  - clients allow users to opt in to disabling normal bookmark syncing
  - servers allow opting out of serving local bookmarks
  Augie: we should also at some time think about tracking, but it's probably worth delaying it.
  Kevin: conceptually bookmarks should be: interesting heads from the server, and we can hint to the server that some interesting head should be advanced.
  Local and remote bookmark namespaces should be separate, so that if one wants to push a local bookmark foo to a remote bookmark foo, they do something like "hg push -r foo -B foo"
  Durham:
  - it is uncontroversial to upstream the storing of information, expose revsets/log templates
  - breaking the local bookmark syncing: adding a new behavior behind a flag, announce that the default is to be flipped in 1 release.
    Potentially print a warning if bookmarks are exchanged and neither the opt-in nor the opt-out config flag is set.
  upstream() bookmark is roughly ' or '.join('::%n' % node for node in itertools.chain.from_iterable(allheads(path) for path in paths))
  hg push wip#foo
  Durham: argued for deprecating "hg push --bookmark / -B", and using --onto instead of -B.
  Kevin/Augie largely okay with --onto; don't really like it but prefer it over --to
  discussion over whether -d/--destination would be better than --onto, and whether command line flags should always be nouns rather than prepositions

Upstreaming NarrowHg, Sparse, Lazy changelog session:
  Sparse:
    Durham: Sparse is core, how do we rename debug_sparse?
    * Greg: Dirstate needs to be sorted out before. He moved most over, monkeypatched dirstate is outstanding. May also need a new status flag for sparse files (hg purge does not work correctly), would need help from someone who knows this part of the code better
    hg purge --all should remove what is not in the sparse
    Durham: these files should be deleted during "sparsification"
    Kyle: mercurial would be sitting on piper, new state should not list millions of files
    Durham: maybe new flag: materialized outside sparse
    Ryan: terse may help
    Kyle: Narrow is implemented in a way: override dirstate.walk - do not even go in there
    Adam: had cases when dirs with hundreds of thousands of files were still there, but should not have been because of sparse. These were also stale because they were not updated
    Augie: Tombstone in dirstate: either unknown or ignored, or do not look
    Agreement: we would need to have a new status. This is not part of dirstate, but as an output only
    In the short term there are cases where we need to walk outside sparse: e.g. purge
    Google restricts what can be walked
    Durham: in the Eden world no need for sparse. We would disable things like hg files. IDEs may be an issue, IDEs need too much.
    Given the IDEs do not support Mercurial much, we would need to make changes to them anyway
    Kyle: Eclipse, VS Code
  Back to sparse, UX:
    The problems:
    Greg: Monorepo at Mozilla wants end to end workflow. Clone, indicating working on a subset of the repo, would be nice if Hg web knew what sparse profiles are available, would like to show sparse profiles as if they were their own individual repos. Fine to see the full history.
    Augie: parts of narrow should not land yet (ellipsis holepunching, which has issues). Would like to support a sparse checkout of a full clone. Could have more filelogs than working copy
    Ryan: discoverability, we have been looking at this. Few thoughts: either a structured field or comment (our sparse profiles support comments), containing a name or description, sparse -i??? to find all the sparse profiles. This is a better experience than learning about the sparse profiles through word of mouth
    Durham: hg clone -i -> could discover and show a list to discover profiles. This could address Greg's UX concern.
    Greg: how would you switch between profiles?
    Augie: wants one way to narrow and widen the files and history
    Durham: primitives: repository + working copy, for these include/exclude
    What would be a better alternative: adding: adds to both, remove: only from sparse
    Durham proposal: Hg repository as a command: with include and exclude commands
    Adam: how would this work with share (Google disabled it), at FB people use it. At Google share causes corruptions (narrow strips)
    Kyle: does not like narrow
    Andras: what about hg view
    We want a new command (bit of bikeshedding) name. As a start: we only grow
    Ryan: going from 0 rules to 1 rule (or the other way) is an issue: files disappear. Proposal: once repo tainted with sparse it stays that way.
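
Ryan's point about the jump between 0 rules and 1 rule can be made concrete with a small sketch. It assumes glob-style include rules and a hypothetical "tainted" flag recording that sparse was ever enabled; visible_files is an illustrative helper, not the sparse extension's API, and reading "tainted with zero rules" as "no files" is one possible interpretation of the proposal.

```python
from fnmatch import fnmatch


def visible_files(files, include_rules, tainted):
    """Return the files kept in the working copy.

    include_rules: list of glob patterns (hypothetical rule format).
    tainted: True once sparse has ever been enabled for the repo.
    """
    if not tainted:
        # Sparse never enabled: zero rules means "everything".
        return list(files)
    # Tainted repo: zero rules means "nothing"; rules only ever add files.
    return [f for f in files if any(fnmatch(f, rule) for rule in include_rules)]


files = ["docs/index.rst", "src/main.py", "src/util.py"]
print(visible_files(files, [], tainted=False))        # all three files
print(visible_files(files, ["src/*"], tainted=True))  # only the src/ files
print(visible_files(files, [], tainted=True))         # empty list
```

Without the flag, adding the first rule makes most files vanish and removing the last rule brings them all back; with it, the working copy only grows as rules are added.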
    Augie: sparse config empty - no files; reset: deletes the sparse config
    Additional problems with subincludes and hgignore

Shell Prompts Session:
  FB has scm-prompt.sh in fb-hgext/scripts/ (demoed by Ryan)
  Commonly requested feature: status. Too slow to include though
  Possible solutions:
    async zsh prompts; watchman-push updated prompt
    simple dict file in .hg (could also be cronned or only updated when hg commands are run)
  Kevin likes the template language for "hgprompt" (author: sjl on bitbucket).
  Idea: have extension print out some shell code that could be evaled. Since it's owned by Mercurial, it can do things the "best" way given the current Mercurial release (e.g., in a future with minimal startup time, could be an hg command, but for now could be something more similar to scm-prompt.sh)

GUI for enterprise session:
  Google needs good UI for both Linux and Mac
  Needs to build it
  Requirements:
  ╔═════════════╦════════════╦═════════╦══════════════════════════════════╗
  ║ Source tree ║ TortoiseHG ║ Nuclide ║ Need                             ║
  ╠═════════════╬════════════╬═════════╬══════════════════════════════════╣
  ║             ║ yes        ║ yes     ║ Cross platform                   ║
  ╠═════════════╬════════════╬═════════╬══════════════════════════════════╣
  ║             ║ yes        ║ yes     ║ Graph visualization              ║
  ╠═════════════╬════════════╬═════════╬══════════════════════════════════╣
  ║             ║            ║ yes     ║ Amend / commits                  ║
  ╠═════════════╬════════════╬═════════╬══════════════════════════════════╣
  ║             ║            ║ yes     ║ Annotate                         ║
  ╠═════════════╬════════════╬═════════╬══════════════════════════════════╣
  ║             ║            ║         ║ obslog                           ║
  ╠═════════════╬════════════╬═════════╬══════════════════════════════════╣
  ║             ║            ║ yes     ║ wd state / diffs                 ║
  ╠═════════════╬════════════╬═════════╬══════════════════════════════════╣
  ║             ║            ║ ~       ║ interactive commit               ║
  ╠═════════════╬════════════╬═════════╬══════════════════════════════════╣
  ║             ║            ║         ║ custom graph revsets             ║
  ╠═════════════╬════════════╬═════════╬══════════════════════════════════╣
  ║             ║            ║         ║ templates                        ║
  ╠═════════════╬════════════╬═════════╬══════════════════════════════════╣
  ║             ║            ║         ║ custom commands                  ║
  ╠═════════════╬════════════╬═════════╬══════════════════════════════════╣
  ║ no          ║ yes        ║ yes     ║ custom extension / configuration ║
  ╠═════════════╬════════════╬═════════╬══════════════════════════════════╣
  ║             ║            ║ yes     ║ drag/drop rebase                 ║
  ╠═════════════╬════════════╬═════════╬══════════════════════════════════╣
  ║             ║            ║ yes     ║ disabling emails / features      ║
  ╚═════════════╩════════════╩═════════╩══════════════════════════════════╝
  A proper distribution for Tortoise would help
  What about curses interfaces? https://bitbucket.org/aleufroy/lairucrem is one example
  => Try Tortoise or write their own web-based interface

Workflow Demos
==============

Suggested Local Branching workflow @ Facebook
---------------------------------------------
Setup: remotenames extension, almost all enabled
Start: hg update master -B NAME (arc feature NAME)
View: hg sl (smartlog) or hg ssl (super smartlog - hits code review server for extra status etc), or Interactive Smartlog GUI
Push: 'hg push --to REMOTEBOOKMARK' (or click a button in the GUI)

Suggested Local Branching workflow @ Google
---------------------------------------------
hg citc newworkspace # create a new workspace which is a virtual filesystem
Update a file, get the directory added to the narrow spec
hg uploadchain # upload the current stack and attach code review url to each commit
tags don't get moved when evolving
uploadchain moves the tags
Want some notes on the obs-markers
Submit is a click on the web-interface

Suggested Local Branching workflow @ Unity
---------------------------------------------
Use named branches
Merge-based workflow
Branches stay open

Suggested Local Branching workflow @ Mozilla
---------------------------------------------
No standard workflow
Half of developers use Git with git-cinnabar to contribute
hg show work
  Not subcommands yet
  hg swork
hg show stack
  Include the number of commits ahead and rebase command

Workspaces
-----------------
Bookmarks are used for two things:
- Identify individual commits
- Context switching, have a name to come back to a specific context
Workspaces limit the scope of log and advanced log commands (hg sl, hg xl)
Associate metadata to workspaces (like open tabs)
Pending changes could be associated to a workspace
Build artifacts could also be stored in workspaces

Topic:
--------
Hoisting topic idea comes from remotenames
Ideas:
* set names automatically when unspecified (something like a UUID) -- some people don't want to think about names, though they like to have things named
* generate a random name (topic) upon creation of a new draft head
=> Unifying stack definition in core in an extensible API