Note:

This page is primarily intended for developers of Mercurial.

Better Matcher API and File Patterns Plan

Status: Project

Main proponents: KatsunoriFujiwara, RodrigoDamazio

/!\ This is a speculative project and does not represent any firm decisions on future behavior.

{X} Add a short summary of the idea here.

1. Goal

2. Detailed description

2.1. Sprint Notes

Non-recursive globs (Rodrigo, spectral, Durham, :
    Issue is that * is sometimes recursive
    matcher API is a mess
    Should we re-write match.py or just add fileglob?
    Suggestion: add fileglob via a new, cleaner API, then migrate others over time
    Possible FB use case: pick parts of a tree to include and exclude (would add ordering dependency instead of excludes always trumping includes?)
    matcher API should be extensible
    matcher composition: anyof, allof, negate, per-file-type, etc.
    Inconsistencies in pattern behavior between hgignore, --include/--exclude, etc.
    FB: conversion between matchers and watchman expressions
    Proposal: wiki page, first group to have a use case proposes the initial API
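
The composition idea noted above (anyof, allof, negate) can be sketched as follows. This is only an illustration of the concept, not an existing or agreed Mercurial API: a matcher is modelled as a plain predicate on a repository-relative path, and the helpers `under` and `suffix` are hypothetical names introduced for the example.

```python
from typing import Callable

# Assumption: a matcher is just a predicate on a repo-relative path.
Matcher = Callable[[str], bool]


def anyof(*matchers: Matcher) -> Matcher:
    """Match when at least one of the given matchers matches."""
    return lambda path: any(m(path) for m in matchers)


def allof(*matchers: Matcher) -> Matcher:
    """Match only when every given matcher matches."""
    return lambda path: all(m(path) for m in matchers)


def negate(matcher: Matcher) -> Matcher:
    """Invert a matcher."""
    return lambda path: not matcher(path)


def under(directory: str) -> Matcher:
    """Hypothetical helper: match every file below `directory`."""
    prefix = directory.rstrip("/") + "/"
    return lambda path: path.startswith(prefix)


def suffix(ext: str) -> Matcher:
    """Hypothetical helper: match files whose name ends with `ext`."""
    return lambda path: path.endswith(ext)


# "Python files under src/, but nothing under src/vendor/"
m = allof(under("src"), suffix(".py"), negate(under("src/vendor")))
```

With such building blocks, the ordering question raised for the FB use case becomes explicit in the expression itself, rather than being fixed by an "excludes always trump includes" rule.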

2.2. Current Status

2.2.1. Summary of mode, relative-to, and recursion of each type

|| mode || root-ed || cwd-ed || any-of-path || control recursion by pattern || context-dependent recursion ||
|| wildcard || - || glob: || relglob: || by ** || o ||
|| regexp || re: || - || relre: || by $ || x (*A) ||
|| raw string || path: || relpath: || - || (always) || x ||

2.2.2. List of contexts in which a pattern is specified

|| pattern for || default type || recursion of wildcard || related API ||
|| fileset || glob: || x || ctx.match() ||
|| files() template function || glob: || x || ctx.match() ||
|| diff() template function || glob: || o (*1) || ctx.match() ||
|| file() revset predicate || glob: || x || match.match() ||
|| follow() revset predicate || path: || x || match.match() ||
|| --include/--exclude || glob: || o (*1) || match.match() ||
|| hgignore || relre: || o (*1) || match.match() ||
|| archive web command || path: || - (*2) || scmutil.match() ||
|| hg locate || relglob: || x || scmutil.match() ||
|| hg log || relpath: || x || scmutil.matchandpats() ||
|| others (e.g. hg files) || relpath: || x || scmutil.match() ||

For "recursion of wildcard":

For example, file foo/bar/baz is:

The last case seems to cause the issue mentioned by Rodrigo in "match: adding non-recursive directory matching", and the second case can be used as an instant workaround for that issue.

The table below summarizes again the recursion (= matching against intermediate directories) of each mode.

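|| mode || pattern || -I/-X || in "set:" || -I/-X with "set:" ||
|| wildcard || with ** || o || o || o ||
|| wildcard || without ** || o || x || x ||
|| regexp || with $ || x || x || x ||
|| regexp || without $ || o || o || o ||
|| raw string || (always) || o || o || o ||

For a pattern read from a file, "recursion of wildcard" follows that of the context that reads the file in. For example: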

2.2.3. Reading patterns from file

|| read in by || type substitution || default type for hgignore || default type for otherwise ||
|| include:FILE || o || relre: || relre: ||
|| listfile:FILE || x || (*X) || (*Y) ||

If "type substitution" is "o", the substitutions below always occur when patterns are read from the file. This is mentioned in "hg help patterns" and "hg help hgignore", but the relglob: and relre: types themselves aren't explained there.

Reading from .hgignore and "[ui] ignore" is treated internally as a variant of include: (e.g. include:$REPOROOT/.hgignore).

2.2.4. Recursion of ignore patterns

As an ignore pattern, "wildcard" and "raw string" modes are obviously recursive, because:

On the other hand, "regexp" mode itself is non-recursive. For example, with "re:^foo$" in .hgignore, "hg debugignore" shows a regexp that doesn't match file foo/bar.

But actually, "re:^foo$" in .hgignore does ignore file foo/bar, because dirstate (and "hg debugignore") examines whether the specified file does:

and that file is ignored if any of the conditions above is true.

Therefore, a "regexp" ignore pattern is recursive even if it uses "$".

In conclusion, all ignore patterns are treated as recursive, regardless of pattern type.

This special recursion of "regexp" mode is specific to ignore patterns. In other cases, a "regexp" mode pattern isn't recursive if it uses "$".
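
The effect described above can be reproduced with a small model. This is a minimal sketch, assuming that the ignore check tests the path itself and then each of its parent directories against the same regexp; it illustrates the behaviour and is not Mercurial's actual code. The names `parent_dirs` and `is_ignored` are hypothetical.

```python
import re


def parent_dirs(path):
    """Yield the parent directories of `path`, closest first.

    For 'a/b/c' this yields 'a/b' and then 'a'.
    """
    pos = path.rfind("/")
    while pos != -1:
        yield path[:pos]
        pos = path.rfind("/", 0, pos)


def is_ignored(path, ignore_re):
    """Assumed model: ignored if the path or any parent directory matches."""
    if ignore_re.match(path):
        return True
    return any(ignore_re.match(d) for d in parent_dirs(path))


# The regexp from the example above: on its own it does not match foo/bar.
ignore_re = re.compile(r"^foo$")
```

Under this model, `ignore_re.match("foo/bar")` fails, yet `is_ignored("foo/bar", ignore_re)` is true because the parent directory `foo` matches. A sibling such as `foobar/x` stays unignored.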

2.3. Proposal by foozy

2.3.1. Control start point of matching arbitrarily

How about introducing new systematic names, as below, to reorganize the current complicated mapping between names and matching behavior?

|| pattern type || root-ed || cwd-ed || any-of-path ||
|| wildcard || rootglob: || cwdglob: || anyglob: ||
|| regexp || rootre: || cwdre: || anyre: ||
|| raw string || rootpath: || cwdpath: || anypath: ||

With these newly introduced pattern types, both the "start point" and the "recursion" of matching can be controlled arbitrarily via the existing command interface (as PATTERN, or via -I/-X).
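
One way to picture the three start points is as different prefixes placed in front of the same pattern body. The following is a minimal sketch under that assumption; the function `anchor` and its parameters are hypothetical and do not describe how Mercurial would implement the proposed types. The trailing "$" makes the example non-recursive, purely to keep the comparison simple.

```python
import re


def anchor(start, body_re, cwd=""):
    """Compile `body_re` anchored at the given start point.

    start: 'root', 'cwd' or 'any' (the prefixes of the proposed type names).
    cwd:   current directory relative to the repository root ('' at the root).
    """
    cwd = cwd.strip("/")
    if start == "root":
        prefix = ""                      # match from the repository root
    elif start == "cwd":
        prefix = re.escape(cwd + "/") if cwd else ""
    elif start == "any":
        prefix = r"(?:.*/)?"             # match from any directory level
    else:
        raise ValueError("unknown start point: %r" % start)
    return re.compile(prefix + body_re + r"$")


# Regexp body corresponding to the wildcard "*.c" (no "/" crossing).
body = r"[^/]*\.c"

root_m = anchor("root", body)
cwd_m = anchor("cwd", body, cwd="src")
any_m = anchor("any", body)
```

Here `root_m` accepts only `a.c` at the top level, `cwd_m` accepts `src/a.c` but not `src/lib/a.c`, and `any_m` accepts a `.c` file at any depth.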

2.3.2. Control recursion of matching arbitrarily

With current Mercurial (at least 4.0 or earlier), recursion of each pattern type can be controlled by:

|| type || for recursive matching || for non-recursive matching ||
|| glob || using "**" || using "*" ||
|| re || omitting "$" || appending "$" ||
|| path || always || --- ||
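
The glob and re rows above can be illustrated with plain regular expressions. This is a simplified sketch: `glob_to_re` is a hypothetical helper that handles only "*" and "**" and treats everything else literally, so it is far less complete than Mercurial's real glob translation.

```python
import re


def glob_to_re(pat):
    """Simplified translation: '**' crosses '/', a single '*' does not."""
    out = []
    i = 0
    while i < len(pat):
        if pat.startswith("**", i):
            out.append(".*")        # any characters, including '/'
            i += 2
        elif pat[i] == "*":
            out.append("[^/]*")     # any characters except '/'
            i += 1
        else:
            out.append(re.escape(pat[i]))
            i += 1
    return "".join(out)


# glob: "*" stays within one directory level, "**" recurses.
glob_nonrec = re.compile(glob_to_re("foo/*") + "$")
glob_rec = re.compile(glob_to_re("foo/**") + "$")

# re: omitting "$" lets the pattern match a leading part of a longer path.
re_rec = re.compile(r"foo/bar")
re_nonrec = re.compile(r"foo/bar$")
```

Against the path `foo/bar/baz`, `glob_rec` and `re_rec` match while `glob_nonrec` and `re_nonrec` do not.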

Users can't control the recursion of matching with a "path" type pattern (it matches against both directories and files).

Therefore, how about introducing two additional pattern types, "file" and "dir"?

|| type || for recursive || for non-recursive ||
|| file || --- || always ||
|| dir || always(*) || --- ||

(*) "dir" matches only against directories.

After adding these types, there are 5 (base types) x 3 (start points) = 15 types:

|| base type || root-ed || cwd-ed || any-of-path ||
|| wildcard || rootglob || cwdglob || anyglob ||
|| regexp || rootre || cwdre || anyre ||
|| raw path || rootpath || cwdpath || anypath ||
|| raw file name || rootfile || cwdfile || anyfile ||
|| raw dir name || rootdir || cwddir || anydir ||
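
Because the proposed names are a regular product of start point and base type, they can be generated and parsed mechanically. The sketch below shows this under the naming scheme listed above; `split_kind` is a hypothetical helper, not part of any existing Mercurial API.

```python
from itertools import product

START_POINTS = ["root", "cwd", "any"]
BASE_TYPES = ["glob", "re", "path", "file", "dir"]

# Map each proposed type name to its (start point, base type) pair.
KINDS = {
    start + base: (start, base)
    for base, start in product(BASE_TYPES, START_POINTS)
}


def split_kind(spec):
    """Split 'KIND:PATTERN' into (start point, base type, pattern).

    Raises ValueError when the prefix is not one of the proposed names.
    """
    kind, sep, pattern = spec.partition(":")
    if not sep or kind not in KINDS:
        raise ValueError("unknown pattern type in %r" % spec)
    start, base = KINDS[kind]
    return start, base, pattern
```

For instance, `split_kind("cwdfile:README")` yields the start point `cwd`, the base type `file`, and the pattern `README`.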

2.4. Proposal by Rodrigo

Add rootglob: to work around the issue with -I/-X patterns.

https://patchwork.mercurial-scm.org/patch/17311/

3. Roadmap

{X}

4. See Also


CategoryDeveloper CategoryNewFeatures