Note:
This page is primarily intended for developers of Mercurial.
Better Matcher API and File Patterns Plan
Status: Project
Main proponents: KatsunoriFujiwara, RodrigoDamazio
This is a speculative project and does not represent any firm decisions on future behavior.
Add a short summary of the idea here.
1. Goal
- Short term: add non-recursive globs ?
- Long term: extensible matcher API ?
2. Detailed description
2.1. Sprint Notes
Non-recursive globs (Rodrigo, spectral, Durham, :
- Issue is that * is sometimes recursive
- matcher API is a mess
- Should we re-write match.py or just add fileglob?
- Suggestion: add fileglob via a new, cleaner API, then migrate others over time
- Possible FB use case: pick parts of a tree to include and exclude (would add ordering dependency instead of excludes always trumping includes?)
- matcher API should be extensible
- matcher composition: anyof, allof, negate, per-file-type, etc.
- Inconsistencies in pattern behavior between hgignore, --include/--exclude, etc.
- FB: conversion between matchers and watchman expressions
- Proposal: wiki page, first group to have a use case proposes the initial API
2.2. Current Status
2.2.1. Summary of mode, relative-to, and recursion of each type
mode | root-ed | cwd-ed | any-of-path | control recursion by pattern | context depend recursion
wildcard | - | glob: | relglob: | by ** | o
regexp | re: | - | relre: | by $ | x (*A)
raw string | path: | relpath: | - | (always) | x
(*A) As an ignore pattern, "regexp" mode matches recursively (e.g. "re:^foo$" ignores file foo/bar). Details are explained later.
2.2.2. The list of contexts in which a pattern is specified
pattern for | default type | recursion of wildcard | related API
fileset | glob: | x | ctx.match()
files() template function | glob: | x | ctx.match()
diff() template function | glob: | o (*1) | ctx.match()
file() revset predicate | glob: | x | match.match()
follow() revset predicate | path: | x | match.match()
--include/--exclude | glob: | o (*1) | match.match()
hgignore | relre: | o (*1) | match.match()
archive web command | path: | - (*2) | scmutil.match()
hg locate | relglob: | x | scmutil.match()
hg log | relpath: | x | scmutil.matchandpats()
others (e.g. hg files) | relpath: | x | scmutil.match()
(*1) treated as include/exclude of match.match() (otherwise, treated as pats of match.match())
(*2) no wildcard pattern matching occurs for the archive web command, because path: is forcibly added to the specified pattern in this case
For "recursion of wildcard":
- if a wildcard is recursive in a context, pattern glob:foo/bar matches file foo/bar/baz, for example
- if multiple contexts are combined, the innermost context decides "recursion of wildcard"
For example, file foo/bar/baz is:
- not matched by: hg files glob:foo/bar
- not matched by: hg files -I "set:'glob:foo/bar'"
- but matched by: hg files -I glob:foo/bar
The last case seems to cause the issue mentioned by Rodrigo in "match: adding non-recursive directory matching". The second case can be used as a quick workaround for that issue.
For a pattern read from a file, "recursion of wildcard" follows the context that reads the file. For example:
- a wildcard pattern read in by "-I listfile:FILE" matches recursively, but
- one read in by "hg status listfile:FILE" doesn't
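The difference between the two contexts can be modeled with a small sketch. This is a simplified illustration, not Mercurial's match.py: glob_to_regex() and glob_matcher() are hypothetical helpers, only "*" and "**" are translated, and the "recursive" flag stands in for the -I/-X context.

```python
import re

def glob_to_regex(pat):
    """Translate a tiny glob subset: '**' crosses '/', '*' does not."""
    out = []
    i = 0
    while i < len(pat):
        if pat.startswith("**", i):
            out.append(".*")
            i += 2
        elif pat[i] == "*":
            out.append("[^/]*")
            i += 1
        else:
            out.append(re.escape(pat[i]))
            i += 1
    return "".join(out)

def glob_matcher(pat, recursive):
    # Assumption: a recursive context also accepts anything below the
    # matched name; a non-recursive one requires the name to end there.
    suffix = "(?:/|$)" if recursive else "$"
    rx = re.compile(glob_to_regex(pat) + suffix)
    return lambda path: rx.match(path) is not None

as_pattern = glob_matcher("foo/bar", recursive=False)  # like: hg files glob:foo/bar
as_include = glob_matcher("foo/bar", recursive=True)   # like: hg files -I glob:foo/bar
```

With this model, as_pattern("foo/bar/baz") is false while as_include("foo/bar/baz") is true, which mirrors the first and last cases listed above.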
2.2.3. Reading patterns from file
read in by | type substitution | default type for hgignore | default type for otherwise
include:FILE | o | relre: | relre:
listfile:FILE | x | (*X) | (*Y)
- (*X) this is prohibited by match.readpatternfile()
- (*Y) the default type depends on the context in which listfile:FILE is used (e.g. relglob: for "hg locate", but relpath: for "hg files")
If "type substitution" applies, the substitutions below always occur when patterns are read from a file. This is mentioned in "hg help patterns" and "hg help hgignore", but the types relglob: and relre: themselves aren't explained there.
- glob: => relglob:
- re: => relre:
Reading from .hgignore and "[ui] ignore" is internally treated as a variant of include: (e.g. include:$REPOROOT/.hgignore)
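The type substitution can be sketched as follows. This is a minimal illustration under stated assumptions, not the behavior of match.readpatternfile(): read_patterns(), KNOWN_KINDS, and SUBSTITUTIONS are hypothetical names, the default type is relre: as in the include:FILE row, and "syntax:" lines of hgignore files are not handled.

```python
# Kinds recognized by this sketch (the six existing types from 2.2.1).
KNOWN_KINDS = {"glob", "relglob", "re", "relre", "path", "relpath"}

# Substitutions applied when patterns come from a file.
SUBSTITUTIONS = {"glob": "relglob", "re": "relre"}

def read_patterns(lines, default="relre"):
    """Return (kind, pattern) pairs with file-style type substitution."""
    patterns = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        kind, sep, body = line.partition(":")
        if not sep or kind not in KNOWN_KINDS:
            # No explicit type prefix: fall back to the default type.
            kind, body = default, line
        patterns.append((SUBSTITUTIONS.get(kind, kind), body))
    return patterns

rules = read_patterns(["glob:*.o", "re:^build/", "path:docs", r"\.pyc$"])
```

Here "glob:*.o" ends up as a relglob: pattern and "re:^build/" as a relre: pattern, while "path:docs" keeps its type.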
2.2.4. Recursion of ignore patterns
As ignore patterns, "wildcard" and "raw string" modes are obviously recursive, because:
- ignore patterns are treated the same as "--include PATTERN", which makes "wildcard" mode recursive
- "raw string" mode is always recursive, regardless of context
On the other hand, "regexp" mode itself is non-recursive. For example, with "re:^foo$" in .hgignore, "hg debugignore" shows a regexp that doesn't match file foo/bar.
In practice, however, "re:^foo$" in .hgignore does ignore file foo/bar, because dirstate (and "hg debugignore") examines whether the specified file:
- matches one of the specified ignore patterns, or
- exists under a directory that matches one of the specified ignore patterns
and the file is ignored if either condition is true.
Therefore, a "regexp" ignore pattern is recursive, even if it uses "$".
In conclusion, all ignore patterns are treated as recursive, regardless of pattern type.
This special recursion of "regexp" mode is specific to ignore patterns. In other cases, a "regexp" mode pattern isn't recursive if it uses "$".
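The two-step check described above can be sketched like this. It is a simplified model, not the dirstate implementation: parent_dirs() and is_ignored() are hypothetical helpers, and ignore rules are represented as plain compiled regexps.

```python
import re

def parent_dirs(path):
    """Return every ancestor directory of a '/'-separated path."""
    parts = path.split("/")
    return ["/".join(parts[:i]) for i in range(1, len(parts))]

def is_ignored(path, ignore_regexes):
    # A file is ignored if the file itself, or any directory above it,
    # matches one of the ignore patterns.
    candidates = [path] + parent_dirs(path)
    return any(rx.search(c) for rx in ignore_regexes for c in candidates)

rules = [re.compile(r"^foo$")]
direct_hit = rules[0].search("foo/bar") is not None  # the regexp alone
effective = is_ignored("foo/bar", rules)             # regexp plus parent check
```

The regexp alone does not match foo/bar, but the parent directory foo does match, so the file is still ignored.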
2.3. Proposal by foozy
2.3.1. Control start point of matching arbitrarily
How about introducing new systematic names like the ones below, to reorganize the current complicated mapping between names and matching behavior?
pattern type | root-ed | cwd-ed | any-of-path
wildcard | rootglob: | cwdglob: | anyglob:
regexp | rootre: | cwdre: | anyre:
raw string | rootpath: | cwdpath: | anypath:
- for the new "glob"/"re" families, recursion is decided only by the specified pattern (not by context)
- each existing pattern type will be internally treated as an alias of one of the types above
- recursion of the "glob"/"relglob" aliases is treated specially, for backward compatibility
With these newly introduced pattern types, both the "start point" and the "recursion" of matching can be fully controlled via the existing command interface (as PATTERN, or via -I/-X).
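One way to picture the alias idea is a lookup table. The mapping below is an assumption obtained by lining up the table in 2.2.1 with the proposed names; ALIASES and canonical() are hypothetical, and the special recursion handling of the "glob"/"relglob" aliases is deliberately left out.

```python
# Assumed alias table: existing type -> proposed systematic name.
ALIASES = {
    "glob": "cwdglob",
    "relglob": "anyglob",
    "re": "rootre",
    "relre": "anyre",
    "path": "rootpath",
    "relpath": "cwdpath",
}

def canonical(pattern):
    """Rewrite 'kind:body' to use the proposed name, if kind is an alias."""
    kind, sep, body = pattern.partition(":")
    if sep and kind in ALIASES:
        return ALIASES[kind] + ":" + body
    return pattern  # already a new-style name, or no type prefix at all

example = canonical("relglob:*.py")
```

In this sketch, "relglob:*.py" becomes "anyglob:*.py", and a pattern that already uses a new name such as "rootglob:src/*" is returned unchanged.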
2.3.2. Control recursion of matching arbitrarily
With current Mercurial (at least, 4.0 or earlier), recursion of each pattern type can be controlled by:
type | for recursive matching | for non-recursive matching
glob | using "**" | using "*"
re | omitting "$" | appending "$"
path | always | ---
Users can't control the recursion of matching with a "path" type pattern (it matches both directories and files).
Therefore, how about introducing two additional pattern types, "file" and "dir"?
type | for recursive | for non-recursive
file | --- | always
dir | always(*) | ---
(*) "dir" matches only directories.
After adding these types, there are 5 (base types) x 3 (start points) = 15 types:
base type | root-ed | cwd-ed | any-of-path
wildcard | rootglob | cwdglob | anyglob
regexp | rootre | cwdre | anyre
raw path | rootpath | cwdpath | anypath
raw file name | rootfile | cwdfile | anyfile
raw dir name | rootdir | cwddir | anydir
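The naming scheme and the intended difference between the three raw types can be sketched as below. This reflects one reading of the proposal, not an implementation: raw_match() is a hypothetical helper, it only models root-ed matching, and it assumes "dir" matches files below the named directory but never a file with that exact name.

```python
from itertools import product

START_POINTS = ("root", "cwd", "any")
BASE_TYPES = ("glob", "re", "path", "file", "dir")

# 5 base types x 3 start points = 15 proposed type names.
PROPOSED = [start + base for base, start in product(BASE_TYPES, START_POINTS)]

def raw_match(base, pattern, filename):
    """Model root-ed matching for the raw 'path', 'file' and 'dir' types."""
    is_same = filename == pattern
    is_under = filename.startswith(pattern + "/")
    if base == "path":
        return is_same or is_under  # matches both the file and the directory
    if base == "file":
        return is_same              # never recursive
    if base == "dir":
        return is_under             # always recursive, directories only
    raise ValueError("not a raw type: %s" % base)
```

For example, raw_match("path", "foo", ...) accepts both foo and foo/bar, while "file" accepts only foo and "dir" accepts only foo/bar.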
2.4. Proposal by Rodrigo
Add rootglob: to work around the issue with -I/-X patterns.
https://patchwork.mercurial-scm.org/patch/17311/
3. Roadmap