Note:
This page is primarily intended for developers of Mercurial.
Dirstate Format Improvements Plan
1. Current state:
- The dirstate is stored in a file and reflects the state of all the files in the repository tracked by hg
- The dirstate is written in arbitrary order (Python dict iteration order)
- Status spends most of its time reading, parsing, and using the dirstate, and status is used by many heavily used commands. Here is what it does:
  - 3.1 Get the list of files that changed recently from hg watchman
  - 3.2 Read the entire dirstate and store it in memory
  - 3.3 Iterate through the dirstate and check the status of the files returned by watchman. We are interested in two questions:
    - 3.3.1 Is the file in the dirstate?
    - 3.3.2 Is the file modified/added/removed?
- For a dirstate of about 100 MB, it takes 5s to build and write it and 350ms to read it
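The status steps listed above can be sketched roughly as follows. This is a simplified illustration, not Mercurial's actual code: the function name `classify_changed` is hypothetical, and the dirstate is modeled as a plain dict from filename to a one-letter state (`n` normal, `m` merged, `a` added, `r` removed).

```python
def classify_changed(changed_files, dirstate):
    """Classify the files reported as recently changed.

    `dirstate` is a stand-in for the fully parsed dirstate: a dict
    mapping filename -> one-letter state. Hypothetical helper.
    """
    unknown = []   # files that are not in the dirstate at all
    tracked = {}   # files in the dirstate, with their recorded state
    for path in changed_files:
        state = dirstate.get(path)   # question 1: is the file in the dirstate?
        if state is None:
            unknown.append(path)
        else:
            tracked[path] = state    # question 2: modified/added/removed?
    return unknown, tracked


dirstate = {"README": "n", "setup.py": "m", "new.py": "a"}
unknown, tracked = classify_changed(["setup.py", "scratch.txt", "README"], dirstate)
```

Note that the whole dict has to be loaded before the first lookup, which is the cost the plan below tries to avoid.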
2. Improvement plan:
We can improve steps 3.2, 3.3.1, and 3.3.2 of the status process described above.
Step 3.1 is not easy to improve.
3.3.1
- Build a bloom filter with all the filenames, store it at the beginning of the dirstate file, and use it to check whether files are in the dirstate.
The bloom filter takes about 5s to build on fbsource with 0.001% precision (building the dirstate takes about 5s as well). We can rebuild the bloom filter semi-regularly (every day/week?) so that false positives from files that have since become untracked do not accumulate.
This requires a format change in the dirstate, so it needs discussion with other people who want dirstate format changes, in order to bundle all changes at once.
- We can sort the dirstate to answer the 3.3.1 question without 3.2; we need data to know whether it is worth it.
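To make the bloom filter idea concrete, here is a minimal in-memory sketch. It assumes that "0.001% precision" means a target false-positive rate of 0.00001, uses the standard sizing formulas, and derives bit positions from SHA-256 by double hashing. The class name and the hashing scheme are illustrative choices, not the proposed on-disk layout.

```python
import hashlib
import math


class BloomFilter:
    """Set-membership filter: no false negatives, rare false positives."""

    def __init__(self, expected_items, false_positive_rate):
        n = max(1, expected_items)
        # Standard sizing: m = -n * ln(p) / (ln 2)^2 bits, k = (m / n) * ln 2 hashes.
        bits = -n * math.log(false_positive_rate) / (math.log(2) ** 2)
        self.num_bits = max(8, int(math.ceil(bits)))
        self.num_hashes = max(1, int(round(self.num_bits / n * math.log(2))))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, name):
        # Double hashing: derive all k positions from two 64-bit values.
        digest = hashlib.sha256(name).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, name):
        for pos in self._positions(name):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, name):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(name))


tracked = [b"README", b"setup.py", b"mercurial/dirstate.py"]
bloom = BloomFilter(len(tracked), 0.00001)
for name in tracked:
    bloom.add(name)
```

A negative answer from `name in bloom` is definitive, so files reported by watchman that are not tracked can be rejected without parsing the entries. Because a plain bloom filter cannot remove items, files that become untracked keep answering "maybe present" until the filter is rebuilt, which is why a periodic rebuild is suggested.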
3.3.2
- We can store the modified/added/removed file entries at the top of the dirstate to short-circuit the iteration over all the entries. A diff is out for review. The improvement appears to roughly halve hg status time on fbsource.
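The idea of putting changed entries first can be illustrated as below. This is a hedged sketch, not the diff under review: the function names are hypothetical, entries are modeled as `(filename, state)` tuples, and every state other than `n` is treated as "changed".

```python
def order_entries(dirstate):
    """Return (changed_count, entries) with changed entries placed first.

    `dirstate` maps filename -> one-letter state; anything other than
    'n' (normal) counts as changed. Hypothetical helper.
    """
    changed = sorted((f, s) for f, s in dirstate.items() if s != "n")
    normal = sorted((f, s) for f, s in dirstate.items() if s == "n")
    return len(changed), changed + normal


def changed_entries(changed_count, entries):
    """Read only the leading block instead of scanning every entry."""
    return entries[:changed_count]


dirstate = {"a.c": "n", "b.c": "m", "c.c": "a", "d.c": "r", "e.c": "n"}
count, entries = order_entries(dirstate)
leading = changed_entries(count, entries)
```

With the count recorded up front, a reader can stop after the first `count` entries, which is usually a small fraction of a large dirstate.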
3.2
- If we make all the changes described above, we no longer need to read the entire dirstate
3. New on-disk format
The new dirstate format will look like the previous format with the addition of:
- Version of the format (as we will have more than one type of dirstate format)
- Awareness of directories / tree-structured / stem compression
- Checksums for files in lookup state so we don't have to visit revlogs
- Sorted order
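As an illustration of what sorted order buys, membership can be answered with a binary search instead of a full scan. The sketch below works on an in-memory sorted list of filenames; doing the same directly on disk would additionally need fixed-size records or an offset index, which is an assumption not covered here. The function name `is_tracked` is hypothetical.

```python
import bisect


def is_tracked(sorted_names, path):
    """Binary-search a sorted list of filenames for `path`."""
    i = bisect.bisect_left(sorted_names, path)
    return i < len(sorted_names) and sorted_names[i] == path


names = sorted([b"setup.py", b"README", b"mercurial/dirstate.py"])
found = is_tracked(names, b"README")
missing = is_tracked(names, b"scratch.txt")
```

Each lookup costs O(log n) comparisons, so checking a handful of files reported by watchman touches only a few entries.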