Obtaining aggregate information of hashed directory
It would be handy to obtain an aggregate of the information that dirhash uses to compute the final hash for the root directory, for example as an ordered dictionary that could be pretty-printed to a YAML or JSON file. Such files would be easy to diff, which enables use cases like logging, or highlighting file tree or content changes to end users.
From the scantree examples, I see that the .apply() method can be used for such recursive transforms:
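To make the diffing use case concrete, here is a minimal sketch using only the standard library. The two aggregates are hand-written, the `hash` values are placeholders rather than real dirhash output, and the `name`/`hash`/`entries` layout is only an assumption about what such a structure could look like.

```python
import difflib
import json

# Hypothetical aggregates for two snapshots of the same directory.
# The hash strings are placeholders, not real dirhash output.
before = {
    "name": "dir",
    "hash": "aaa111",
    "entries": [
        {"name": "file1.txt", "hash": "bbb222"},
        {"name": "file2.txt", "hash": "ccc333"},
    ],
}
after = {
    "name": "dir",
    "hash": "ddd444",
    "entries": [
        {"name": "file1.txt", "hash": "bbb222"},
        {"name": "file2.txt", "hash": "eee555"},
    ],
}


def render(aggregate):
    """Serialize deterministically so the diff only reflects real changes."""
    return json.dumps(aggregate, indent=2, sort_keys=True).splitlines()


diff_lines = list(
    difflib.unified_diff(
        render(before),
        render(after),
        fromfile="before.json",
        tofile="after.json",
        lineterm="",
    )
)
print("\n".join(diff_lines))
```

Only the changed file and the root hash show up as added or removed lines, which is the kind of output that would be useful in logs or when reporting changes to users.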
hello_count_tree = tree.apply(
    file_apply=lambda path: {
        'name': path.name,
        'count': sum([
            w.lower() == 'hello'
            for w in path.as_pathlib().read_text().split()
        ])
    },
    dir_apply=lambda dir_: {
        'name': dir_.path.name,
        'count': sum(e['count'] for e in dir_.entries),
        'sub_counts': [e for e in dir_.entries]
    },
)
from pprint import pprint
pprint(hello_count_tree)
{'count': 3,
 'name': 'dir',
 'sub_counts': [{'count': 2, 'name': 'file1.txt'},
                {'count': 1,
                 'name': 'd1',
                 'sub_counts': [{'count': 1, 'name': 'file2.txt'},
                                {'count': 0,
                                 'name': 'd2',
                                 'sub_counts': [{'count': 0,
                                                 'name': 'file3.txt'}]}]}]}
However, the root_node that dirhash computes internally for this is not easily accessible without reimplementing much of the library's internals:
https://github.com/andhus/dirhash-python/blob/37c89746520a8fa9961f657e40bfc7bd77c6be25/src/dirhash/__init__.py#L269
https://github.com/andhus/dirhash-python/blob/37c89746520a8fa9961f657e40bfc7bd77c6be25/src/dirhash/__init__.py#L296
I'm also not yet sure how to use scantree and its RecursionPath class to render or print this aggregate data structure:
https://github.com/andhus/scantree/blob/25b51ca9be973389d671565de20cbb021871521d/src/scantree/_path.py#L16
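As a rough illustration of the shape of the aggregate I have in mind, here is a standard-library-only sketch that does not touch dirhash or scantree. The `aggregate` helper is hypothetical, and the way it combines child hashes (SHA-256 over sorted `name:hash` lines) is an assumption made for the example, not dirhash's actual protocol. It also ignores match filters, symlink handling and the other options dirhash supports.

```python
import hashlib
import json
import os
import tempfile


def aggregate(path):
    """Return a nested dict of names and hashes for the tree rooted at path.

    Assumption: a directory's hash is the SHA-256 of its children's
    "name:hash" lines. This is NOT dirhash's actual protocol.
    """
    name = os.path.basename(os.path.normpath(path))
    if os.path.isfile(path):
        with open(path, "rb") as f:
            digest = hashlib.sha256(f.read()).hexdigest()
        return {"name": name, "hash": digest}
    # Sort children by name so the result does not depend on listing order.
    entries = [
        aggregate(os.path.join(path, child))
        for child in sorted(os.listdir(path))
    ]
    joined = "\n".join(f"{e['name']}:{e['hash']}" for e in entries)
    return {
        "name": name,
        "hash": hashlib.sha256(joined.encode()).hexdigest(),
        "entries": entries,
    }


# Build a small throwaway tree to demonstrate the output shape.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "d1"))
    with open(os.path.join(root, "file1.txt"), "w") as f:
        f.write("hello")
    with open(os.path.join(root, "d1", "file2.txt"), "w") as f:
        f.write("world")
    result = aggregate(root)

print(json.dumps(result, indent=2))
```

Ideally dirhash would expose something equivalent built from its own root_node, so that the per-entry hashes in the aggregate are exactly the ones that feed into the final directory hash.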
Related: https://github.com/colcon/colcon-package-selection/pull/44