dirhash-python

Obtaining aggregate information about a hashed directory

Open • ruffsl opened this issue 4 years ago • 1 comment

It would be handy to obtain an aggregate of the information that dirhash uses to compute the final hash of the root directory, for example as an ordered dictionary that could be pretty-printed to a YAML or JSON file. Such files would be easy to diff, enabling use cases such as logging, or highlighting changes in the file tree or in file contents to end users.
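To illustrate the diffing use case, here is a minimal sketch using only the Python standard library. The nested dictionaries `before` and `after` are made-up sample data standing in for the requested aggregate structure; their keys (`name`, `hash`, `entries`) and the helpers `to_pretty_json` and `diff_trees` are hypothetical and not part of dirhash's API.

```python
import difflib
import json


def to_pretty_json(tree):
    # Plain dicts keep insertion order (Python 3.7+), so the output is
    # deterministic as long as the tree itself is built deterministically.
    return json.dumps(tree, indent=2)


def diff_trees(old, new):
    # Line-based unified diff of the two pretty-printed structures;
    # returns an empty string when they are identical.
    return "\n".join(
        difflib.unified_diff(
            to_pretty_json(old).splitlines(),
            to_pretty_json(new).splitlines(),
            fromfile="before.json",
            tofile="after.json",
            lineterm="",
        )
    )


# Hypothetical sample data: per-entry hashes for a tiny tree
before = {
    "name": "dir",
    "hash": "aaa",
    "entries": [
        {"name": "file1.txt", "hash": "111"},
        {"name": "file2.txt", "hash": "222"},
    ],
}
after = {
    "name": "dir",
    "hash": "bbb",
    "entries": [
        {"name": "file1.txt", "hash": "111"},
        {"name": "file2.txt", "hash": "333"},
    ],
}

print(diff_trees(before, after))
```

Only the changed file's hash and the root hash show up as `-`/`+` lines, which is exactly what an end user would want highlighted.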

From the scantree examples, I see that the `.apply()` method can be used for such recursive transforms:

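```python
from pprint import pprint

from scantree import scantree

# Assumed setup: scan the example directory 'dir' (the root 'name' in the output)
tree = scantree('dir')

hello_count_tree = tree.apply(
    # Per file: its name and the number of case-insensitive 'hello' words
    file_apply=lambda path: {
        'name': path.name,
        'count': sum(
            w.lower() == 'hello'
            for w in path.as_pathlib().read_text().split()
        ),
    },
    # Per directory: its name, the total count, and the children's results
    dir_apply=lambda dir_: {
        'name': dir_.path.name,
        'count': sum(e['count'] for e in dir_.entries),
        'sub_counts': list(dir_.entries),
    },
)
pprint(hello_count_tree)
```

which prints:

```
{'count': 3,
 'name': 'dir',
 'sub_counts': [{'count': 2, 'name': 'file1.txt'},
                {'count': 1,
                 'name': 'd1',
                 'sub_counts': [{'count': 1, 'name': 'file2.txt'},
                                {'count': 0,
                                 'name': 'd2',
                                 'sub_counts': [{'count': 0,
                                                 'name': 'file3.txt'}]}]}]}
```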
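The same `file_apply`/`dir_apply` pattern could produce a hash-annotated tree instead of word counts. The sketch below deliberately avoids scantree and dirhash so that it runs on the standard library alone: `apply_tree` is a hypothetical, simplified stand-in for `tree.apply()` (its `dir_apply` takes a path and a list of child results rather than a directory node), and the directory hash (SHA-256 over each child's name and digest) is an illustrative Merkle-style scheme, not dirhash's actual hashing protocol.

```python
import hashlib
import json
import os
import tempfile


def apply_tree(root, file_apply, dir_apply):
    # Hypothetical stand-in for scantree's tree.apply(): a post-order walk in
    # which each directory callback receives its children's computed results.
    # Entries are sorted by name so the result does not depend on filesystem
    # listing order. Symlinks are skipped.
    with os.scandir(root) as it:
        children = sorted(it, key=lambda e: e.name)
    results = []
    for entry in children:
        if entry.is_dir(follow_symlinks=False):
            results.append(apply_tree(entry.path, file_apply, dir_apply))
        elif entry.is_file(follow_symlinks=False):
            results.append(file_apply(entry.path))
    return dir_apply(root, results)


def file_apply(path):
    # Hash the raw file bytes
    with open(path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    return {"name": os.path.basename(path), "hash": digest}


def dir_apply(path, entries):
    # Combine child names and digests (illustrative only, not dirhash's protocol)
    h = hashlib.sha256()
    for e in entries:
        h.update(f"{e['name']}\0{e['hash']}\n".encode())
    name = os.path.basename(os.path.normpath(path))
    return {"name": name, "hash": h.hexdigest(), "entries": entries}


# Usage: build a small tree in a temporary directory and pretty-print it
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "d1"))
    with open(os.path.join(tmp, "file1.txt"), "w") as f:
        f.write("hello hello\n")
    with open(os.path.join(tmp, "d1", "file2.txt"), "w") as f:
        f.write("hello\n")
    tree = apply_tree(tmp, file_apply, dir_apply)
    print(json.dumps(tree, indent=2))
```

Because every directory result keeps its children's results under `entries`, the root value is the full aggregate structure, and a change to any file propagates up to the root hash while untouched subtrees keep their hashes.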

However, the `root_node` that dirhash computes internally for this is not easy to access without reimplementing much of the library's internals.

https://github.com/andhus/dirhash-python/blob/37c89746520a8fa9961f657e40bfc7bd77c6be25/src/dirhash/__init__.py#L269

https://github.com/andhus/dirhash-python/blob/37c89746520a8fa9961f657e40bfc7bd77c6be25/src/dirhash/__init__.py#L296

I'm also not yet sure how to use scantree, together with its `RecursionPath` class, to render or print this aggregate data structure.

https://github.com/andhus/scantree/blob/25b51ca9be973389d671565de20cbb021871521d/src/scantree/_path.py#L16
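One way to render such a nested structure, assuming it uses hypothetical `name`/`hash`/`entries` keys (not a scantree or dirhash format), is to flatten it into one `<hash>  <relative path>` line per node, similar in spirit to `sha256sum` output. This plain-Python sketch does not use `RecursionPath`; the helpers `flatten` and `render` and the sample data are illustrative only.

```python
def flatten(node, rel_path="."):
    # Depth-first, pre-order walk yielding (relative_path, node); root is "."
    yield rel_path, node
    for child in node.get("entries", []):
        if rel_path == ".":
            child_path = child["name"]
        else:
            child_path = f"{rel_path}/{child['name']}"
        yield from flatten(child, child_path)


def render(tree):
    # One "<hash>  <path>" line per node; subdirectories get a trailing "/"
    lines = []
    for path, node in flatten(tree):
        suffix = "/" if "entries" in node and path != "." else ""
        lines.append(f"{node['hash']}  {path}{suffix}")
    return "\n".join(lines)


# Hypothetical sample data
example = {
    "name": "dir",
    "hash": "h0",
    "entries": [
        {"name": "file1.txt", "hash": "h1"},
        {"name": "d1", "hash": "h2", "entries": [{"name": "file2.txt", "hash": "h3"}]},
    ],
}
print(render(example))
# prints:
# h0  .
# h1  file1.txt
# h2  d1/
# h3  d1/file2.txt
```

A flat, one-line-per-path listing like this tends to produce smaller, easier-to-read diffs than a deeply indented JSON document, since a changed file touches exactly one line plus the lines of its ancestor directories.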

Related: https://github.com/colcon/colcon-package-selection/pull/44

ruffsl, Apr 11 '21 18:04