fix(evaluator_storage): correct docstring on directory organization
AgentScope Version
commit: 5c3a7705c3a922e8a41c88f91666c870737f9075
I am using the latest code in the main branch.
Description
fix(evaluator_storage): correct save path ordering in FileEvaluatorStorage
In the docstring, the directory structure is described as:

The files are organized in a directory structure:

- save_dir/
    - evaluation_result.json
    - evaluation_meta.json
    - {task_id}/
        - {repeat_id}/
            - solution.json
            - evaluation/
                - {metric_name}.json
But the implementation doesn't follow this structure.
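For clarity, the documented layout can be expressed as path builders. This is a minimal sketch, assuming hypothetical helper names (`solution_path`, `metric_path`); the actual `FileEvaluatorStorage` methods and signatures may differ.

```python
import os


def solution_path(save_dir: str, task_id: str, repeat_id: str) -> str:
    # Per the docstring: save_dir/{task_id}/{repeat_id}/solution.json
    return os.path.join(save_dir, task_id, repeat_id, "solution.json")


def metric_path(
    save_dir: str, task_id: str, repeat_id: str, metric_name: str
) -> str:
    # Per the docstring: save_dir/{task_id}/{repeat_id}/evaluation/{metric_name}.json
    return os.path.join(
        save_dir, task_id, repeat_id, "evaluation", f"{metric_name}.json"
    )
```

The reported issue is that the actual implementation swaps the order of the `{task_id}` and `{repeat_id}` components relative to this documented layout.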
Checklist
Please check the following items before code is ready to be reviewed.
- [x] Code has been formatted with the `pre-commit run --all-files` command
- [x] All tests are passing
- [x] Docstrings are in Google style
- [x] Related documentation has been updated (e.g. links, examples, etc.)
- [x] Code is ready for review
@qbc2016 @DavdGao Please take a review; this is a minor change.
@kristol07 Thanks for pointing out the issue, but it seems to be a typo in the docstring rather than an error in the code implementation. Since we are developing evaluation visualization in agentscope-studio based on the current directory organization, maybe we should just fix the wrong description in the docstring instead?
@DavdGao I think the best approach depends on how you want to interpret or evaluate the results. In my situation, there are multiple distinct testing scenarios and I want to assess my agent's stability in each one, so I'm more interested in the outcomes of each repeated task within the same scenario. Grouping the results by task ID is therefore preferable in my case, which is why I thought it was a code error. On the other hand, if all the testing scenarios are of the same type, it makes more sense to group by repeat ID and review the overall results across all test scenarios; that may be the case for agentscope-studio.
Grouped by task (test case):
Grouped by repeatId:
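The two groupings discussed above can be sketched as follows. The records and field names here are made up for illustration; the real storage writes JSON files to disk rather than grouping in-memory dicts.

```python
from collections import defaultdict

# Hypothetical evaluation records (illustration only).
results = [
    {"task_id": "task_a", "repeat_id": "0", "passed": True},
    {"task_id": "task_a", "repeat_id": "1", "passed": False},
    {"task_id": "task_b", "repeat_id": "0", "passed": True},
]


def group_by(records: list[dict], key: str) -> dict[str, list[dict]]:
    """Group records by the given field."""
    grouped: defaultdict[str, list[dict]] = defaultdict(list)
    for record in records:
        grouped[record[key]].append(record)
    return dict(grouped)


# Grouped by task: inspect an agent's stability within each scenario.
by_task = group_by(results, "task_id")

# Grouped by repeat: inspect the overall outcome of each full run.
by_repeat = group_by(results, "repeat_id")
```

The directory ordering (`{task_id}/{repeat_id}` vs. `{repeat_id}/{task_id}`) simply bakes one of these two groupings into the filesystem layout.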
Hi @DavdGao, do you have any suggestions on how much flexibility should be provided to developers here? As for your comment, the PR has already been updated.

